postgresql/src/include/port/pg_bitutils.h

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

295 lines
6.6 KiB
C
Raw Normal View History

Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16 05:22:27 +01:00
/*-------------------------------------------------------------------------
*
* pg_bitutils.h
* Miscellaneous functions for bit-wise operations.
*
*
* Copyright (c) 2019-2022, PostgreSQL Global Development Group
Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16 05:22:27 +01:00
*
* src/include/port/pg_bitutils.h
*
*-------------------------------------------------------------------------
*/
#ifndef PG_BITUTILS_H
#define PG_BITUTILS_H
extern PGDLLIMPORT const uint8 pg_leftmost_one_pos[256];
extern PGDLLIMPORT const uint8 pg_rightmost_one_pos[256];
extern PGDLLIMPORT const uint8 pg_number_of_ones[256];
/*
* pg_leftmost_one_pos32
* Returns the position of the most significant set bit in "word",
* measured from the least significant bit. word must not be 0.
*/
static inline int
pg_leftmost_one_pos32(uint32 word)
{
#ifdef HAVE__BUILTIN_CLZ
Assert(word != 0);
return 31 - __builtin_clz(word);
#else
int shift = 32 - 8;
Assert(word != 0);
while ((word >> shift) == 0)
shift -= 8;
return shift + pg_leftmost_one_pos[(word >> shift) & 255];
#endif /* HAVE__BUILTIN_CLZ */
}
/*
* pg_leftmost_one_pos64
* As above, but for a 64-bit word.
*/
static inline int
pg_leftmost_one_pos64(uint64 word)
{
#ifdef HAVE__BUILTIN_CLZ
Assert(word != 0);
#if defined(HAVE_LONG_INT_64)
return 63 - __builtin_clzl(word);
#elif defined(HAVE_LONG_LONG_INT_64)
return 63 - __builtin_clzll(word);
#else
#error must have a working 64-bit integer datatype
#endif
#else /* !HAVE__BUILTIN_CLZ */
int shift = 64 - 8;
Assert(word != 0);
while ((word >> shift) == 0)
shift -= 8;
return shift + pg_leftmost_one_pos[(word >> shift) & 255];
#endif /* HAVE__BUILTIN_CLZ */
Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16 05:22:27 +01:00
}
/*
* pg_rightmost_one_pos32
* Returns the position of the least significant set bit in "word",
* measured from the least significant bit. word must not be 0.
*/
static inline int
pg_rightmost_one_pos32(uint32 word)
{
#ifdef HAVE__BUILTIN_CTZ
Assert(word != 0);
return __builtin_ctz(word);
#else
int result = 0;
Assert(word != 0);
while ((word & 255) == 0)
{
word >>= 8;
result += 8;
}
result += pg_rightmost_one_pos[word & 255];
return result;
#endif /* HAVE__BUILTIN_CTZ */
}
/*
* pg_rightmost_one_pos64
* As above, but for a 64-bit word.
*/
static inline int
pg_rightmost_one_pos64(uint64 word)
{
#ifdef HAVE__BUILTIN_CTZ
Assert(word != 0);
#if defined(HAVE_LONG_INT_64)
return __builtin_ctzl(word);
#elif defined(HAVE_LONG_LONG_INT_64)
return __builtin_ctzll(word);
#else
#error must have a working 64-bit integer datatype
#endif
#else /* !HAVE__BUILTIN_CTZ */
int result = 0;
Assert(word != 0);
while ((word & 255) == 0)
{
word >>= 8;
result += 8;
}
result += pg_rightmost_one_pos[word & 255];
return result;
#endif /* HAVE__BUILTIN_CTZ */
}
/*
* pg_nextpower2_32
* Returns the next higher power of 2 above 'num', or 'num' if it's
* already a power of 2.
*
* 'num' mustn't be 0 or be above PG_UINT32_MAX / 2 + 1.
*/
static inline uint32
pg_nextpower2_32(uint32 num)
{
Assert(num > 0 && num <= PG_UINT32_MAX / 2 + 1);
/*
* A power 2 number has only 1 bit set. Subtracting 1 from such a number
* will turn on all previous bits resulting in no common bits being set
* between num and num-1.
*/
if ((num & (num - 1)) == 0)
return num; /* already power 2 */
return ((uint32) 1) << (pg_leftmost_one_pos32(num) + 1);
}
/*
* pg_nextpower2_64
* Returns the next higher power of 2 above 'num', or 'num' if it's
* already a power of 2.
*
* 'num' mustn't be 0 or be above PG_UINT64_MAX / 2 + 1.
*/
static inline uint64
pg_nextpower2_64(uint64 num)
{
Assert(num > 0 && num <= PG_UINT64_MAX / 2 + 1);
/*
* A power 2 number has only 1 bit set. Subtracting 1 from such a number
* will turn on all previous bits resulting in no common bits being set
* between num and num-1.
*/
if ((num & (num - 1)) == 0)
return num; /* already power 2 */
return ((uint64) 1) << (pg_leftmost_one_pos64(num) + 1);
}
/*
* pg_prevpower2_32
* Returns the next lower power of 2 below 'num', or 'num' if it's
* already a power of 2.
*
* 'num' mustn't be 0.
*/
static inline uint32
pg_prevpower2_32(uint32 num)
{
return ((uint32) 1) << pg_leftmost_one_pos32(num);
}
/*
* pg_prevpower2_64
* Returns the next lower power of 2 below 'num', or 'num' if it's
* already a power of 2.
*
* 'num' mustn't be 0.
*/
static inline uint64
pg_prevpower2_64(uint64 num)
{
return ((uint64) 1) << pg_leftmost_one_pos64(num);
}
/*
* pg_ceil_log2_32
* Returns equivalent of ceil(log2(num))
*/
static inline uint32
pg_ceil_log2_32(uint32 num)
{
if (num < 2)
return 0;
else
return pg_leftmost_one_pos32(num - 1) + 1;
}
/*
* pg_ceil_log2_64
* Returns equivalent of ceil(log2(num))
*/
static inline uint64
pg_ceil_log2_64(uint64 num)
{
if (num < 2)
return 0;
else
return pg_leftmost_one_pos64(num - 1) + 1;
}
/*
* With MSVC on x86_64 builds, try using native popcnt instructions via the
* __popcnt and __popcnt64 intrinsics. These don't work the same as GCC's
* __builtin_popcount* intrinsic functions as they always emit popcnt
* instructions.
*/
#if defined(_MSC_VER) && defined(_M_AMD64)
#define HAVE_X86_64_POPCNTQ
#endif
/*
* On x86_64, we can use the hardware popcount instruction, but only if
* we can verify that the CPU supports it via the cpuid instruction.
*
* Otherwise, we fall back to a hand-rolled implementation.
*/
#ifdef HAVE_X86_64_POPCNTQ
#if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)
#define TRY_POPCNT_FAST 1
#endif
#endif
#ifdef TRY_POPCNT_FAST
/* Attempt to use the POPCNT instruction, but perform a runtime check first */
Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16 05:22:27 +01:00
extern int (*pg_popcount32) (uint32 word);
extern int (*pg_popcount64) (uint64 word);
#else
/* Use a portable implementation -- no need for a function pointer. */
extern int pg_popcount32(uint32 word);
extern int pg_popcount64(uint64 word);
#endif /* TRY_POPCNT_FAST */
Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16 05:22:27 +01:00
/* Count the number of one-bits in a byte array */
extern uint64 pg_popcount(const char *buf, int bytes);
/*
* Rotate the bits of "word" to the right/left by n bits.
*/
static inline uint32
pg_rotate_right32(uint32 word, int n)
{
return (word >> n) | (word << (32 - n));
}
static inline uint32
pg_rotate_left32(uint32 word, int n)
{
return (word << n) | (word >> (32 - n));
}
/* size_t variants of the above, as required */
#if SIZEOF_SIZE_T == 4
#define pg_leftmost_one_pos_size_t pg_leftmost_one_pos32
#define pg_nextpower2_size_t pg_nextpower2_32
#define pg_prevpower2_size_t pg_prevpower2_32
#else
#define pg_leftmost_one_pos_size_t pg_leftmost_one_pos64
#define pg_nextpower2_size_t pg_nextpower2_64
#define pg_prevpower2_size_t pg_prevpower2_64
#endif
Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16 05:22:27 +01:00
#endif /* PG_BITUTILS_H */