postgresql/src/include/port/pg_bitutils.h

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

273 lines
5.9 KiB
C
Raw Normal View History

Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16 05:22:27 +01:00
/*-------------------------------------------------------------------------
*
* pg_bitutils.h
* Miscellaneous functions for bit-wise operations.
*
*
* Copyright (c) 2019-2021, PostgreSQL Global Development Group
Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16 05:22:27 +01:00
*
* src/include/port/pg_bitutils.h
*
*-------------------------------------------------------------------------
*/
#ifndef PG_BITUTILS_H
#define PG_BITUTILS_H
#ifndef FRONTEND
Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16 05:22:27 +01:00
extern PGDLLIMPORT const uint8 pg_leftmost_one_pos[256];
extern PGDLLIMPORT const uint8 pg_rightmost_one_pos[256];
extern PGDLLIMPORT const uint8 pg_number_of_ones[256];
#else
extern const uint8 pg_leftmost_one_pos[256];
extern const uint8 pg_rightmost_one_pos[256];
extern const uint8 pg_number_of_ones[256];
#endif
Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16 05:22:27 +01:00
/*
* pg_leftmost_one_pos32
* Returns the position of the most significant set bit in "word",
* measured from the least significant bit. word must not be 0.
*/
static inline int
pg_leftmost_one_pos32(uint32 word)
{
#ifdef HAVE__BUILTIN_CLZ
Assert(word != 0);
return 31 - __builtin_clz(word);
#else
int shift = 32 - 8;
Assert(word != 0);
while ((word >> shift) == 0)
shift -= 8;
return shift + pg_leftmost_one_pos[(word >> shift) & 255];
#endif /* HAVE__BUILTIN_CLZ */
}
/*
* pg_leftmost_one_pos64
* As above, but for a 64-bit word.
*/
static inline int
pg_leftmost_one_pos64(uint64 word)
{
#ifdef HAVE__BUILTIN_CLZ
Assert(word != 0);
#if defined(HAVE_LONG_INT_64)
return 63 - __builtin_clzl(word);
#elif defined(HAVE_LONG_LONG_INT_64)
return 63 - __builtin_clzll(word);
#else
#error must have a working 64-bit integer datatype
#endif
#else /* !HAVE__BUILTIN_CLZ */
int shift = 64 - 8;
Assert(word != 0);
while ((word >> shift) == 0)
shift -= 8;
return shift + pg_leftmost_one_pos[(word >> shift) & 255];
#endif /* HAVE__BUILTIN_CLZ */
Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16 05:22:27 +01:00
}
/*
* pg_rightmost_one_pos32
* Returns the position of the least significant set bit in "word",
* measured from the least significant bit. word must not be 0.
*/
static inline int
pg_rightmost_one_pos32(uint32 word)
{
#ifdef HAVE__BUILTIN_CTZ
Assert(word != 0);
return __builtin_ctz(word);
#else
int result = 0;
Assert(word != 0);
while ((word & 255) == 0)
{
word >>= 8;
result += 8;
}
result += pg_rightmost_one_pos[word & 255];
return result;
#endif /* HAVE__BUILTIN_CTZ */
}
/*
* pg_rightmost_one_pos64
* As above, but for a 64-bit word.
*/
static inline int
pg_rightmost_one_pos64(uint64 word)
{
#ifdef HAVE__BUILTIN_CTZ
Assert(word != 0);
#if defined(HAVE_LONG_INT_64)
return __builtin_ctzl(word);
#elif defined(HAVE_LONG_LONG_INT_64)
return __builtin_ctzll(word);
#else
#error must have a working 64-bit integer datatype
#endif
#else /* !HAVE__BUILTIN_CTZ */
int result = 0;
Assert(word != 0);
while ((word & 255) == 0)
{
word >>= 8;
result += 8;
}
result += pg_rightmost_one_pos[word & 255];
return result;
#endif /* HAVE__BUILTIN_CTZ */
}
/*
* pg_nextpower2_32
* Returns the next higher power of 2 above 'num', or 'num' if it's
* already a power of 2.
*
* 'num' mustn't be 0 or be above PG_UINT32_MAX / 2 + 1.
*/
static inline uint32
pg_nextpower2_32(uint32 num)
{
Assert(num > 0 && num <= PG_UINT32_MAX / 2 + 1);
/*
* A power 2 number has only 1 bit set. Subtracting 1 from such a number
* will turn on all previous bits resulting in no common bits being set
* between num and num-1.
*/
if ((num & (num - 1)) == 0)
return num; /* already power 2 */
return ((uint32) 1) << (pg_leftmost_one_pos32(num) + 1);
}
/*
* pg_nextpower2_64
* Returns the next higher power of 2 above 'num', or 'num' if it's
* already a power of 2.
*
* 'num' mustn't be 0 or be above PG_UINT64_MAX / 2 + 1.
*/
static inline uint64
pg_nextpower2_64(uint64 num)
{
Assert(num > 0 && num <= PG_UINT64_MAX / 2 + 1);
/*
* A power 2 number has only 1 bit set. Subtracting 1 from such a number
* will turn on all previous bits resulting in no common bits being set
* between num and num-1.
*/
if ((num & (num - 1)) == 0)
return num; /* already power 2 */
return ((uint64) 1) << (pg_leftmost_one_pos64(num) + 1);
}
/*
* pg_nextpower2_size_t
* Returns the next higher power of 2 above 'num', for a size_t input.
*/
#if SIZEOF_SIZE_T == 4
#define pg_nextpower2_size_t(num) pg_nextpower2_32(num)
#else
#define pg_nextpower2_size_t(num) pg_nextpower2_64(num)
#endif
/*
* pg_prevpower2_32
* Returns the next lower power of 2 below 'num', or 'num' if it's
* already a power of 2.
*
* 'num' mustn't be 0.
*/
static inline uint32
pg_prevpower2_32(uint32 num)
{
return ((uint32) 1) << pg_leftmost_one_pos32(num);
}
/*
* pg_prevpower2_64
* Returns the next lower power of 2 below 'num', or 'num' if it's
* already a power of 2.
*
* 'num' mustn't be 0.
*/
static inline uint64
pg_prevpower2_64(uint64 num)
{
return ((uint64) 1) << pg_leftmost_one_pos64(num);
}
/*
* pg_prevpower2_size_t
* Returns the next lower power of 2 below 'num', for a size_t input.
*/
#if SIZEOF_SIZE_T == 4
#define pg_prevpower2_size_t(num) pg_prevpower2_32(num)
#else
#define pg_prevpower2_size_t(num) pg_prevpower2_64(num)
#endif
/*
* pg_ceil_log2_32
* Returns equivalent of ceil(log2(num))
*/
static inline uint32
pg_ceil_log2_32(uint32 num)
{
if (num < 2)
return 0;
else
return pg_leftmost_one_pos32(num - 1) + 1;
}
/*
* pg_ceil_log2_64
* Returns equivalent of ceil(log2(num))
*/
static inline uint64
pg_ceil_log2_64(uint64 num)
{
if (num < 2)
return 0;
else
return pg_leftmost_one_pos64(num - 1) + 1;
}
Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16 05:22:27 +01:00
/* Count the number of one-bits in a uint32 or uint64 */
extern int (*pg_popcount32) (uint32 word);
extern int (*pg_popcount64) (uint64 word);
/* Count the number of one-bits in a byte array */
extern uint64 pg_popcount(const char *buf, int bytes);
/*
* Rotate the bits of "word" to the right by n bits.
*/
static inline uint32
pg_rotate_right32(uint32 word, int n)
{
return (word >> n) | (word << (sizeof(word) * BITS_PER_BYTE - n));
}
Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-16 05:22:27 +01:00
#endif /* PG_BITUTILS_H */