/*-------------------------------------------------------------------------
 *
 * pg_bitutils.h
 *	  Miscellaneous functions for bit-wise operations.
 *
 * Copyright (c) 2019-2021, PostgreSQL Global Development Group
 *
 * src/include/port/pg_bitutils.h
 *
 *-------------------------------------------------------------------------
 */
#ifndef PG_BITUTILS_H
#define PG_BITUTILS_H

#ifndef FRONTEND
extern PGDLLIMPORT const uint8 pg_leftmost_one_pos[256];
extern PGDLLIMPORT const uint8 pg_rightmost_one_pos[256];
extern PGDLLIMPORT const uint8 pg_number_of_ones[256];
#else
extern const uint8 pg_leftmost_one_pos[256];
extern const uint8 pg_rightmost_one_pos[256];
extern const uint8 pg_number_of_ones[256];
#endif
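These three 256-entry lookup tables drive the fallback paths in the inline functions below. As a minimal standalone sketch of how a `pg_number_of_ones`-style table can be derived (the real arrays in `pg_bitutils.c` are precomputed constants; the function name here is illustrative, not a PostgreSQL API), the bit count of each byte follows from a simple recurrence:

```c
/*
 * Fill a 256-entry popcount-per-byte table.  Relies on the recurrence
 * popcount(i) == popcount(i / 2) + (i & 1), filled in ascending order
 * so that table[i / 2] is always computed before table[i].
 */
void
build_number_of_ones(unsigned char table[256])
{
	table[0] = 0;				/* zero has no bits set */
	for (int i = 1; i < 256; i++)
		table[i] = table[i / 2] + (i & 1);
}
```

The same recurrence works for leftmost/rightmost-one tables with the appropriate base cases.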

/*
 * pg_leftmost_one_pos32
 *		Returns the position of the most significant set bit in "word",
 *		measured from the least significant bit.  word must not be 0.
 */
static inline int
pg_leftmost_one_pos32(uint32 word)
{
#ifdef HAVE__BUILTIN_CLZ
	Assert(word != 0);

	return 31 - __builtin_clz(word);
#else
	int			shift = 32 - 8;

	Assert(word != 0);

	while ((word >> shift) == 0)
		shift -= 8;

	return shift + pg_leftmost_one_pos[(word >> shift) & 255];
#endif							/* HAVE__BUILTIN_CLZ */
}
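A quick standalone cross-check of the builtin path (a sketch assuming a GCC/Clang-style compiler providing `__builtin_clz`; function names are illustrative, not PostgreSQL APIs): `31 - clz(word)` must agree with a naive shift loop for any nonzero input.

```c
/* Builtin path: count leading zeros, convert to position from bit 0. */
int
leftmost_one_pos32_builtin(unsigned int word)
{
	return 31 - __builtin_clz(word);	/* undefined if word == 0 */
}

/* Naive reference: shift right until the word is exhausted. */
int
leftmost_one_pos32_naive(unsigned int word)
{
	int			pos = 0;

	while (word >>= 1)
		pos++;
	return pos;
}
```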

/*
 * pg_leftmost_one_pos64
 *		As above, but for a 64-bit word.
 */
static inline int
pg_leftmost_one_pos64(uint64 word)
{
#ifdef HAVE__BUILTIN_CLZ
	Assert(word != 0);

#if defined(HAVE_LONG_INT_64)
	return 63 - __builtin_clzl(word);
#elif defined(HAVE_LONG_LONG_INT_64)
	return 63 - __builtin_clzll(word);
#else
#error must have a working 64-bit integer datatype
#endif
#else							/* !HAVE__BUILTIN_CLZ */
	int			shift = 64 - 8;

	Assert(word != 0);

	while ((word >> shift) == 0)
		shift -= 8;

	return shift + pg_leftmost_one_pos[(word >> shift) & 255];
#endif							/* HAVE__BUILTIN_CLZ */
}

/*
 * pg_rightmost_one_pos32
 *		Returns the position of the least significant set bit in "word",
 *		measured from the least significant bit.  word must not be 0.
 */
static inline int
pg_rightmost_one_pos32(uint32 word)
{
#ifdef HAVE__BUILTIN_CTZ
	Assert(word != 0);

	return __builtin_ctz(word);
#else
	int			result = 0;

	Assert(word != 0);

	while ((word & 255) == 0)
	{
		word >>= 8;
		result += 8;
	}
	result += pg_rightmost_one_pos[word & 255];
	return result;
#endif							/* HAVE__BUILTIN_CTZ */
}
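As with the leftmost-one case, the builtin and a naive scan can be cross-checked in isolation (a sketch assuming `__builtin_ctz` is available; names are illustrative, not PostgreSQL APIs):

```c
/* Builtin path: count trailing zeros directly. */
int
rightmost_one_pos32_builtin(unsigned int word)
{
	return __builtin_ctz(word);		/* undefined if word == 0 */
}

/* Naive reference: shift out low zero bits one at a time. */
int
rightmost_one_pos32_naive(unsigned int word)
{
	int			result = 0;

	while ((word & 1) == 0)
	{
		word >>= 1;
		result++;
	}
	return result;
}
```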

/*
 * pg_rightmost_one_pos64
 *		As above, but for a 64-bit word.
 */
static inline int
pg_rightmost_one_pos64(uint64 word)
{
#ifdef HAVE__BUILTIN_CTZ
	Assert(word != 0);

#if defined(HAVE_LONG_INT_64)
	return __builtin_ctzl(word);
#elif defined(HAVE_LONG_LONG_INT_64)
	return __builtin_ctzll(word);
#else
#error must have a working 64-bit integer datatype
#endif
#else							/* !HAVE__BUILTIN_CTZ */
	int			result = 0;

	Assert(word != 0);

	while ((word & 255) == 0)
	{
		word >>= 8;
		result += 8;
	}
	result += pg_rightmost_one_pos[word & 255];
	return result;
#endif							/* HAVE__BUILTIN_CTZ */
}

/*
 * pg_nextpower2_32
 *		Returns the next higher power of 2 above 'num', or 'num' if it's
 *		already a power of 2.
 *
 * 'num' mustn't be 0 or be above PG_UINT32_MAX / 2 + 1.
 */
static inline uint32
pg_nextpower2_32(uint32 num)
{
	Assert(num > 0 && num <= PG_UINT32_MAX / 2 + 1);

	/*
	 * A power-of-2 number has only 1 bit set.  Subtracting 1 from such a
	 * number turns on all the bits below it, so num and num - 1 share no
	 * set bits.
	 */
	if ((num & (num - 1)) == 0)
		return num;				/* already a power of 2 */

	return ((uint32) 1) << (pg_leftmost_one_pos32(num) + 1);
}
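The two steps of that function — the `num & (num - 1)` power-of-2 test and rounding up via the leftmost set bit — can be exercised in a standalone sketch (assuming `__builtin_clz` stands in for `pg_leftmost_one_pos32`; the name is illustrative, not a PostgreSQL API):

```c
/*
 * Round a nonzero value up to the next power of 2.  Caller must ensure
 * num <= UINT_MAX / 2 + 1 so the shift cannot overflow.
 */
unsigned int
nextpower2_32_sketch(unsigned int num)
{
	/* num is a power of 2 iff it shares no set bits with num - 1 */
	if ((num & (num - 1)) == 0)
		return num;

	/* 32 - clz(num) == leftmost_one_pos(num) + 1 */
	return 1u << (32 - __builtin_clz(num));
}
```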

/*
 * pg_nextpower2_64
 *		Returns the next higher power of 2 above 'num', or 'num' if it's
 *		already a power of 2.
 *
 * 'num' mustn't be 0 or be above PG_UINT64_MAX / 2 + 1.
 */
static inline uint64
pg_nextpower2_64(uint64 num)
{
	Assert(num > 0 && num <= PG_UINT64_MAX / 2 + 1);

	/*
	 * A power-of-2 number has only 1 bit set.  Subtracting 1 from such a
	 * number turns on all the bits below it, so num and num - 1 share no
	 * set bits.
	 */
	if ((num & (num - 1)) == 0)
		return num;				/* already a power of 2 */

	return ((uint64) 1) << (pg_leftmost_one_pos64(num) + 1);
}

/*
 * pg_nextpower2_size_t
 *		Returns the next higher power of 2 above 'num', for a size_t input.
 */
#if SIZEOF_SIZE_T == 4
#define pg_nextpower2_size_t(num) pg_nextpower2_32(num)
#else
#define pg_nextpower2_size_t(num) pg_nextpower2_64(num)
#endif

/*
 * pg_prevpower2_32
 *		Returns the next lower power of 2 below 'num', or 'num' if it's
 *		already a power of 2.
 *
 * 'num' mustn't be 0.
 */
static inline uint32
pg_prevpower2_32(uint32 num)
{
	return ((uint32) 1) << pg_leftmost_one_pos32(num);
}

/*
 * pg_prevpower2_64
 *		Returns the next lower power of 2 below 'num', or 'num' if it's
 *		already a power of 2.
 *
 * 'num' mustn't be 0.
 */
static inline uint64
pg_prevpower2_64(uint64 num)
{
	return ((uint64) 1) << pg_leftmost_one_pos64(num);
}
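Rounding down is simpler than rounding up: keeping only the most significant set bit is already the answer. A standalone 64-bit sketch (assuming `__builtin_clzll` stands in for `pg_leftmost_one_pos64`; the name is illustrative, not a PostgreSQL API):

```c
/*
 * Round a nonzero value down to the nearest power of 2 by isolating
 * its most significant set bit.
 */
unsigned long long
prevpower2_64_sketch(unsigned long long num)
{
	/* 63 - clzll(num) == leftmost_one_pos(num); undefined if num == 0 */
	return 1ULL << (63 - __builtin_clzll(num));
}
```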

/*
 * pg_prevpower2_size_t
 *		Returns the next lower power of 2 below 'num', for a size_t input.
 */
#if SIZEOF_SIZE_T == 4
#define pg_prevpower2_size_t(num) pg_prevpower2_32(num)
#else
#define pg_prevpower2_size_t(num) pg_prevpower2_64(num)
#endif

/*
 * pg_ceil_log2_32
 *		Returns equivalent of ceil(log2(num))
 */
static inline uint32
pg_ceil_log2_32(uint32 num)
{
	if (num < 2)
		return 0;
	else
		return pg_leftmost_one_pos32(num - 1) + 1;
}

/*
 * pg_ceil_log2_64
 *		Returns equivalent of ceil(log2(num))
 */
static inline uint64
pg_ceil_log2_64(uint64 num)
{
	if (num < 2)
		return 0;
	else
		return pg_leftmost_one_pos64(num - 1) + 1;
}
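The `num - 1` in these functions is what makes the result a ceiling rather than a floor: for an exact power of 2 the leftmost bit of `num - 1` is one position lower, so adding 1 lands back on log2(num), while any other value rounds up. A standalone sketch of the 32-bit case (using `__builtin_clz` in place of `pg_leftmost_one_pos32`; the name is illustrative, not a PostgreSQL API):

```c
/* Equivalent of ceil(log2(num)) for num >= 1. */
unsigned int
ceil_log2_32_sketch(unsigned int num)
{
	if (num < 2)
		return 0;
	/* leftmost_one_pos(num - 1) + 1, via 31 - clz */
	return (31 - __builtin_clz(num - 1)) + 1;
}
```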

/* Count the number of one-bits in a uint32 or uint64 */
extern int	(*pg_popcount32) (uint32 word);
extern int	(*pg_popcount64) (uint64 word);

/* Count the number of one-bits in a byte array */
extern uint64 pg_popcount(const char *buf, int bytes);
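The function-pointer declarations let `src/port` swap in a CPUID-selected POPCNT implementation at runtime on x86_64. A byte-array popcount can be sketched in a standalone, portable form (assuming `__builtin_popcount` is available; the function name is illustrative, not the real `pg_popcount`, which works word-at-a-time where possible):

```c
/* Count the set bits in a byte array, one byte at a time. */
unsigned long long
popcount_bytes_sketch(const char *buf, int bytes)
{
	unsigned long long popcnt = 0;

	while (bytes--)
		popcnt += __builtin_popcount((unsigned char) *buf++);
	return popcnt;
}
```

The cast to `unsigned char` matters on platforms where `char` is signed, since sign extension would otherwise add spurious high bits.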

/*
 * Rotate the bits of "word" to the right by n bits.
 */
static inline uint32
pg_rotate_right32(uint32 word, int n)
{
	return (word >> n) | (word << (sizeof(word) * BITS_PER_BYTE - n));
}
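Note that the expression above shifts by `32 - n`, which is undefined behavior in C when `n == 0`; callers are expected to pass `0 < n < 32`. A standalone sketch that masks `n` to stay defined for any input (the function name is illustrative, not a PostgreSQL API):

```c
/* Rotate right by n bits, defined for any n by reducing n modulo 32. */
unsigned int
rotate_right32_sketch(unsigned int word, int n)
{
	n &= 31;					/* avoid shifting by 32, which is UB in C */
	if (n == 0)
		return word;
	return (word >> n) | (word << (32 - n));
}
```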

#endif							/* PG_BITUTILS_H */