Use Intel SSE 4.2 CRC instructions where available.
Modern x86 and x86-64 processors with SSE 4.2 support have special
instructions, crc32b and crc32q, for calculating CRC-32C. They greatly
speed up CRC calculation.
Whether the instructions can be used or not depends on the compiler and the
target architecture. If generation of SSE 4.2 instructions is allowed for
the target (-msse4.2 flag on gcc and clang), use them. If they are not
allowed by default, but the compiler supports the -msse4.2 flag to enable
them, compile just the CRC-32C function with -msse4.2 flag, and check at
runtime whether the processor we're running on supports it. If it doesn't,
fall back to the slicing-by-8 algorithm. (With the common defaults on
current operating systems, the runtime-check variant is what you get in
practice.)
Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.
2015-04-14 16:05:03 +02:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
*
|
|
|
|
* pg_crc32c_sse42.c
|
|
|
|
* Compute CRC-32C checksum using Intel SSE 4.2 instructions.
|
|
|
|
*
|
|
|
|
* Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
|
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* IDENTIFICATION
|
|
|
|
* src/port/pg_crc32c_sse42.c
|
|
|
|
*
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
#include "c.h"
|
|
|
|
|
|
|
|
#include "port/pg_crc32c.h"
|
|
|
|
|
|
|
|
#include <nmmintrin.h>
|
|
|
|
|
|
|
|
pg_crc32c
|
|
|
|
pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len)
|
|
|
|
{
|
|
|
|
const unsigned char *p = data;
|
2015-04-14 22:58:16 +02:00
|
|
|
const unsigned char *pend = p + len;
|
Use Intel SSE 4.2 CRC instructions where available.
Modern x86 and x86-64 processors with SSE 4.2 support have special
instructions, crc32b and crc32q, for calculating CRC-32C. They greatly
speed up CRC calculation.
Whether the instructions can be used or not depends on the compiler and the
target architecture. If generation of SSE 4.2 instructions is allowed for
the target (-msse4.2 flag on gcc and clang), use them. If they are not
allowed by default, but the compiler supports the -msse4.2 flag to enable
them, compile just the CRC-32C function with -msse4.2 flag, and check at
runtime whether the processor we're running on supports it. If it doesn't,
fall back to the slicing-by-8 algorithm. (With the common defaults on
current operating systems, the runtime-check variant is what you get in
practice.)
Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.
2015-04-14 16:05:03 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Process eight bytes of data at a time.
|
|
|
|
*
|
2015-04-14 22:58:16 +02:00
|
|
|
* NB: We do unaligned accesses here. The Intel architecture allows that,
|
|
|
|
* and performance testing didn't show any performance gain from aligning
|
|
|
|
* the begin address.
|
Use Intel SSE 4.2 CRC instructions where available.
Modern x86 and x86-64 processors with SSE 4.2 support have special
instructions, crc32b and crc32q, for calculating CRC-32C. They greatly
speed up CRC calculation.
Whether the instructions can be used or not depends on the compiler and the
target architecture. If generation of SSE 4.2 instructions is allowed for
the target (-msse4.2 flag on gcc and clang), use them. If they are not
allowed by default, but the compiler supports the -msse4.2 flag to enable
them, compile just the CRC-32C function with -msse4.2 flag, and check at
runtime whether the processor we're running on supports it. If it doesn't,
fall back to the slicing-by-8 algorithm. (With the common defaults on
current operating systems, the runtime-check variant is what you get in
practice.)
Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.
2015-04-14 16:05:03 +02:00
|
|
|
*/
|
2015-04-14 22:58:16 +02:00
|
|
|
#ifdef __x86_64__
|
|
|
|
while (p + 8 <= pend)
|
Use Intel SSE 4.2 CRC instructions where available.
Modern x86 and x86-64 processors with SSE 4.2 support have special
instructions, crc32b and crc32q, for calculating CRC-32C. They greatly
speed up CRC calculation.
Whether the instructions can be used or not depends on the compiler and the
target architecture. If generation of SSE 4.2 instructions is allowed for
the target (-msse4.2 flag on gcc and clang), use them. If they are not
allowed by default, but the compiler supports the -msse4.2 flag to enable
them, compile just the CRC-32C function with -msse4.2 flag, and check at
runtime whether the processor we're running on supports it. If it doesn't,
fall back to the slicing-by-8 algorithm. (With the common defaults on
current operating systems, the runtime-check variant is what you get in
practice.)
Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.
2015-04-14 16:05:03 +02:00
|
|
|
{
|
2015-04-14 22:58:16 +02:00
|
|
|
crc = (uint32) _mm_crc32_u64(crc, *((const uint64 *) p));
|
|
|
|
p += 8;
|
Use Intel SSE 4.2 CRC instructions where available.
Modern x86 and x86-64 processors with SSE 4.2 support have special
instructions, crc32b and crc32q, for calculating CRC-32C. They greatly
speed up CRC calculation.
Whether the instructions can be used or not depends on the compiler and the
target architecture. If generation of SSE 4.2 instructions is allowed for
the target (-msse4.2 flag on gcc and clang), use them. If they are not
allowed by default, but the compiler supports the -msse4.2 flag to enable
them, compile just the CRC-32C function with -msse4.2 flag, and check at
runtime whether the processor we're running on supports it. If it doesn't,
fall back to the slicing-by-8 algorithm. (With the common defaults on
current operating systems, the runtime-check variant is what you get in
practice.)
Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.
2015-04-14 16:05:03 +02:00
|
|
|
}
|
|
|
|
|
2015-04-14 22:58:16 +02:00
|
|
|
/* Process remaining full four bytes if any */
|
|
|
|
if (p + 4 <= pend)
|
|
|
|
{
|
|
|
|
crc = _mm_crc32_u32(crc, *((const unsigned int *) p));
|
|
|
|
p += 4;
|
|
|
|
}
|
|
|
|
#else
|
2015-05-24 03:35:49 +02:00
|
|
|
|
Use Intel SSE 4.2 CRC instructions where available.
Modern x86 and x86-64 processors with SSE 4.2 support have special
instructions, crc32b and crc32q, for calculating CRC-32C. They greatly
speed up CRC calculation.
Whether the instructions can be used or not depends on the compiler and the
target architecture. If generation of SSE 4.2 instructions is allowed for
the target (-msse4.2 flag on gcc and clang), use them. If they are not
allowed by default, but the compiler supports the -msse4.2 flag to enable
them, compile just the CRC-32C function with -msse4.2 flag, and check at
runtime whether the processor we're running on supports it. If it doesn't,
fall back to the slicing-by-8 algorithm. (With the common defaults on
current operating systems, the runtime-check variant is what you get in
practice.)
Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.
2015-04-14 16:05:03 +02:00
|
|
|
/*
|
2015-04-14 22:58:16 +02:00
|
|
|
* Process four bytes at a time. (The eight byte instruction is not
|
|
|
|
* available on the 32-bit x86 architecture).
|
Use Intel SSE 4.2 CRC instructions where available.
Modern x86 and x86-64 processors with SSE 4.2 support have special
instructions, crc32b and crc32q, for calculating CRC-32C. They greatly
speed up CRC calculation.
Whether the instructions can be used or not depends on the compiler and the
target architecture. If generation of SSE 4.2 instructions is allowed for
the target (-msse4.2 flag on gcc and clang), use them. If they are not
allowed by default, but the compiler supports the -msse4.2 flag to enable
them, compile just the CRC-32C function with -msse4.2 flag, and check at
runtime whether the processor we're running on supports it. If it doesn't,
fall back to the slicing-by-8 algorithm. (With the common defaults on
current operating systems, the runtime-check variant is what you get in
practice.)
Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.
2015-04-14 16:05:03 +02:00
|
|
|
*/
|
2015-04-14 22:58:16 +02:00
|
|
|
while (p + 4 <= pend)
|
|
|
|
{
|
|
|
|
crc = _mm_crc32_u32(crc, *((const unsigned int *) p));
|
|
|
|
p += 4;
|
|
|
|
}
|
2015-05-24 03:35:49 +02:00
|
|
|
#endif /* __x86_64__ */
|
2015-04-14 22:58:16 +02:00
|
|
|
|
|
|
|
/* Process any remaining bytes one at a time. */
|
|
|
|
while (p < pend)
|
Use Intel SSE 4.2 CRC instructions where available.
Modern x86 and x86-64 processors with SSE 4.2 support have special
instructions, crc32b and crc32q, for calculating CRC-32C. They greatly
speed up CRC calculation.
Whether the instructions can be used or not depends on the compiler and the
target architecture. If generation of SSE 4.2 instructions is allowed for
the target (-msse4.2 flag on gcc and clang), use them. If they are not
allowed by default, but the compiler supports the -msse4.2 flag to enable
them, compile just the CRC-32C function with -msse4.2 flag, and check at
runtime whether the processor we're running on supports it. If it doesn't,
fall back to the slicing-by-8 algorithm. (With the common defaults on
current operating systems, the runtime-check variant is what you get in
practice.)
Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.
2015-04-14 16:05:03 +02:00
|
|
|
{
|
2015-04-14 22:58:16 +02:00
|
|
|
crc = _mm_crc32_u8(crc, *p);
|
|
|
|
p++;
|
Use Intel SSE 4.2 CRC instructions where available.
Modern x86 and x86-64 processors with SSE 4.2 support have special
instructions, crc32b and crc32q, for calculating CRC-32C. They greatly
speed up CRC calculation.
Whether the instructions can be used or not depends on the compiler and the
target architecture. If generation of SSE 4.2 instructions is allowed for
the target (-msse4.2 flag on gcc and clang), use them. If they are not
allowed by default, but the compiler supports the -msse4.2 flag to enable
them, compile just the CRC-32C function with -msse4.2 flag, and check at
runtime whether the processor we're running on supports it. If it doesn't,
fall back to the slicing-by-8 algorithm. (With the common defaults on
current operating systems, the runtime-check variant is what you get in
practice.)
Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.
2015-04-14 16:05:03 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
return crc;
|
|
|
|
}
|