Eliminate the separate 'len' variable from the loops, and also use the 4
byte instruction. This shaves off a few more cycles. Even though this
routine that uses the special SSE 4.2 instructions is much faster than a
generic routine, it's still a hot spot, so let's make it as fast as
possible.
Change the configure test to not test _mm_crc32_u64. That variant is only
available in the 64-bit x86-64 architecture, not in 32-bit x86. Modify
pg_comp_crc32c_sse42 so that it only uses _mm_crc32_u64 on x86-64. With
these changes, the SSE accelerated CRC-32C implementation can also be used
on 32-bit x86 systems.
This also fixes the 32-bit MSVC build.