postgresql/src/test/regress
John Naylor 911588a3f8 Add fast path for validating UTF-8 text
Our previous validator used a traditional algorithm that performed
comparison and branching one byte at a time. It's useful in that
we always know exactly how many bytes we have validated, but that
precision comes at a cost. Input validation can show up prominently
in profiles of COPY FROM, and future improvements to COPY FROM such
as parallelism or faster line parsing will put more pressure on input
validation. Hence, add fast paths for both ASCII and multibyte UTF-8:

Use bitwise operations to check 16 bytes at a time for ASCII. If
that fails, use a "shift-based" DFA on those bytes to handle the
general case, including multibyte. These paths are relatively free
of branches and thus robust against all kinds of byte patterns. With
these algorithms, UTF-8 validation is several times faster, depending
on platform and the input byte distribution.
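
As a rough illustration of the ASCII check described above, here is a minimal sketch in C. It is not the code from this commit: the names chunk_is_ascii and ascii_prefix_len are hypothetical, and the sketch tests only the high bit of each byte, so any other per-byte rules a real validator enforces are left out. It shows how 16 bytes can be tested with a couple of bitwise operations and no per-byte branching.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 16				/* bytes examined per step */

/*
 * Hypothetical helper: true if none of the 16 bytes at s has its high bit
 * set, i.e. all of them are 7-bit ASCII.
 */
static bool
chunk_is_ascii(const unsigned char *s)
{
	uint64_t	lo;
	uint64_t	hi;

	/* memcpy avoids unaligned loads; compilers turn it into plain moves */
	memcpy(&lo, s, 8);
	memcpy(&hi, s + 8, 8);

	/* OR the halves together, then test all eight high bits at once */
	return ((lo | hi) & UINT64_C(0x8080808080808080)) == 0;
}

/*
 * Hypothetical driver: number of leading bytes confirmed as ASCII, counted
 * in whole 16-byte chunks.  A caller would hand the remainder to a slower,
 * more general validator.
 */
static size_t
ascii_prefix_len(const unsigned char *s, size_t len)
{
	size_t		done = 0;

	while (len - done >= CHUNK && chunk_is_ascii(s + done))
		done += CHUNK;

	return done;
}
```

The driver stops at the first chunk that contains a non-ASCII byte or when fewer than 16 bytes remain, which is the point where a general multibyte path or a byte-at-a-time fallback would take over.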

The previous coding in pg_utf8_verifystr() is retained for short
strings and for when the fast path returns an error.

Review, performance testing, and additional hacking by: Heikki
Linnakangas, Vladimir Sitnikov, Amit Khandekar, Thomas Munro, and
Greg Stark

Discussion:
https://www.postgresql.org/message-id/CAFBsxsEV_SzH%2BOLyCiyon%3DiwggSyMh_eF6A3LU2tiWf3Cy2ZQg%40mail.gmail.com
2021-12-20 10:07:29 -04:00
data Fix full text search to handle NOT above a phrase search correctly. 2020-04-27 12:21:04 -04:00
expected Add fast path for validating UTF-8 text 2021-12-20 10:07:29 -04:00
input Fix the public schema's permissions in a separate test script. 2021-12-17 16:22:26 -05:00
output Fix the public schema's permissions in a separate test script. 2021-12-17 16:22:26 -05:00
sql Add fast path for validating UTF-8 text 2021-12-20 10:07:29 -04:00
.gitignore Fix inconsistencies and typos in the tree, take 10 2019-08-13 13:53:41 +09:00
GNUmakefile Avoid creating testtablespace directories where not wanted. 2021-05-19 14:04:01 -04:00
Makefile Fix non-GNU makefiles for AIX make. 2017-11-30 00:57:22 -08:00
README
parallel_schedule Fix the public schema's permissions in a separate test script. 2021-12-17 16:22:26 -05:00
pg_regress.c Move Perl test modules to a better namespace 2021-10-24 10:28:19 -04:00
pg_regress.h Allow pg_regress.c wrappers to postprocess test result files. 2021-01-11 13:43:19 -05:00
pg_regress_main.c Allow configurable LZ4 TOAST compression. 2021-03-19 15:10:38 -04:00
regress.c Initial pgindent and pgperltidy run for v14. 2021-05-12 13:14:10 -04:00
regressplans.sh Fix inconsistencies in the code 2019-07-08 13:15:09 +09:00
resultmap Cygwin and Mingw floating-point fixes. 2019-02-16 01:50:16 +00:00
standby_schedule

README

Documentation concerning how to run these regression tests and interpret
the results can be found in the PostgreSQL manual, in the chapter
"Regression Tests".