Go to file
John Naylor 911588a3f8 Add fast path for validating UTF-8 text
Our previous validator used a traditional algorithm that performed
comparison and branching one byte at a time. It's useful in that
we always know exactly how many bytes we have validated, but that
precision comes at a cost. Input validation can show up prominently
in profiles of COPY FROM, and future improvements to COPY FROM such
as parallelism or faster line parsing will put more pressure on input
validation. Hence, add fast paths for both ASCII and multibyte UTF-8:

Use bitwise operations to check 16 bytes at a time for ASCII. If
that fails, use a "shift-based" DFA on those bytes to handle the
general case, including multibyte. These paths are relatively free
of branches and thus robust against all kinds of byte patterns. With
these algorithms, UTF-8 validation is several times faster, depending
on platform and the input byte distribution.

The previous coding in pg_utf8_verifystr() is retained for short
strings and for when the fast path returns an error.

Review, performance testing, and additional hacking by: Heikki
Linakangas, Vladimir Sitnikov, Amit Khandekar, Thomas Munro, and
Greg Stark

Discussion:
https://www.postgresql.org/message-id/CAFBsxsEV_SzH%2BOLyCiyon%3DiwggSyMh_eF6A3LU2tiWf3Cy2ZQg%40mail.gmail.com
2021-12-20 10:07:29 -04:00
config Pacify perlcritic. 2021-11-22 15:57:31 -05:00
contrib Remove assertion for replication origins in PREPARE TRANSACTION 2021-12-14 10:58:15 +09:00
doc doc: More documentation on regular expressions and SQL standard 2021-12-20 10:36:44 +01:00
src Add fast path for validating UTF-8 text 2021-12-20 10:07:29 -04:00
.dir-locals.el
.editorconfig Add .editorconfig 2019-12-18 09:13:13 +01:00
.git-blame-ignore-revs Add another old commit to git-blame-ignore-revs. 2021-11-03 17:34:19 -07:00
.gitattributes gitattributes: Add new entry to silence whitespace error 2021-06-05 07:57:31 +02:00
.gitignore
aclocal.m4 Probe $PROVE not $PERL while checking for modules needed by TAP tests. 2021-11-22 12:54:52 -05:00
configure Check for STATUS_DELETE_PENDING on Windows. 2021-12-10 16:19:43 +13:00
configure.ac Check for STATUS_DELETE_PENDING on Windows. 2021-12-10 16:19:43 +13:00
COPYRIGHT Update copyright for 2021 2021-01-02 13:06:25 -05:00
GNUmakefile.in add missing tag from commit b8c4261e5e 2021-07-01 15:47:46 -04:00
HISTORY Canonicalize some URLs 2020-02-10 20:47:50 +01:00
Makefile
README Canonicalize some URLs 2020-02-10 20:47:50 +01:00
README.git Canonicalize some URLs 2020-02-10 20:47:50 +01:00

PostgreSQL Database Management System
=====================================

This directory contains the source code distribution of the PostgreSQL
database management system.

PostgreSQL is an advanced object-relational database management system
that supports an extended subset of the SQL standard, including
transactions, foreign keys, subqueries, triggers, user-defined types
and functions.  This distribution also contains C language bindings.

PostgreSQL has many language interfaces, many of which are listed here:

	https://www.postgresql.org/download/

See the file INSTALL for instructions on how to build and install
PostgreSQL.  That file also lists supported operating systems and
hardware platforms and contains information regarding any other
software packages that are required to build or run the PostgreSQL
system.  Copyright and license information can be found in the
file COPYRIGHT.  A comprehensive documentation set is included in this
distribution; it can be read as described in the installation
instructions.

The latest version of this software may be obtained at
https://www.postgresql.org/download/.  For more information look at our
web site located at https://www.postgresql.org/.