postgresql/src/test/regress/sql
John Naylor 911588a3f8 Add fast path for validating UTF-8 text
Our previous validator used a traditional algorithm that performed
comparison and branching one byte at a time. It's useful in that
we always know exactly how many bytes we have validated, but that
precision comes at a cost. Input validation can show up prominently
in profiles of COPY FROM, and future improvements to COPY FROM such
as parallelism or faster line parsing will put more pressure on input
validation. Hence, add fast paths for both ASCII and multibyte UTF-8:

Use bitwise operations to check 16 bytes at a time for ASCII. If
that fails, use a "shift-based" DFA on those bytes to handle the
general case, including multibyte. These paths are relatively free
of branches and thus robust against all kinds of byte patterns. With
these algorithms, UTF-8 validation is several times faster, depending
on platform and the input byte distribution.
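
As a rough illustration of the ASCII fast path described above, here is a minimal sketch in C. It is not the committed code: the names chunk_is_ascii and ascii_prefix_len are hypothetical, and the sketch assumes that zero bytes should be rejected along with bytes that have the high bit set. It checks 16 bytes at a time using two 64-bit words and only bitwise operations and additions. The shift-based DFA fallback is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 16

/*
 * Hypothetical helper: returns true if all 16 bytes starting at s are
 * nonzero 7-bit ASCII. No per-byte comparison or branching is used.
 */
static bool
chunk_is_ascii(const unsigned char *s)
{
    const uint64_t high = UINT64_C(0x8080808080808080);
    const uint64_t low7 = UINT64_C(0x7f7f7f7f7f7f7f7f);
    uint64_t    half1;
    uint64_t    half2;
    uint64_t    any_high;
    uint64_t    nonzero;

    /* memcpy avoids unaligned-access problems */
    memcpy(&half1, s, 8);
    memcpy(&half2, s + 8, 8);

    /* Nonzero if any byte has its high bit set (i.e. is not ASCII). */
    any_high = (half1 | half2) & high;

    /*
     * Adding 0x7f to a byte in 0x01..0x7f sets its high bit without
     * carrying into the next byte; a zero byte leaves the high bit clear.
     * This result is only meaningful when any_high is zero.
     */
    nonzero = (half1 + low7) & (half2 + low7) & high;

    return any_high == 0 && nonzero == high;
}

/*
 * Hypothetical helper: returns how many leading bytes of s were verified
 * as ASCII, always a multiple of CHUNK_SIZE. The caller would hand the
 * remainder to a slower, more general validator.
 */
static size_t
ascii_prefix_len(const unsigned char *s, size_t len)
{
    size_t      done = 0;

    assert(s != NULL);

    while (len - done >= CHUNK_SIZE && chunk_is_ascii(s + done))
        done += CHUNK_SIZE;

    return done;
}
```

Note the trade-off the commit message mentions: the chunked check reports only whether a whole 16-byte chunk passed, not the position of the first offending byte, which is why a byte-at-a-time path is still needed for short strings and for locating errors.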

The previous coding in pg_utf8_verifystr() is retained for short
strings and for when the fast path returns an error.

Review, performance testing, and additional hacking by: Heikki
Linnakangas, Vladimir Sitnikov, Amit Khandekar, Thomas Munro, and
Greg Stark

Discussion:
https://www.postgresql.org/message-id/CAFBsxsEV_SzH%2BOLyCiyon%3DiwggSyMh_eF6A3LU2tiWf3Cy2ZQg%40mail.gmail.com
2021-12-20 10:07:29 -04:00
..
.gitignore Replace opr_sanity test's binary_coercible() function with C code. 2021-05-11 14:28:11 -04:00
advisory_lock.sql
aggregates.sql Fix check_agg_arguments' examination of aggregate FILTER clauses. 2021-08-18 18:12:51 -04:00
alter_generic.sql Implement operator class parameters 2020-03-30 19:17:23 +03:00
alter_operator.sql Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers. 2018-01-26 18:25:14 -05:00
alter_table.sql Allow publishing the tables of schema. 2021-10-27 07:44:52 +05:30
amutils.sql Add support for nearest-neighbor (KNN) searches to SP-GiST 2018-09-19 01:54:10 +03:00
arrays.sql Add trim_array() function. 2021-03-03 16:39:57 -05:00
async.sql Add new function pg_notification_queue_usage. 2015-07-17 09:12:03 -04:00
bit.sql Add bit_count SQL function 2021-03-23 10:13:58 +01:00
bitmapops.sql
boolean.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
box.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
brin.sql Move test for BRIN HOT behavior to stats.sql 2021-12-11 05:32:35 +01:00
brin_bloom.sql BRIN bloom indexes 2021-03-26 13:35:32 +01:00
brin_multi.sql Fix handling of NaN values in BRIN minmax multi 2021-11-06 01:50:44 +01:00
btree_index.sql Block ALTER INDEX/TABLE index_name ALTER COLUMN colname SET (options) 2021-10-19 11:03:52 +09:00
case.sql Add support for NullIfExpr in eval_const_expressions 2021-04-02 11:01:49 +02:00
char.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
circle.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
cluster.sql Avoid possible crash while finishing up a heap rewrite. 2021-03-23 11:24:16 -04:00
collate.icu.utf8.sql Revert per-index collation version tracking feature. 2021-05-07 21:10:11 +12:00
collate.linux.utf8.sql Revert per-index collation version tracking feature. 2021-05-07 21:10:11 +12:00
collate.sql Improve error checking of CREATE COLLATION options. 2021-07-18 11:08:34 +01:00
combocid.sql Sanitize the term "combo CID" in code comments 2021-03-25 16:08:03 +09:00
comments.sql
compression.sql Remove forced toast recompression in VACUUM FULL/CLUSTER 2021-06-14 09:25:50 +09:00
conversion.sql Add fast path for validating UTF-8 text 2021-12-20 10:07:29 -04:00
copy2.sql Fix handling of redundant options with COPY for "freeze" and "header" 2020-10-05 09:43:17 +09:00
copydml.sql Extend a test case a little 2021-02-26 09:11:15 +01:00
copyselect.sql Revert "psql: Show all query results by default" 2021-04-15 19:42:55 +02:00
create_aggregate.sql Introduce "anycompatible" family of polymorphic types. 2020-03-19 11:43:11 -04:00
create_am.sql Add support for SET ACCESS METHOD in ALTER TABLE 2021-07-28 10:10:44 +09:00
create_cast.sql
create_function_3.sql Fix display of SQL-standard function's arguments in INSERT/SELECT. 2021-11-17 11:31:31 -05:00
create_index.sql Preserve opclass parameters across REINDEX CONCURRENTLY 2021-11-01 11:38:23 +09:00
create_index_spgist.sql Add a planner support function for starts_with(). 2021-11-17 16:54:12 -05:00
create_misc.sql Remove gratuitous uses of deprecated SELECT INTO 2021-01-28 14:28:41 +01:00
create_operator.sql Remove support for postfix (right-unary) operators. 2020-09-17 19:38:05 -04:00
create_procedure.sql Reconsider the handling of procedure OUT parameters. 2021-06-10 17:11:36 -04:00
create_table.sql Fix bogus logic for reporting which hash partition conflicts. 2021-06-29 14:34:31 -04:00
create_table_like.sql Extended statistics on expressions 2021-03-27 00:01:11 +01:00
create_type.sql Allow ALTER TYPE to update an existing type's typsubscript value. 2020-12-11 18:58:21 -05:00
create_view.sql Allow an alias to be attached to a JOIN ... USING 2021-03-31 17:10:50 +02:00
date.sql Change return type of EXTRACT to numeric 2021-04-06 07:20:42 +02:00
dbsize.sql Teach pg_size_pretty and pg_size_bytes about petabytes 2021-07-09 18:56:00 +12:00
delete.sql
dependency.sql Un-hide most cascaded-drop details in regression test results. 2019-03-24 19:15:37 -04:00
domain.sql Fix assignment to array of domain over composite. 2021-10-19 13:54:45 -04:00
drop_if_exists.sql Introduce the 'force' option for the Drop Database command. 2019-11-13 08:25:33 +05:30
drop_operator.sql Fix DROP OPERATOR to reset oprcom/oprnegate links to the dropped operator. 2016-03-25 12:33:16 -04:00
enum.sql Relax transactional restrictions on ALTER TYPE ... ADD VALUE (redux). 2018-10-09 12:51:01 +13:00
equivclass.sql Suppress unnecessary RelabelType nodes in more cases. 2020-02-26 18:14:12 -05:00
errors.sql Reject SELECT ... GROUP BY GROUPING SETS (()) FOR UPDATE. 2021-06-01 11:12:56 -04:00
event_trigger.sql Improve handling of dropped objects in pg_event_trigger_ddl_commands() 2021-06-14 14:57:22 +09:00
explain.sql Stabilize output of new regression test. 2021-07-27 12:49:45 -04:00
expressions.sql Ensure casting to typmod -1 generates a RelabelType. 2021-12-16 15:36:02 -05:00
fast_default.sql Don't set a fast default for anything but a plain table 2021-06-18 06:51:12 -04:00
float4.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
float8.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
foreign_data.sql Improve HINT message that FDW reports when there are no valid options. 2021-10-27 00:46:52 +09:00
foreign_key.sql Allow specifying column list for foreign key ON DELETE SET actions 2021-12-08 11:13:57 +01:00
functional_deps.sql
generated.sql Disallow whole-row variables in GENERATED expressions. 2021-05-21 15:12:08 -04:00
geometry.sql Remove unimplemented/undocumented geometric functions & operators. 2021-12-13 18:08:28 -05:00
gin.sql Improve test coverage of ginvacuum.c. 2020-09-01 18:40:43 -04:00
gist.sql Add support for <-> (box, point) operator to GiST box_ops 2019-07-14 15:09:15 +03:00
groupingsets.sql Implement GROUP BY DISTINCT 2021-03-18 18:22:18 +01:00
guc.sql Warning on SET of nonexisting setting with a prefix reserved by an extension 2021-12-01 15:08:32 +01:00
hash_func.sql Fix portability issue in tests from commit ce773f230. 2021-09-03 10:01:02 -04:00
hash_index.sql Add more tests for reloptions 2017-10-19 14:22:05 +02:00
hash_part.sql Fix typo in test comment. 2020-05-28 12:35:18 +03:00
horology.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
hs_primary_extremes.sql Remove all references to "xlog" from SQL-callable functions in pg_proc. 2017-02-09 15:10:09 -05:00
hs_primary_setup.sql Remove all references to "xlog" from SQL-callable functions in pg_proc. 2017-02-09 15:10:09 -05:00
hs_standby_allowed.sql Allow UNLISTEN in hot-standby mode. 2019-01-25 21:14:49 -05:00
hs_standby_check.sql
hs_standby_disallowed.sql Allow UNLISTEN in hot-standby mode. 2019-01-25 21:14:49 -05:00
hs_standby_functions.sql Introduce xid8-based functions to replace txid_XXX. 2020-04-07 12:04:32 +12:00
identity.sql Forbid marking an identity column as nullable. 2021-03-12 11:08:42 -05:00
incremental_sort.sql Fix planner failure in some cases of sorting by an aggregate. 2021-04-20 11:32:02 -04:00
index_including.sql Support INCLUDE'd columns in SP-GiST. 2021-04-05 18:41:21 -04:00
index_including_gist.sql Support for INCLUDE attributes in GiST indexes 2019-03-10 11:37:17 +03:00
indexing.sql Raise error on concurrent drop of partitioned index 2020-09-01 13:40:43 -04:00
indirect_toast.sql Fix portability issue in test indirect_toast 2021-06-07 18:12:29 +09:00
inet.sql Add test case for abbrev(cidr) 2021-02-11 09:56:14 +01:00
infinite_recurse.sql Paper over regression failures in infinite_recurse() on PPC64 Linux. 2020-10-13 17:44:56 -04:00
inherit.sql Allow ordered partition scans in more cases 2021-08-03 12:25:52 +12:00
init_privs.sql Fix typos in comments. 2017-02-06 11:33:58 +02:00
insert.sql Accept slightly-filled pages for tuples larger than fillfactor. 2021-03-30 18:53:44 -07:00
insert_conflict.sql Allow table-qualified variable names in ON CONFLICT ... WHERE. 2021-04-13 15:39:41 -04:00
int2.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
int4.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
int8.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
interval.sql Change return type of EXTRACT to numeric 2021-04-06 07:20:42 +02:00
join.sql Fix pull_varnos to cope with translated PlaceHolderVars. 2021-09-17 15:41:16 -04:00
join_hash.sql Fix representation of hash keys in Hash/HashJoin nodes. 2019-08-02 00:02:46 -07:00
json.sql Improve reporting for syntax errors in multi-line JSON data. 2021-03-01 16:44:17 -05:00
json_encoding.sql Allow Unicode escapes in any server encoding, not only UTF-8. 2020-03-06 14:17:43 -05:00
jsonb.sql Improve reporting for syntax errors in multi-line JSON data. 2021-03-01 16:44:17 -05:00
jsonb_jsonpath.sql Support for ISO 8601 in the jsonpath .datetime() method 2020-09-29 12:00:04 +03:00
jsonpath.sql Implement jsonpath .datetime() method 2019-09-25 22:51:51 +03:00
jsonpath_encoding.sql Allow Unicode escapes in any server encoding, not only UTF-8. 2020-03-06 14:17:43 -05:00
limit.sql Error out if SKIP LOCKED and WITH TIES are both specified 2021-10-01 18:29:18 -03:00
line.sql Improve test coverage of geometric types 2018-09-26 10:45:21 +02:00
lock.sql Revert "Accept relations of any kind in LOCK TABLE". 2020-11-06 16:17:56 -05:00
lseg.sql Improve test coverage of geometric types 2018-09-26 10:45:21 +02:00
macaddr.sql
macaddr8.sql Add support for EUI-64 MAC addresses as macaddr8 2017-03-15 11:16:25 -04:00
matview.sql Really fix the ambiguity in REFRESH MATERIALIZED VIEW CONCURRENTLY. 2021-08-07 13:29:32 -04:00
memoize.sql Flush Memoize cache when non-key parameters change, take 2 2021-11-24 23:29:14 +13:00
misc_functions.sql Add SQL functions to monitor the directory contents of replication slots 2021-11-23 19:29:42 +09:00
misc_sanity.sql Replace explicit PIN entries in pg_depend with an OID range test. 2021-07-15 11:41:47 -04:00
money.sql Fix loss of fractional digits for large values in cash_numeric(). 2019-07-26 11:59:00 -04:00
multirangetypes.sql Fix alignment in multirange_get_range() function 2021-12-13 17:17:33 +03:00
mvcc.sql Increment xactCompletionCount during subtransaction abort. 2021-04-06 09:24:50 -07:00
name.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
namespace.sql Clean up duplicate role and schema names in regression tests. 2018-03-15 14:00:31 -04:00
numeric.sql Fix corner-case loss of precision in numeric_power(). 2021-10-06 13:16:51 +01:00
numeric_big.sql Fix corner-case loss of precision in numeric ln(). 2020-03-01 14:49:25 +00:00
numerology.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
object_address.sql Allow publishing the tables of schema. 2021-10-27 07:44:52 +05:30
oid.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
oidjoins.sql Build in some knowledge about foreign-key relationships in the catalogs. 2021-02-02 17:11:55 -05:00
opr_sanity.sql Revert 29854ee8d1 due to buildfarm failures 2021-06-15 21:44:40 +03:00
partition_aggregate.sql Move per-agg and per-trans duplicate finding to the planner. 2020-11-24 10:45:00 +02:00
partition_info.sql Fix crash with pg_partition_root 2019-03-22 17:27:38 +09:00
partition_join.sql Copy editing: fix a bunch of misspellings and poor wording. 2020-09-21 12:43:42 -04:00
partition_prune.sql Change the name of the Result Cache node to Memoize 2021-07-14 12:43:58 +12:00
password.sql Change default of password_encryption to scram-sha-256 2020-06-10 16:42:55 +02:00
path.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
pg_lsn.sql Add +(pg_lsn,numeric) and -(pg_lsn,numeric) operators. 2020-06-30 23:55:07 +09:00
plancache.sql Add generic_plans and custom_plans fields into pg_prepared_statements. 2020-07-20 11:55:50 +09:00
plpgsql.sql Test and document the behavior of initialization cross-refs in plpgsql. 2021-10-29 12:45:33 -04:00
point.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
polygon.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
polymorphism.sql Fix bugs in polymorphic-argument resolution for multiranges. 2021-07-27 15:01:49 -04:00
portals.sql Fix some anomalies with NO SCROLL cursors. 2021-09-10 13:18:32 -04:00
portals_p2.sql
prepare.sql Add more tests for CREATE TABLE AS with WITH NO DATA 2019-02-07 09:21:57 +09:00
prepared_xacts.sql Fix check for conflicting session- vs transaction-level locks. 2021-07-24 18:35:52 -04:00
privileges.sql Add test for REVOKE ADMIN OPTION 2021-11-26 14:02:14 +01:00
psql.sql psql: Add various tests 2021-09-29 23:17:10 +02:00
psql_crosstab.sql Fix incorrect error reporting for duplicate data in \crosstabview. 2016-12-25 16:04:45 -05:00
publication.sql Fix double publish of child table's data. 2021-12-09 08:36:59 +05:30
random.sql Remove gratuitous uses of deprecated SELECT INTO 2021-01-28 14:28:41 +01:00
rangefuncs.sql Fix planner error with pulling up subquery expressions into function RTEs. 2021-10-14 12:43:55 -04:00
rangetypes.sql Fix alignment in multirange_get_range() function 2021-12-13 17:17:33 +03:00
regex.linux.utf8.sql Make locale-dependent regex character classes work for large char codes. 2016-09-05 17:06:29 -04:00
regex.sql Fix regexp misbehavior with capturing parens inside "{0}". 2021-08-24 16:37:26 -04:00
regproc.sql Implement type regcollation 2020-03-18 21:21:00 +01:00
reindex_catalog.sql Fix rd_firstRelfilenodeSubid for nailed relations, in parallel workers. 2020-09-09 18:50:24 -07:00
reloptions.sql Improve stability of test with vacuum_truncate in reloptions.sql 2021-04-02 09:44:42 +09:00
replica_identity.sql Block ALTER TABLE .. DROP NOT NULL on columns in replica identity index 2021-11-25 15:04:56 +09:00
returning.sql
roleattributes.sql Remove WITH OIDS support, change oid catalog column visibility. 2018-11-20 16:00:17 -08:00
rowsecurity.sql Fix misbehavior of DROP OWNED BY with duplicate polroles entries. 2021-06-18 18:00:09 -04:00
rowtypes.sql Add a couple of regression test cases related to array subscripting. 2020-12-07 11:10:21 -05:00
rules.sql Avoid trying to lock OLD/NEW in a rule with FOR UPDATE. 2021-08-19 12:12:35 -04:00
sanity_check.sql Don't create relfilenode for relations without storage 2019-01-04 14:51:17 -03:00
security_label.sql Establish conventions about global object names used in regression tests. 2016-07-17 18:42:43 -04:00
select.sql Make some subquery-using test cases a bit more robust. 2018-10-14 14:02:59 -04:00
select_distinct.sql Fix broken regression test caused by 22c4e88eb 2021-08-23 01:44:20 +12:00
select_distinct_on.sql
select_having.sql
select_implicit.sql Remove gratuitous uses of deprecated SELECT INTO 2021-01-28 14:28:41 +01:00
select_into.sql Sanitize IF NOT EXISTS in EXPLAIN for CTAS and matviews 2020-12-30 21:23:24 +09:00
select_parallel.sql Fix mis-planning of repeated application of a projection. 2021-05-31 12:03:00 -04:00
select_views.sql Avoid locale-dependent output in select_views regression test. 2017-05-28 14:52:18 -04:00
sequence.sql Make command order in test more sensible 2019-10-22 10:35:54 +02:00
spgist.sql Fix SP-GiST scan initialization logic for binary-compatible cases. 2021-11-20 14:29:56 -05:00
stats.sql Move test for BRIN HOT behavior to stats.sql 2021-12-11 05:32:35 +01:00
stats_ext.sql Identify simple column references in extended statistics 2021-09-01 17:41:56 +02:00
strings.sql Let regexp_replace() make use of REG_NOSUB when feasible. 2021-08-09 20:53:25 -04:00
subscription.sql Improve parsing of options of CREATE/ALTER SUBSCRIPTION 2021-12-08 12:36:31 +09:00
subselect.sql Fix planner error with multiple copies of an AlternativeSubPlan. 2021-09-14 15:11:21 -04:00
sysviews.sql Fix memory overrun when querying pg_stat_slru 2021-11-12 21:49:21 +09:00
tablesample.sql Fix some anomalies with NO SCROLL cursors. 2021-09-10 13:18:32 -04:00
temp.sql Fix misbehavior with expression indexes on ON COMMIT DELETE ROWS tables. 2019-12-01 13:09:26 -05:00
test_setup.sql Fix the public schema's permissions in a separate test script. 2021-12-17 16:22:26 -05:00
text.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
tid.sql Remove catalog function currtid() 2020-11-25 12:18:26 +09:00
tidrangescan.sql Add TID Range Scans to support efficient scanning ranges of TIDs 2021-02-27 22:59:36 +13:00
tidscan.sql Fix bug in Tid scan. 2020-02-07 22:06:31 +09:00
time.sql Change return type of EXTRACT to numeric 2021-04-06 07:20:42 +02:00
timestamp.sql Disallow negative strides in date_bin() 2021-07-28 12:10:12 -04:00
timestamptz.sql Disallow negative strides in date_bin() 2021-07-28 12:10:12 -04:00
timetz.sql Change return type of EXTRACT to numeric 2021-04-06 07:20:42 +02:00
transactions.sql Revert "psql: Show all query results by default" 2021-04-15 19:42:55 +02:00
triggers.sql Make new test immune to collation 2021-07-23 11:52:48 -04:00
truncate.sql Fix TRUNCATE .. CASCADE on partitions 2020-02-07 17:09:36 -03:00
tsdicts.sql Preserve integer and float values accurately in (de)serialize_deflist. 2020-03-10 12:30:02 -04:00
tsearch.sql Make websearch_to_tsquery() parse text in quotes as a single token 2021-05-03 04:18:19 +03:00
tsrf.sql Fix handling of targetlist SRFs when scan/join relation is known empty. 2019-03-07 14:22:13 -05:00
tstypes.sql Disallow making an empty lexeme via array_to_tsvector(). 2021-11-06 13:28:53 -04:00
tuplesort.sql Fix some typos, grammar and style in docs and comments 2021-02-24 16:13:17 +09:00
txid.sql Introduce xid8-based functions to replace txid_XXX. 2020-04-07 12:04:32 +12:00
type_sanity.sql Fix quoting of ACL item in table for upgrade binary compatibility checks 2021-11-18 12:52:49 +09:00
typed_table.sql Suppress less info in regression tests using DROP CASCADE. 2017-08-01 16:49:23 -04:00
unicode.sql Fix buffer overrun in unicode string normalization with empty input 2021-11-11 15:00:59 +09:00
union.sql Disable anonymous record hash support except in special cases 2021-09-08 09:55:04 +02:00
updatable_views.sql Calculate extraUpdatedCols in query rewriter, not parser. 2020-10-28 13:47:02 -04:00
update.sql Fix mishandling of resjunk columns in ON CONFLICT ... UPDATE tlists. 2021-05-10 11:02:29 -04:00
uuid.sql Add gen_random_uuid function 2019-07-14 14:30:27 +02:00
vacuum.sql Don't reset relhasindex for partitioned tables on ANALYZE 2021-07-01 12:56:30 -04:00
vacuum_parallel.sql Don't overlook indexes during parallel VACUUM. 2021-11-02 12:06:17 -07:00
varchar.sql Clean up ancient test style 2020-12-15 22:03:39 +01:00
window.sql Add tests for UNBOUNDED syntax ambiguity 2021-07-01 09:27:05 +02:00
with.sql Fix EXPLAIN to handle SEARCH BREADTH FIRST queries. 2021-09-16 10:45:42 -04:00
write_parallel.sql Enable parallelism in REFRESH MATERIALIZED VIEW. 2021-03-17 15:04:17 +13:00
xid.sql Introduce xid8-based functions to replace txid_XXX. 2020-04-07 12:04:32 +12:00
xml.sql Avoid failure when selecting a namespace node in XMLTABLE. 2019-10-25 15:22:45 -04:00
xmlmap.sql Fix cursor_to_xml in tableforest false mode 2017-05-03 21:41:10 -04:00