postgresql/src/backend
Tom Lane ea1268f630 Avoid generating extra subre tree nodes for capturing parentheses.
Previously, each pair of capturing parentheses gave rise to a separate
subre tree node, whose only function was to identify that we ought to
capture the match details for this particular sub-expression.  In
most cases we don't really need that, since we can perfectly well
put a "capture this" annotation on the child node that does the real
matching work.  As with the two preceding commits, the main value
of this is to avoid generating and optimizing an NFA for a tree node
that's not really pulling its weight.

The chosen data representation only allows one capture annotation
per subre node.  In the legal-per-spec, but seemingly not very useful,
case where there are multiple capturing parens around the exact same
bit of the regex (i.e. "((xyz))"), wrap the child node in N-1 capture
nodes that act the same as before.  We could work harder at that but
I'll refrain, pending some evidence that such cases are worth troubling
over.
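
As a rough illustration of the representation described above, here is a
minimal sketch in C. It assumes a simplified, hypothetical node type:
Node, new_node, add_capture, count_nodes and free_nodes are invented for
this example and are not the regex engine's actual struct subre or parser
functions. The first capture request is recorded on the node that does the
matching; only a second request on the same node forces a wrapper.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified tree node for illustration only;
 * this is not the regex engine's real struct subre.
 */
typedef struct Node
{
	char		op;			/* '=' does real matching, '(' only captures */
	int			capno;		/* capture group number, or 0 for no capture */
	struct Node *child;		/* node wrapped by a '(' node, else NULL */
} Node;

static Node *
new_node(char op, int capno, Node *child)
{
	Node	   *n = malloc(sizeof(Node));

	assert(n != NULL);
	n->op = op;
	n->capno = capno;
	n->child = child;
	return n;
}

/*
 * Record that "node" should capture into group "capno".  If the node has
 * no annotation yet, annotate it in place; otherwise fall back to a
 * separate capture-only wrapper node.
 */
static Node *
add_capture(Node *node, int capno)
{
	if (node->capno == 0)
	{
		node->capno = capno;
		return node;
	}
	return new_node('(', capno, node);
}

/* Count the nodes in a chain of wrappers ending at the matching node. */
static int
count_nodes(const Node *n)
{
	int			count = 0;

	for (; n != NULL; n = n->child)
		count++;
	return count;
}

/* Free a chain of nodes built by the functions above. */
static void
free_nodes(Node *n)
{
	while (n != NULL)
	{
		Node	   *next = n->child;

		free(n);
		n = next;
	}
}
```

Under these assumptions, "(xyz)" stays a single node carrying capno 1,
while "((xyz))" annotates the matching node with the inner group (2) and
adds one wrapper node for the outer group (1), i.e. N-1 wrappers for N
pairs of parentheses.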

In passing, improve the comments in regex.h to say what all the
different re_info bits mean.  Some of them were pretty obvious
but others not so much, so reverse-engineer some documentation.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 19:26:41 -05:00
access Fix bug in COMMIT AND CHAIN command. 2021-02-19 21:57:52 +09:00
bootstrap Update copyright for 2021 2021-01-02 13:06:25 -05:00
catalog Routine usage information schema tables 2021-02-17 18:16:06 +01:00
commands Use errmsg_internal for debug messages 2021-02-17 11:33:25 +01:00
executor Fix tuple routing to initialize batching only for inserts 2021-02-18 00:03:45 +01:00
foreign Update copyright for 2021 2021-01-02 13:06:25 -05:00
jit Use errmsg_internal for debug messages 2021-02-17 11:33:25 +01:00
lib Update copyright for 2021 2021-01-02 13:06:25 -05:00
libpq Allow specifying CRL directory 2021-02-18 07:59:10 +01:00
main Update copyright for 2021 2021-01-02 13:06:25 -05:00
nodes Remove [Merge]AppendPath.partitioned_rels. 2021-02-01 14:43:54 -05:00
optimizer Remove [Merge]AppendPath.partitioned_rels. 2021-02-01 14:43:54 -05:00
parser Use errmsg_internal for debug messages 2021-02-17 11:33:25 +01:00
partitioning Use errmsg_internal for debug messages 2021-02-17 11:33:25 +01:00
po Translation updates 2020-05-18 12:49:30 +02:00
port Use errmsg_internal for debug messages 2021-02-17 11:33:25 +01:00
postmaster Use errmsg_internal for debug messages 2021-02-17 11:33:25 +01:00
regex Avoid generating extra subre tree nodes for capturing parentheses. 2021-02-20 19:26:41 -05:00
replication Fix "invalid spinlock number: 0" error in pg_stat_wal_receiver. 2021-02-18 23:28:15 +09:00
rewrite Revert "Propagate CTE property flags when copying a CTE list into a rule." 2021-02-07 12:54:08 -05:00
snowball Update snowball 2021-02-19 08:10:15 +01:00
statistics Update copyright for 2021 2021-01-02 13:06:25 -05:00
storage Use errmsg_internal for debug messages 2021-02-17 11:33:25 +01:00
tcop Use errmsg_internal for debug messages 2021-02-17 11:33:25 +01:00
tsearch Fix parsing of complex morphs to tsquery 2021-01-31 20:14:29 +03:00
utils Allow specifying CRL directory 2021-02-18 07:59:10 +01:00
.gitignore
common.mk Remove PARTIAL_LINKING build mode. 2018-03-30 17:33:04 -07:00
Makefile Update copyright for 2021 2021-01-02 13:06:25 -05:00
nls.mk Add missing gettext triggers 2020-04-28 13:35:40 +02:00