Commit Graph

52066 Commits

Author SHA1 Message Date
Tom Lane 6b71c925cb Prevent ALTER TYPE/DOMAIN/OPERATOR from changing extension membership.
If recordDependencyOnCurrentExtension is invoked on a pre-existing,
free-standing object during an extension update script, that object
will become owned by the extension.  In our current code this is
possible in three cases:

* Replacing a "shell" type or operator.
* CREATE OR REPLACE overwriting an existing object.
* ALTER TYPE SET, ALTER DOMAIN SET, and ALTER OPERATOR SET.

The first of these cases is intentional behavior, as noted by the
existing comments for GenerateTypeDependencies.  It seems like
appropriate behavior for CREATE OR REPLACE too; at least, the obvious
alternatives are not better.  However, the fact that it happens during
ALTER is an artifact of trying to share code (GenerateTypeDependencies
and makeOperatorDependencies) between the CREATE and ALTER cases.
Since an extension script would be unlikely to ALTER an object that
didn't already belong to the extension, this behavior is not very
troubling for the direct target object ... but ALTER TYPE SET will
recurse to dependent domains, and it is very uncool for those to
become owned by the extension if they were not already.

Let's fix this by redefining the ALTER cases to never change extension
membership, full stop.  We could minimize the behavioral change by
only changing the behavior when ALTER TYPE SET is recursing to a
domain, but that would complicate the code and it does not seem like
a better definition.

Per bug #17144 from Alex Kozhemyakin.  Back-patch to v13 where ALTER
TYPE SET was added.  (The other cases are older, but since they only
affect the directly-named object, there's not enough of a problem to
justify changing the behavior further back.)

Discussion: https://postgr.es/m/17144-e67d7a8f049de9af@postgresql.org
2021-08-17 14:29:22 -04:00
Tom Lane b66336c4e1 Reduce assumptions about locale's behavior in new regex tests.
I was overoptimistic to assume that UTF8-based locales would all
consider U+1500 to be a member of the [[:alpha:]] char class.
Tweak the test cases added by commit 78a843f11 to avoid that
assumption.  We might need to lobotomize them further, but this
should be enough to fix the early buildfarm reports.
2021-08-17 13:00:36 -04:00
Tom Lane 78a843f119 Improve regex compiler's arc moving/copying logic.
The functions moveins(), copyins(), moveouts(), copyouts() are
required to preserve the invariant that there are no duplicate arcs in
the regex's NFA.  Spencer's original implementation of them was O(N^2)
since it checked separately for a match to each source arc.  In commit
579840ca0 I improved that by adding sort/merge logic to be used if
more than a few arcs are to be moved/copied.  However, I now realize
that that missed a bet.  At many call sites, the target state is newly
made and cannot have any existing in-arcs (respectively out-arcs)
that could be duplicates.  So spending any cycles at all on checking
for duplicates is wasted effort; in these cases we can just blindly
move/copy all the source arcs.  Add code paths to do that.

It turns out that for copyins()/copyouts(), *all* the call sites have
this property, making all the "improved" logic in them flat out
unreachable.  Perhaps we'll need the full capability again someday,
so I just #ifdef'd those paths out rather than removing them entirely.

In passing, add a few test cases to improve code coverage in this
area as well as in regc_locale.c/regc_pg_locale.c.

Discussion: https://postgr.es/m/810272.1629064063@sss.pgh.pa.us
2021-08-17 12:00:02 -04:00
Michael Meskes f576de1db1 Improved ECPG warning as suggested by Michael Paquier and removed test case
that triggers the warning during regression tests.
2021-08-17 15:01:09 +02:00
Daniel Gustafsson 31f860a52b Set type identifier on BIO
In OpenSSL there are two types of BIO's (I/O abstractions):
source/sink and filters. A source/sink BIO is a source and/or
sink of data, ie one acting on a socket or a file. A filter
BIO takes a stream of input from another BIO and transforms it.
In order for BIO_find_type() to be able to traverse the chain
of BIO's and correctly find all BIO's of a certain type they
shall have the type bit set accordingly, source/sink BIO's
(what PostgreSQL implements) use BIO_TYPE_SOURCE_SINK and
filter BIO's use BIO_TYPE_FILTER. In addition to these, file
descriptor based BIO's should have the descriptor bit set,
BIO_TYPE_DESCRIPTOR.

The PostgreSQL implementation didn't set the type bits, which
went unnoticed for a long time as it's only really relevant
for code auditing the OpenSSL installation, or doing similar
tasks. It is required by the API though, so this fixes it.

Backpatch through 9.6 as this has been wrong for a long time.

Author: Itamar Gafni
Discussion: https://postgr.es/m/SN6PR06MB39665EC10C34BB20956AE4578AF39@SN6PR06MB3966.namprd06.prod.outlook.com
Backpatch-through: 9.6
2021-08-17 14:30:01 +02:00
Heikki Linnakangas e9a79c220b doc: \123 and \x12 escapes in COPY are in database encoding.
The backslash sequences, including \123 and \x12 escapes, are interpreted
after encoding conversion. The docs failed to mention that.

Backpatch to all supported versions.

Reported-by: Andreas Grob
Discussion: https://www.postgresql.org/message-id/17142-9181542ca1df75ab%40postgresql.org
2021-08-17 10:00:06 +03:00
Alvaro Herrera 6f8127b739
Revert analyze support for partitioned tables
This reverts the following commits:
1b5617eb84 Describe (auto-)analyze behavior for partitioned tables
0e69f705cc Set pg_class.reltuples for partitioned tables
41badeaba8 Document ANALYZE storage parameters for partitioned tables
0827e8af70 autovacuum: handle analyze for partitioned tables

There are efficiency issues in this code when handling databases with
large numbers of partitions, and it doesn't look like there isn't any
trivial way to handle those.  There are some other issues as well.  It's
now too late in the cycle for nontrivial fixes, so we'll have to let
Postgres 14 users continue to manually deal with ANALYZE their
partitioned tables, and hopefully we can fix the issues for Postgres 15.

I kept [most of] be280cdad2 ("Don't reset relhasindex for partitioned
tables on ANALYZE") because while we added it due to 0827e8af70, it is
a good bugfix in its own right, since it affects manual analyze as well
as autovacuum-induced analyze, and there's no reason to revert it.

I retained the addition of relkind 'p' to tables included by
pg_stat_user_tables, because reverting that would require a catversion
bump.
Also, in pg14 only, I keep a struct member that was added to
PgStat_TabStatEntry to avoid breaking compatibility with existing stat
files.

Backpatch to 14.

Discussion: https://postgr.es/m/20210722205458.f2bug3z6qzxzpx2s@alap3.anarazel.de
2021-08-16 17:27:52 -04:00
Tom Lane 3aafc030a5 Reduce memory consumption for pending invalidation messages.
The existing data structures in inval.c are fairly inefficient for
the common case of a command or subtransaction that registers a small
number of cache invalidation events.  While this doesn't matter if we
commit right away, it can build up to a lot of bloat in a transaction
that contains many DDL operations.  By making a few more assumptions
about the expected use-case, we can switch to a representation using
densely-packed arrays.  Although this eliminates some data-copying,
it doesn't seem to make much difference time-wise.  But the space
consumption decreases substantially.

Patch by me; thanks to Nathan Bossart for review.

Discussion: https://postgr.es/m/2380555.1622395376@sss.pgh.pa.us
2021-08-16 16:48:25 -04:00
Daniel Gustafsson 069d33d0c5 Emit namespace in the post-copy errmsg
During a VACUUM or CLUSTER command, the initial output emits a
fully qualified relation path with namespace.  The post-action
errmsg only emitted the relation name however, which may lead
to hard to parse output when using multiple jobs with vacuumdb
as the output from different jobs may be interleaved.  Include
the full path in the post-action errmsg to be consistent with
the initial errmsg.

Author: Mike Fiedler <miketheman@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/CAMerE0oz+8G-aORZL_BJcPxnBqewZAvND4bSUysjz+r-oT1BxQ@mail.gmail.com
2021-08-16 20:06:54 +02:00
John Naylor 4864c8e8f1 Use direct function calls for pg_popcount{32,64} on non-x86 platforms
Previously, all pg_popcount{32,64} calls were indirected through
a function pointer, even though we had no fast implementation for
non-x86 platforms. Instead, for those platforms use wrappers around
the pg_popcount{32,64}_slow functions.

Review and additional hacking by David Rowley
Reviewed by Álvaro Herrera

Discussion: https://www.postgresql.org/message-id/flat/CAFBsxsE7otwnfA36Ly44zZO%2Bb7AEWHRFANxR1h1kxveEV%3DghLQ%40mail.gmail.com
2021-08-16 11:51:15 -04:00
Daniel Gustafsson ea499f3d28 Clarify initdb --sync-only help message and docs
The initdb help message for --sync-only was a bit terse, and not
really self-explanatory. Make it clearer that initdb --sync-only
will exit after syncing, and expand the docs with a note on when
the option can be useful. Also align the help output with others
that exit immediately.

Author: Nathan Bossart, Gurjeet Singh
Discussion: https://postgr.es/m/CABwTF4U6hbNNE1bv=LxQdJybmUdZ5NJQ9rKY9tN82NXM8QH+iQ@mail.gmail.com
2021-08-16 13:38:01 +02:00
Michael Paquier e4ba1005c0 Refresh apply delay on reload of recovery_min_apply_delay at recovery
This commit ensures that the wait interval in the replay delay loop
waiting for an amount of time defined by recovery_min_apply_delay is
correctly handled on reload, recalculating the delay if this GUC value
is updated, based on the timestamp of the commit record being replayed.

The previous behavior would be problematic for example with replay
still waiting even if the delay got reduced or just cancelled.  If the
apply delay was increased to a larger value, the wait would have just
respected the old value set, finishing earlier.

Author: Soumyadeep Chakraborty, Ashwin Agrawal
Reviewed-by: Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/CAE-ML+93zfr-HLN8OuxF0BjpWJ17O5dv1eMvSE5jsj9jpnAXZA@mail.gmail.com
Backpatch-through: 9.6
2021-08-16 12:10:22 +09:00
Tom Lane 0a208ed63f Un-break s_lock_test.
Commit 80abbeba2 evidently didn't bother checking this code.
Also, list the generated executable in .gitignore (so it's
been a REALLY long time since anyone tried this).

Noted while trying out RISC-V spinlock patch.  Given that
this has been broken for 5 years and nobody noticed, it's
likely not worth back-patching.
2021-08-13 14:42:27 -04:00
Tom Lane c32fcac56a Add RISC-V spinlock support in s_lock.h.
Like the ARM case, just use gcc's __sync_lock_test_and_set();
that will compile into AMOSWAP.W.AQ which does what we need.

At some point it might be worth doing some work on atomic ops
for RISC-V, but this should be enough for a creditable port.

Back-patch to all supported branches, just in case somebody
wants to try them on RISC-V.

Marek Szuba

Discussion: https://postgr.es/m/dea97b6d-f55f-1f6d-9109-504aa7dfa421@gentoo.org
2021-08-13 13:59:43 -04:00
Peter Eisentraut 4279e5bc8c pg_amcheck: Message style and structuring improvements 2021-08-13 17:30:39 +02:00
Andres Freund 80a8f95b3b Remove support for background workers without BGWORKER_SHMEM_ACCESS.
Background workers without shared memory access have been broken on
EXEC_BACKEND / windows builds since shortly after background workers have been
introduced, without that being reported. Clearly they are not commonly used.

The problem is that bgworker startup requires to be attached to shared memory
in EXEC_BACKEND child processes. StartBackgroundWorker() detaches from shared
memory for unconnected workers, but at that point we already have initialized
subsystems referencing shared memory.

Fixing this problem is not entirely trivial, so removing the option to not be
connected to shared memory seems the best way forward. In most use cases the
advantages of being connected to shared memory far outweigh the disadvantages.

As there have been no reports about this issue so far, we have decided that it
is not worth trying to address the problem in the back branches.

Per discussion with Alvaro Herrera, Robert Haas and Tom Lane.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210802065116.j763tz3vz4egqy3w@alap3.anarazel.de
2021-08-13 05:49:26 -07:00
Andres Freund 1d5135f004 Fix typo.
Reported-By: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/YRIlNQhLNfx555Nx@paquier.xyz
2021-08-13 05:44:03 -07:00
Michael Meskes 399edafa2a Fix connection handling for DEALLOCATE and DESCRIBE statements
After binding a statement to a connection with DECLARE STATEMENT the connection
was still not used for DEALLOCATE and DESCRIBE statements. This patch fixes
that, adds a missing warning and cleans up the code.

Author: Hayato Kuroda
Reviewed-by: Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/TYAPR01MB5866BA57688DF2770E2F95C6F5069%40TYAPR01MB5866.jpnprd01.prod.outlook.com
2021-08-13 10:45:08 +02:00
Daniel Gustafsson 512f4ca6c6 Fix sslsni connparam boolean check
The check for sslsni only checked for existence of the parameter
but not for the actual value of the param.  This meant that the
SNI extension was always turned on.  Fix by inspecting the value
of sslsni and only activate the SNI extension iff sslsni has been
enabled.  Also update the docs to be more in line with how other
boolean params are documented.

Backpatch to 14 where sslsni was first implemented.

Reviewed-by: Tom Lane
Backpatch-through: 14, where sslni was added
2021-08-13 10:32:17 +02:00
David Rowley 37450f2ca9 Fix incorrect hash table resizing code in simplehash.h
This fixes a bug in simplehash.h which caused an incorrect size mask to be
used when the hash table grew to SH_MAX_SIZE (2^32).  The code was
incorrectly setting the size mask to 0 when the hash tables reached the
maximum possible number of buckets.  This would result always trying to
use the 0th bucket causing an  infinite loop of trying to grow the hash
table due to there being too many collisions.

Seemingly it's not that common for simplehash tables to ever grow this big
as this bug dates back to v10 and nobody seems to have noticed it before.
However, probably the most likely place that people would notice it would
be doing a large in-memory Hash Aggregate with something close to at least
2^31 groups.

After this fix, the code now works correctly with up to within 98% of 2^32
groups and will fail with the following error when trying to insert any
more items into the hash table:

ERROR:  hash table size exceeded

However, the work_mem (or hash_mem_multiplier in newer versions) settings
will generally cause Hash Aggregates to spill to disk long before reaching
that many groups.  The minimal test case I did took a work_mem setting of
over 192GB to hit the bug.

simplehash hash tables are used in a few other places such as Bitmap Index
Scans, however, again the size that the hash table can become there is
also limited to work_mem and it would take a relation of around 16TB
(2^31) pages and a very large work_mem setting to hit this.  With smaller
work_mem values the table would become lossy and never grow large enough
to hit the problem.

Author: Yura Sokolov
Reviewed-by: David Rowley, Ranier Vilela
Discussion: https://postgr.es/m/b1f7f32737c3438136f64b26f4852b96@postgrespro.ru
Backpatch-through: 10, where simplehash.h was added
2021-08-13 16:41:26 +12:00
Thomas Munro 88cbbbfa3e Make EXEC_BACKEND more convenient on macOS.
It's hard to disable ASLR on current macOS releases, for testing with
-DEXEC_BACKEND.  You could already set the environment variable
PG_SHMEM_ADDR to something not likely to collide with mappings created
earlier in process startup.  Let's also provide a default value that
works on current releases and architectures, for developer convenience.

As noted in the pre-existing comment, this is a horrible hack, but
-DEXEC_BACKEND is only used by Unix-based PostgreSQL developers for
testing some otherwise Windows-only code paths, so it seems excusable.

Back-patch to all supported branches.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20210806032944.m4tz7j2w47mant26%40alap3.anarazel.de
2021-08-13 11:09:00 +12:00
Tomas Vondra 650663b4cb Use appropriate tuple descriptor in FDW batching
The FDW batching code was using the same tuple descriptor both for all
slots (regular and plan slots), but that's incorrect - the subplan may
use a different descriptor. Currently this is benign, because batching
is used only for INSERTs, and in that case the descriptors always match.
But that would change if we allow batching UPDATEs.

Fix by copying the appropriate tuple descriptor. Backpatch to 14, where
the FDW batching was implemented.

Author: Amit Langote
Backpatch-through: 14, where FDW batching was added
Discussion: https://postgr.es/m/CA%2BHiwqEWd5B0-e-RvixGGUrNvGkjH2s4m95%3DJcwUnyV%3Df0rAKQ%40mail.gmail.com
2021-08-12 22:10:06 +02:00
John Naylor ba958299ea Speed up generation of Unicode hash functions.
Sets of Unicode keys are picky about the primes used when generating
a perfect hash function for them. Callers can spend many seconds
iterating through all the possible combinations of candidate
multipliers and seeds to find one that works.

Unicode updates typically happen only once a year, but it still makes
development and testing of Unicode scripts unnecessarily slow. To fix,
iterate over the primes in the innermost loop. This does not change
any existing functions checked into the tree.
2021-08-12 09:08:56 -04:00
John Naylor b05f7ecec4 Fix grammar mistake in hash index README
Dilip Kumar

Discussion: https://www.postgresql.org/message-id/CAFiTN-tjZbuY6vy7kZZ6xO%2BD4mVcO5wOPB5KiwJ3AHhpytd8fg%40mail.gmail.com
2021-08-12 08:53:41 -04:00
Michael Paquier 710796f054 Avoid unnecessary shared invalidations in ROLLBACK PREPARED
The performance gain is minimal, but this makes the logic more
consistent with AtEOXact_Inval().  No other invalidation is needed in
this case as PREPARE takes already care of sending any local ones.

Author: Liu Huailing
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/OSZPR01MB6215AA84D71EF2B3D354CF86BE139@OSZPR01MB6215.jpnprd01.prod.outlook.com
2021-08-12 20:12:47 +09:00
Heikki Linnakangas c3928b467a Fix segfault during EvalPlanQual with mix of local and foreign partitions.
It's not sensible to re-evaluate a direct-modify Foreign Update or Delete
during EvalPlanQual. However, ExecInitForeignScan() can still get called
if a table mixes local and foreign partitions. EvalPlanQualStart() left
the es_result_relations array uninitialized in the child EPQ EState, but
ExecInitForeignScan() still expected to find it. That caused a segfault.

Fix by skipping the es_result_relations lookup during EvalPlanQual
processing. To make things a bit more robust, also skip the
BeginDirectModify calls, and add a runtime check that ExecForeignScan()
is not called on direct-modify foreign scans during EvalPlanQual
processing.

This is new in v14, commit 1375422c78. Before that, EvalPlanQualStart()
copied the whole ResultRelInfo array to the EPQ EState. Backpatch to v14.

Report and diagnosis by Andrey Lepikhov.

Discussion: https://www.postgresql.org/message-id/cb2b808d-cbaa-4772-76ee-c8809bafcf3d%40postgrespro.ru
2021-08-12 11:02:29 +03:00
Tom Lane a6bd28beb0 Fix failure of btree_gin indexscans with "char" type and </<= operators.
As a result of confusion about whether the "char" type is signed or
unsigned, scans for index searches like "col < 'x'" or "col <= 'x'"
would start at the middle of the index not the left end, thus missing
many or all of the entries they should find.  Fortunately, this
is not a symptom of index corruption.  It's only the search logic
that is broken, and we can fix it without unpleasant side-effects.

Per report from Jason Kim.  This has been wrong since btree_gin's
beginning, so back-patch to all supported branches.

Discussion: https://postgr.es/m/20210810001649.htnltbh7c63re42p@jasonk.me
2021-08-10 18:10:29 -04:00
Daniel Gustafsson 72bbff4cd6 Add alternative output for OpenSSL 3 without legacy loaded
OpenSSL 3 introduced the concept of providers to support modularization,
and moved the outdated ciphers to the new legacy provider. In case it's
not loaded in the users openssl.cnf file there will be a lot of regress
test failures, so add alternative outputs covering those.

Also document the need to load the legacy provider in order to use older
ciphers with OpenSSL-enabled pgcrypto.

This will be backpatched to all supported version once there is sufficient
testing in the buildfarm of OpenSSL 3.

Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/FEF81714-D479-4512-839B-C769D2605F8A@yesql.se
2021-08-10 15:08:46 +02:00
Daniel Gustafsson 318df80235 Disable OpenSSL EVP digest padding in pgcrypto
The PX layer in pgcrypto is handling digest padding on its own uniformly
for all backend implementations. Starting with OpenSSL 3.0.0, DecryptUpdate
doesn't flush the last block in case padding is enabled so explicitly
disable it as we don't use it.

This will be backpatched to all supported version once there is sufficient
testing in the buildfarm of OpenSSL 3.

Reviewed-by: Peter Eisentraut, Michael Paquier
Discussion: https://postgr.es/m/FEF81714-D479-4512-839B-C769D2605F8A@yesql.se
2021-08-10 15:01:52 +02:00
Daniel Gustafsson 152c2e0ae1 Remove unused regression test certificate server-ss
The server-ss certificate was included in e39250c64 but was never
used in the TLS regression tests so remove.

Author: Jacob Champion
Discussion: https://postgr.es/m/d15a9838344ba090e09fd866abf913584ea19fb7.camel@vmware.com
2021-08-10 11:15:02 +02:00
Michael Paquier e2ce88b58f Add tab completion for DECLARE .. ASENSITIVE in psql
This option has been introduced in dd13ad9.

Author: Shinya Kato
Discussion: https://postgr.es/m/TYAPR01MB289665526B76DA29DC70A031C4F09@TYAPR01MB2896.jpnprd01.prod.outlook.com
2021-08-10 15:54:42 +09:00
Michael Paquier 1e3445237b Fix regression test output of sepgsql
The difference is caused by 7b56584, for the tests involving a table
rewrite.

Per buildfarm member rhinoceros.

Discussion: https://postgr.es/m/YRHxXcyFjPuPTZui@paquier.xyz
2021-08-10 13:14:37 +09:00
Michael Paquier 7b565843a9 Add call to object access hook at the end of table rewrite in ALTER TABLE
ALTER TABLE .. SET {LOGGED,UNLOGGED,ACCESS METHOD} would never do a
table-level object access hook, which was inconsistent with SET
TABLESPACE.  Note that contrary to SET TABLESPACE, the no-op case is
left off for those commands as this requires tracking if commands have
been called, but they may not execute a physical rewrite.  Another thing
worth noting is that the physical file swap at the end of a rewrite
does a couple of access calls for internal objects created for the swap
operation (internal objects are for example skipped by the tests of
sepgsql), but this does not trigger the hook for the table on which the
operation is done.

f41872d, that added support for SET LOGGED/UNLOGGED in ALTER TABLE,
visibly forgot to consider that.

Based on what I checked, two regression tests of sepgsql in ddl.sql are
going to log more information with this test, something that buildfarm
member rhinoceros will tell soon enough.  I am not completely sure of
their format though, so these are not refreshed yet.

This is arguably a bug, but no backpatch is done as this could cause a
behavior change for anybody using object access hooks.

Reported-by: Jeff Davis
Discussion: https://postgr.es/m/YQJKV29/1a60uG68@paquier.xyz
2021-08-10 12:21:05 +09:00
Tom Lane 18bac60ede Let regexp_replace() make use of REG_NOSUB when feasible.
If the replacement string doesn't contain \1...\9, then we don't
need sub-match locations, so we can use the REG_NOSUB optimization
here too.  There's already a pre-scan of the replacement string
to look for backslashes, so extend that to check for digits, and
refactor to allow that to happen before we compile the regexp.

While at it, try to speed up the pre-scan by using memchr() instead
of a handwritten loop.  It's likely that this is lost in the noise
compared to the regexp processing proper, but maybe not.  In any
case, this coding is shorter.

Also, add some test cases to improve the poor coverage of
appendStringInfoRegexpSubstr().

Discussion: https://postgr.es/m/3534632.1628536485@sss.pgh.pa.us
2021-08-09 20:53:25 -04:00
Andres Freund e12694523e Fix bogus assertion in BootstrapModeMain().
The assertion was always true, as written, thanks to me "simplifying" it
before commit.

Per coverity and Tom Lane.
2021-08-09 08:28:53 -07:00
Tom Lane 0e6aa8747d Avoid determining regexp subexpression matches, when possible.
Identifying the precise match locations for parenthesized subexpressions
is a fairly expensive task given the way our regexp engine works, both
at regexp compile time (where we must create an optimized NFA for each
parenthesized subexpression) and at runtime (where determining exact
match locations requires laborious search).

Up to now we've made little attempt to optimize this situation.  This
patch identifies cases where we know at compile time that we won't
need to know subexpression match locations, and teaches the regexp
compiler to not bother creating per-subexpression regexps for
parenthesis pairs that are not referenced by backrefs elsewhere in
the regexp.  (To preserve semantics, we obviously still have to
pin down the match locations of backref references.)  Users could
have obtained the same results before this by being careful to
write "non capturing" parentheses wherever possible, but few people
bother with that.

Discussion: https://postgr.es/m/2219936.1628115334@sss.pgh.pa.us
2021-08-09 11:26:34 -04:00
David Rowley 76ad24400d Remove some special cases from MSVC build scripts
Here we add additional parsing of Makefiles to determine when to add
references to libpgport and libpgcommon.  We also remove the need for
adding the current contrib_extrasource by adding sine very basic logic to
implement the Makefile rules which add .l and .y files when they exist for
a given .o file in the Makefile.

This is just some very basic additional parsing of Makefiles to try to
keep things more consistent between builds using make and MSVC builds.
This happens to work with how our current Makefiles are laid out, but it
could easily be broken in the future if someone chooses do something in
the Makefile that we don't have parsing support for.  We will cross that
bridge when we come to it.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvoPULi5JW3933NxgwxOmu9Ncvpcyt87UhEHAUX16QqmpA@mail.gmail.com
2021-08-09 19:45:26 +12:00
David Rowley deb6ffd4fd Doc: Fix misleading statement about VACUUM memory limits
In ec34040af I added a mention that there was no point in setting
maintenance_work_limit to anything higher than 1GB for vacuum, but that
was incorrect as ginInsertCleanup() also looks at what
maintenance_work_mem is set to during VACUUM and that's not limited to
1GB.

Here I attempt to make it more clear that the limitation is only around
the number of dead tuple identifiers that we can collect during VACUUM.

I've also added a note to autovacuum_work_mem to mention this limitation.
I didn't do that in ec34040af as I'd had some wrong-headed ideas about
just limiting the maximum value for that GUC to 1GB.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvpGwOAvunp-E-bN_rbAs3hmxMoasm5pzkYDbf36h73s7w@mail.gmail.com
Backpatch-through: 9.6, same as ec34040af
2021-08-09 16:45:35 +12:00
David Rowley 4a3d806f38 Use ExplainPropertyInteger for queryid in EXPLAIN
This saves a few lines of code.  Also add a comment to mention why we use
ExplainPropertyInteger instead of ExplainPropertyUInteger given that
queryid is a uint64 type.

Author: David Rowley
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/CAApHDvqhSLYpSU_EqUdN39w9Uvb8ogmHV7_3YhJ0S3aScGBjsg@mail.gmail.com
Backpatch-through: 14, where this code was originally added
2021-08-09 15:47:23 +12:00
Amit Kapila c9229d3d2b Fix typo in 022_twophase_cascade.pl.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pta=zo8G1DWVVg-LU6b_JvHHCueC=AKVpKJOrwLzj9EZA@mail.gmail.com
2021-08-09 08:58:38 +05:30
David Rowley 2e281249af Add POPCNT support for MSVC x86_64 builds
02a6a54ec added code to make use of the POPCNT instruction when available
for many of our common platforms.  Here we do the same for MSVC for x86_64
machines.

MSVC's intrinsic functions for popcnt seem to differ from GCCs in that
they always appear to emit the popcnt instructions.  In GCC the behavior
will depend on if the source file was compiled with -mpopcnt or not.  For
this reason, the MSVC intrinsic function has been lumped into the
pg_popcount*_asm function, however doing that sort of invalidates the name
of that function, so let's rename it to pg_popcount*_fast().

Author: David Rowley
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAApHDvqL3cbbK%3DGzNcwzsNR9Gi%2BaUvTudKkC4XgnQfXirJ_oRQ%40mail.gmail.com
2021-08-09 15:23:48 +12:00
Bruce Momjian d8a75b1308 doc: mention pg_upgrade extension script
Since commit e462856a7a, pg_upgrade automatically creates a script to
update extensions, so mention that instead of ALTER EXTENSION.

Backpatch-through: 9.6
2021-08-08 21:05:46 -04:00
Peter Eisentraut ae03a7c739 Remove some unnecessary casts in format arguments
We can use %zd or %zu directly, no need to cast to int.  Conversely,
some code was casting away from int when it could be using %d
directly.
2021-08-08 22:08:07 +02:00
Tom Lane cf5ce5aa70 Doc: remove bogus <indexterm> items.
Copy-and-pasteo in 665c5855e, evidently.  The 9.6 docs toolchain
whined about duplicate index entries, though our modern toolchain
doesn't.  In any case, these GUCs surely are not about the
default settings of these values.
2021-08-08 15:35:30 -04:00
Peter Eisentraut c1132aae33 Check the size in COPY_POINTER_FIELD
instead of making each caller do it.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-08-08 18:46:34 +02:00
Peter Eisentraut 18fea737b5 Change NestPath node to contain JoinPath node
This makes the structure of all JoinPath-derived nodes the same,
independent of whether they have additional fields.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-08-08 18:46:34 +02:00
Peter Eisentraut 2226b4189b Change SeqScan node to contain Scan node
This makes the structure of all Scan-derived nodes the same,
independent of whether they have additional fields.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-08-08 18:46:34 +02:00
Tom Lane 00116dee5a Rethink regexp engine's backref-related compilation state.
I had committer's remorse almost immediately after pushing cb76fbd7e,
upon finding that removing capturing subexpressions' subREs from the
data structure broke my proposed patch for REG_NOSUB optimization.
Revert that data structure change.  Instead, address the concern
about not changing capturing subREs' endpoints by not changing the
endpoints.  We don't need to, because the point of that bit was just
to ensure that the atom has endpoints distinct from the outer state
pair that we're stringing the branch between.  We already made
suitable states in the parenthesized-subexpression case, so the
additional ones were just useless overhead.  This seems more
understandable than Spencer's original coding, and it ought to be
a shade faster too by saving a few state creations and arc changes.
(I actually see a couple percent improvement on Jacobson's web
corpus, though that's barely above the noise floor so I wouldn't
put much stock in that result.)

Also, fix the logic added by ea1268f63 to ensure that the subRE
recorded in v->subs[subno] is exactly the one with capno == subno.
Spencer's original coding recorded the child subRE of the capture
node, which is okay so far as having the right endpoint states is
concerned, but as of cb76fbd7e the capturing subRE itself always
has those endpoints too.  I think the inconsistency is confusing
for the REG_NOSUB optimization.

As before, backpatch to v14.

Discussion: https://postgr.es/m/0203588E-E609-43AF-9F4F-902854231EE7@enterprisedb.com
2021-08-08 11:56:29 -04:00
David Rowley 75a2d132ea Remove unused function declaration
It appears that check_track_commit_timestamp was declared but has never
been defined in our code base.  Likely this is just leftover cruft from
a development version of the original patch to add commit timestamps.

Let's just remove the useless declaration.  The inclusion of guc.h also
seems surplus to requirements.

Author: Andrey Lepikhov
Discussion: https://postgr.es/m/f49aefb5-edbb-633a-af07-3e777023a94d@postgrespro.ru
2021-08-08 23:27:57 +12:00
Tom Lane cb76fbd7ec Make regexp engine's backref-related compilation state more bulletproof.
Up to now, we remembered the definition of a capturing parenthesis
subexpression by storing a pointer to the associated subRE node.
That was okay before, because that subRE didn't get modified anymore
while parsing the rest of the regexp.  However, in the wake of
commit ea1268f63, that's no longer true: the outer invocation of
parseqatom() feels free to scribble on that subRE.  This seems to
work anyway, because the states we jam into the child atom in the
"prepare a general-purpose state skeleton" stanza aren't really
semantically different from the original endpoints of the child atom.
But that would be mighty easy to break, and it's definitely not how
things worked before.

Between this and the issue fixed in the prior commit, it seems best
to get rid of this dependence on subRE nodes entirely.  We don't need
the whole child subRE for future backrefs, only its starting and ending
NFA states; so let's just store pointers to those.

Also, in the corner case where we make an extra subRE to handle
immediately-nested capturing parentheses, it seems like it'd be smart
to have the extra subRE have the same begin/end states as the original
child subRE does (s/s2 not lp/rp).  I think that linking it from lp to
rp might actually be semantically wrong, though since Spencer's original
code did it that way, I'm not totally certain.  Using s/s2 is certainly
not wrong, in any case.

Per report from Mark Dilger.  Back-patch to v14 where the problematic
patches came in.

Discussion: https://postgr.es/m/0203588E-E609-43AF-9F4F-902854231EE7@enterprisedb.com
2021-08-07 22:27:13 -04:00