Commit Graph

3891 Commits

Author SHA1 Message Date
Tom Lane 73fc2e5bab Fix pull_varnos' miscomputation of relids set for a PlaceHolderVar.
Previously, pull_varnos() took the relids of a PlaceHolderVar as being
equal to the relids in its contents, but that fails to account for the
possibility that we have to postpone evaluation of the PHV due to outer
joins.  This could result in a malformed plan.  The known cases end up
triggering the "failed to assign all NestLoopParams to plan nodes"
sanity check in createplan.c, but other symptoms may be possible.

The right value to use is the join level we actually intend to evaluate
the PHV at.  We can get that from the ph_eval_at field of the associated
PlaceHolderInfo.  However, there are some places that call pull_varnos()
before the PlaceHolderInfos have been created; in that case, fall back
to the conservative assumption that the PHV will be evaluated at its
syntactic level.  (In principle this might result in missing some legal
optimization, but I'm not aware of any cases where it's an issue in
practice.)  Things are also a bit ticklish for calls occurring during
deconstruct_jointree(), but AFAICS the ph_eval_at fields should have
reached their final values by the time we need them.

The main problem in making this work is that pull_varnos() has no
way to get at the PlaceHolderInfos.  We can fix that easily, if a
bit tediously, in HEAD by passing it the planner "root" pointer.
In the back branches that'd cause an unacceptable API/ABI break for
extensions, so leave the existing entry points alone and add new ones
with the additional parameter.  (If an old entry point is called and
encounters a PHV, it'll fall back to using the syntactic level,
again possibly missing some valid optimization.)

Back-patch to v12.  The computation is surely also wrong before that,
but it appears that we cannot reach a bad plan thanks to join order
restrictions imposed on the subquery that the PlaceHolderVar came from.
The error only became reachable when commit 4be058fe9 allowed trivial
subqueries to be collapsed out completely, eliminating their join order
restrictions.

Per report from Stephan Springl.

Discussion: https://postgr.es/m/171041.1610849523@sss.pgh.pa.us
2021-01-21 15:37:23 -05:00
Tom Lane a57dbfcda5 Disable vacuum page skipping in selected test cases.
By default VACUUM will skip pages that it can't immediately get
exclusive access to, which means that even activities as harmless
and unpredictable as checkpoint buffer writes might prevent a page
from being processed.  Ordinarily this is no big deal, but we have
a small number of test cases that examine the results of VACUUM's
processing and therefore will fail if the page of interest is skipped.
This seems to be the explanation for some rare buildfarm failures.
To fix, add the DISABLE_PAGE_SKIPPING option to the VACUUM commands
in tests where this could be an issue.

In passing, remove a duplicated query in pageinspect/sql/page.sql.

Back-patch as necessary (some of these cases are as old as v10).

Discussion: https://postgr.es/m/413923.1611006484@sss.pgh.pa.us
2021-01-20 11:49:29 -05:00
Fujii Masao 546f143740 postgres_fdw: Fix connection leak.
In postgres_fdw, the cached connections to foreign servers will not be
closed until the local session exits if the user mappings or foreign servers
that those connections depend on are dropped. Those connections can be
leaked.

To fix that connection leak issue, after a change to a pg_foreign_server
or pg_user_mapping catalog entry, this commit makes postgres_fdw close
the connections depending on that entry immediately if current
transaction has not used those connections yet. Otherwise, mark those
connections as invalid and then close them at the end of current transaction,
since they cannot be closed in the midst of the transaction using them.
Closed connections will be remade at the next opportunity if necessary.

Back-patch to all supported branches.

Author: Bharath Rupireddy
Reviewed-by: Zhihong Yu, Zhijie Hou, Fujii Masao
Discussion: https://postgr.es/m/CALj2ACVNcGH_6qLY-4_tXz8JLvA+4yeBThRfxMz7Oxbk1aHcpQ@mail.gmail.com
2020-12-28 19:57:51 +09:00
Tom Lane 0217ad8066 Fix race condition between shutdown and unstarted background workers.
If a database shutdown (smart or fast) is commanded between the time
some process decides to request a new background worker and the time
that the postmaster can launch that worker, then nothing happens
because the postmaster won't launch any bgworkers once it's exited
PM_RUN state.  This is fine ... unless the requesting process is
waiting for that worker to finish (or even for it to start); in that
case the requestor is stuck, and only manual intervention will get us
to the point of being able to shut down.

To fix, cancel pending requests for workers when the postmaster sends
shutdown (SIGTERM) signals, and similarly cancel any new requests that
arrive after that point.  (We can optimize things slightly by only
doing the cancellation for workers that have waiters.)  To fit within
the existing bgworker APIs, the "cancel" is made to look like the
worker was started and immediately stopped, causing deregistration of
the bgworker entry.  Waiting processes would have to deal with
premature worker exit anyway, so this should introduce no bugs that
weren't there before.  We do have a side effect that registration
records for restartable bgworkers might disappear when theoretically
they should have remained in place; but since we're shutting down,
that shouldn't matter.

Back-patch to v10.  There might be value in putting this into 9.6
as well, but the management of bgworkers is a bit different there
(notably see 8ff518699) and I'm not convinced it's worth the effort
to validate the patch for that branch.

Discussion: https://postgr.es/m/661570.1608673226@sss.pgh.pa.us
2020-12-24 17:00:43 -05:00
Tom Lane 4b0292253c Improve autoprewarm's handling of early-shutdown scenarios.
Bad things happen if the DBA issues "pg_ctl stop -m fast" before
autoprewarm finishes loading its list of blocks to prewarm.
The current worker process successfully terminates early, but
(if this wasn't the last database with blocks to prewarm) the
leader process will just try to launch another worker for the
next database.  Since the postmaster is now in PM_WAIT_BACKENDS
state, it ignores the launch request, and the leader just sits
until it's killed manually.

This is mostly the fault of our half-baked design for launching
background workers, but a proper fix for that is likely to be
too invasive to be back-patchable.  To ameliorate the situation,
fix apw_load_buffers() to check whether SIGTERM has arrived
just before trying to launch another worker.  That leaves us with
only a very narrow window in each worker launch where SIGTERM
could occur between the launch request and successful worker start.

Another issue is that if the leader process does manage to exit,
it unconditionally rewrites autoprewarm.blocks with only the
blocks currently in shared buffers, thus forgetting any blocks
that we hadn't reached yet while prewarming.  This seems quite
unhelpful, since the next database start will then not have the
expected prewarming benefit.  Fix it to not modify the file if
we shut down before the initial load attempt is complete.

Per bug #16785 from John Thompson.  Back-patch to v11 where
the autoprewarm code was introduced.

Discussion: https://postgr.es/m/16785-c0207d8c67fb5f25@postgresql.org
2020-12-22 13:23:49 -05:00
Michael Paquier dfd8bf2b92 pgcrypto: Detect errors with EVP calls from OpenSSL
The following routines are called within pgcrypto when handling digests
but there were no checks for failures:
- EVP_MD_CTX_size (can fail with -1 as of 3.0.0)
- EVP_MD_CTX_block_size (can fail with -1 as of 3.0.0)
- EVP_DigestInit_ex
- EVP_DigestUpdate
- EVP_DigestFinal_ex

A set of elog(ERROR) is added by this commit to detect such failures,
that should never happen except in the event of a processing failure
internal to OpenSSL.

Note that it would be possible to use ERR_reason_error_string() to get
more context about such errors, but these refer mainly to the internals
of OpenSSL, so it is not really obvious how useful that would be.  This
is left out for simplicity.

Per report from Coverity.  Thanks to Tom Lane for the discussion.

Backpatch-through: 9.5
2020-12-08 15:22:38 +09:00
Heikki Linnakangas 74d6fb0a03 Remove leftover comments, left behind by removal of WITH OIDS.
Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGaRoF3XrhPW-Y7P%2BG7bKo84Z_h%3DkQHvMh-80%3Dav3wmOw%40mail.gmail.com
2020-11-30 10:29:18 +02:00
Andrew Gierth 48ab1fa304 pg_trgm: fix crash in 2-item picksplit
Whether from size overflow in gistSplit or from secondary splits,
picksplit is (rarely) called with exactly two items to split.

Formerly, due to special-case handling of the last item, this would
lead to access to an uninitialized cache entry; prior to PG 13 this
might have been harmless or at worst led to an incorrect union datum,
but in 13 onwards it can cause a backend crash from using an
uninitialized pointer.

Repair by removing the special case, which was deemed not to have been
appropriate anyway. Backpatch all the way, because this bug has
existed since pg_trgm was added.

Per report on IRC from user "ftzdomino". Analysis and testing by me,
patch from Alexander Korotkov.

Discussion: https://postgr.es/m/87k0usfdxg.fsf@news-spur.riddles.org.uk
2020-11-12 14:59:06 +00:00
Alexander Korotkov 6058f22324 Fix typo in contrib/pg_trgm/pg_trgm--1.4--1.5.sql
Backpatch-through: 13
2020-11-12 08:55:34 +03:00
Alexander Korotkov 065683bbdb Fix name of the macro for getting signature length trgm_gist.c
911e702077 has introduced the opclass parameters including signature length
for a set of GiST opclasses.  Due to copy-pasting, macro for getting the
signature length in trgm_gist.c was named LTREE_GET_ASIGLEN().  Fix that by
renaming this macro to just GET_SIGLEN().

Backpatch-through: 13
2020-11-12 06:38:17 +03:00
Tom Lane afce7908d7 Fix and simplify some usages of TimestampDifference().
Introduce TimestampDifferenceMilliseconds() to simplify callers
that would rather have the difference in milliseconds, instead of
the select()-oriented seconds-and-microseconds format.  This gets
rid of at least one integer division per call, and it eliminates
some apparently-easy-to-mess-up arithmetic.

Two of these call sites were in fact wrong:

* pg_prewarm's autoprewarm_main() forgot to multiply the seconds
by 1000, thus ending up with a delay 1000X shorter than intended.
That doesn't quite make it a busy-wait, but close.

* postgres_fdw's pgfdw_get_cleanup_result() thought it needed to compute
microseconds not milliseconds, thus ending up with a delay 1000X longer
than intended.  Somebody along the way had noticed this problem but
misdiagnosed the cause, and imposed an ad-hoc 60-second limit rather
than fixing the units.  This was relatively harmless in context, because
we don't care that much about exactly how long this delay is; still,
it's wrong.

There are a few more callers of TimestampDifference() that don't
have a direct need for seconds-and-microseconds, but can't use
TimestampDifferenceMilliseconds() either because they do need
microsecond precision or because they might possibly deal with
intervals long enough to overflow 32-bit milliseconds.  It might be
worth inventing another API to improve that, but that seems outside
the scope of this patch; so those callers are untouched here.

Given the fact that we are fixing some bugs, and the likelihood
that future patches might want to back-patch code that uses this
new API, back-patch to all supported branches.

Alexey Kondratov and Tom Lane

Discussion: https://postgr.es/m/3b1c053a21c07c1ed5e00be3b2b855ef@postgrespro.ru
2020-11-10 22:51:55 -05:00
Noah Misch c90c84b3f7 In security-restricted operations, block enqueue of at-commit user code.
Specifically, this blocks DECLARE ... WITH HOLD and firing of deferred
triggers within index expressions and materialized view queries.  An
attacker having permission to create non-temp objects in at least one
schema could execute arbitrary SQL functions under the identity of the
bootstrap superuser.  One can work around the vulnerability by disabling
autovacuum and not manually running ANALYZE, CLUSTER, REINDEX, CREATE
INDEX, VACUUM FULL, or REFRESH MATERIALIZED VIEW.  (Don't restore from
pg_dump, since it runs some of those commands.)  Plain VACUUM (without
FULL) is safe, and all commands are fine when a trusted user owns the
target object.  Performance may degrade quickly under this workaround,
however.  Back-patch to 9.5 (all supported versions).

Reviewed by Robert Haas.  Reported by Etienne Stalmans.

Security: CVE-2020-25695
2020-11-09 07:32:12 -08:00
Michael Paquier 1bd9b2b236 Fix potential memory leak in pgcrypto
When allocating a EVP context, it would have been possible to leak some
memory allocated directly by OpenSSL, that PostgreSQL lost track of if
the initialization of the context allocated failed.  The cleanup can be
done with EVP_MD_CTX_destroy().

Note that EVP APIs exist since OpenSSL 0.9.7 and we have in the tree
equivalent implementations for older versions since ce9b75d (code
removed with 9b7cd59a as of 10~).  However, in 9.5 and 9.6, the existing
code makes use of EVP_MD_CTX_destroy() and EVP_MD_CTX_create() without
an equivalent implementation when building the tree with OpenSSL 0.9.6
or older, meaning that this code is in reality broken with such versions
since it got introduced in e2838c5.  As we have heard no complains about
that, it does not seem worth bothering with in 9.5 and 9.6, so I have
left that out for simplicity.

Author: Michael Paquier
Discussion: https://postgr.es/m/20201015072212.GC2305@paquier.xyz
Backpatch-through: 9.5
2020-10-19 09:37:50 +09:00
Tom Lane 3d338a46a4 Add missing error check in pgcrypto/crypt-md5.c.
In theory, the second px_find_digest call in px_crypt_md5 could fail
even though the first one succeeded, since resource allocation is
required.  Don't skip testing for a failure.  (If one did happen,
the likely result would be a crash rather than clean recovery from
an OOM failure.)

The code's been like this all along, so back-patch to all supported
branches.

Daniel Gustafsson

Discussion: https://postgr.es/m/AA8D6FE9-4AB2-41B4-98CB-AE64BA668C03@yesql.se
2020-10-16 11:59:25 -04:00
Peter Geoghegan c287f58586 Fix amcheck child check pg_upgrade bug.
Commit d114cc53 overlooked the fact that pg_upgrade'd B-Tree indexes
have leaf page high keys whose offset numbers do not match the one from
the copy of the tuple one level up (the copy stored with a downlink for
leaf page's right sibling page).  This led to false positive reports of
corruption from bt_index_parent_check() when it was called to verify a
pg_upgrade'd index.

To fix, skip comparing the offset number on pg_upgrade'd B-Tree indexes.

Author: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Andrew Bille <andrewbille@gmail.com>
Diagnosed-By: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
Bug: #16619
Discussion: https://postgr.es/m/16619-aaba10f83fdc1c3c@postgresql.org
Backpatch: 13-, where child check was enhanced.
2020-09-16 10:42:28 -07:00
Noah Misch 41dae35532 Move connect.h from fe_utils to src/include/common.
Any libpq client can use the header.  Clients include backend components
postgres_fdw, dblink, and logical replication apply worker.  Back-patch
to v10, because another fix needs this.  In released branches, just copy
the header and keep the original.
2020-08-10 09:22:58 -07:00
Tom Lane 98ca64899c Make contrib modules' installation scripts more secure.
Hostile objects located within the installation-time search_path could
capture references in an extension's installation or upgrade script.
If the extension is being installed with superuser privileges, this
opens the door to privilege escalation.  While such hazards have existed
all along, their urgency increases with the v13 "trusted extensions"
feature, because that lets a non-superuser control the installation path
for a superuser-privileged script.  Therefore, make a number of changes
to make such situations more secure:

* Tweak the construction of the installation-time search_path to ensure
that references to objects in pg_catalog can't be subverted; and
explicitly add pg_temp to the end of the path to prevent attacks using
temporary objects.

* Disable check_function_bodies within installation/upgrade scripts,
so that any security gaps in SQL-language or PL-language function bodies
cannot create a risk of unwanted installation-time code execution.

* Adjust lookup of type input/receive functions and join estimator
functions to complain if there are multiple candidate functions.  This
prevents capture of references to functions whose signature is not the
first one checked; and it's arguably more user-friendly anyway.

* Modify various contrib upgrade scripts to ensure that catalog
modification queries are executed with secure search paths.  (These
are in-place modifications with no extension version changes, since
it is the update process itself that is at issue, not the end result.)

Extensions that depend on other extensions cannot be made fully secure
by these methods alone; therefore, revert the "trusted" marking that
commit eb67623c9 applied to earthdistance and hstore_plperl, pending
some better solution to that set of issues.

Also add documentation around these issues, to help extension authors
write secure installation scripts.

Patch by me, following an observation by Andres Freund; thanks
to Noah Misch for review.

Security: CVE-2020-14350
2020-08-10 10:44:42 -04:00
Peter Geoghegan be9bdab983 amcheck: Sanitize metapage's allequalimage field.
This will be helpful if it ever proves necessary to revoke an opclass's
support for deduplication.

Backpatch: 13-, where nbtree deduplication was introduced.
2020-08-06 15:25:47 -07:00
Peter Geoghegan 725b43b9c3 Restore lost amcheck TOAST test coverage.
Commit eba77534 fixed an amcheck false positive bug involving
inconsistencies in TOAST input state between table and index.  A test
case was added that verified that such an inconsistency didn't result in
a spurious corruption related error.

Test coverage from the test was accidentally lost by commit 501e41dd,
which propagated ALTER TABLE ...  SET STORAGE attstorage state to
indexes.  This broke the test because the test specifically relied on
attstorage not being propagated.  This artificially forced there to be
index tuples whose datums were equivalent to the datums in the heap
without the datums actually being bitwise equal.

Fix this by updating pg_attribute directly instead.  Commit 501e41dd
made similar changes to a test_decoding TOAST-related test case which
made the same assumption, but overlooked the amcheck test case.

Backpatch: 11-, just like commit eba77534 (and commit 501e41dd).
2020-07-31 15:34:26 -07:00
Michael Paquier 0caf1fc6e8 Fix corner case with 16kB-long decompression in pgcrypto, take 2
A compressed stream may end with an empty packet.  In this case
decompression finishes before reading the empty packet and the
remaining stream packet causes a failure in reading the following
data.  This commit makes sure to consume such extra data, avoiding a
failure when decompression the data.  This corner case was reproducible
easily with a data length of 16kB, and existed since e94dd6a.  A cheap
regression test is added to cover this case based on a random,
incompressible string.

The first attempt of this patch has allowed to find an older failure
within the compression logic of pgcrypto, fixed by b9b6105.  This
involved SLES 15 with z390 where a custom flavor of libz gets used.
Bonus thanks to Mark Wong for providing access to the specific
environment.

Reported-by: Frank Gagnepain
Author: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/16476-692ef7b84e5fb893@postgresql.org
Backpatch-through: 9.5
2020-07-27 15:58:54 +09:00
Tom Lane 7dab4569d1 Fix ancient violation of zlib's API spec.
contrib/pgcrypto mishandled the case where deflate() does not consume
all of the offered input on the first try.  It reset the next_in pointer
to the start of the input instead of leaving it alone, causing the wrong
data to be fed to the next deflate() call.

This has been broken since pgcrypto was committed.  The reason for the
lack of complaints seems to be that it's fairly hard to get stock zlib
to not consume all the input, so long as the output buffer is big enough
(which it normally would be in pgcrypto's usage; AFAICT the input is
always going to be packetized into packets no larger than ZIP_OUT_BUF).
However, IBM's zlibNX implementation for AIX evidently will do it
in some cases.

I did not add a test case for this, because I couldn't find one that
would fail with stock zlib.  When we put back the test case for
bug #16476, that will cover the zlibNX situation well enough.

While here, write deflate()'s second argument as Z_NO_FLUSH per its
API spec, instead of hard-wiring the value zero.

Per buildfarm results and subsequent investigation.

Discussion: https://postgr.es/m/16476-692ef7b84e5fb893@postgresql.org
2020-07-23 17:20:01 -04:00
Michael Paquier 9f5162ef43 Revert "Fix corner case with PGP decompression in pgcrypto"
This reverts commit 9e10898, after finding out that buildfarm members
running SLES 15 on z390 complain on the compression and decompression
logic of the new test: pipistrelles, barbthroat and steamerduck.

Those hosts are visibly using hardware-specific changes to improve zlib
performance, requiring more investigation.

Thanks to Tom Lane for the discussion.

Discussion: https://postgr.es/m/20200722093749.GA2564@paquier.xyz
Backpatch-through: 9.5
2020-07-23 08:29:14 +09:00
Michael Paquier 35e142202b Fix corner case with PGP decompression in pgcrypto
A compressed stream may end with an empty packet, and PGP decompression
finished before reading this empty packet in the remaining stream.  This
caused a failure in pgcrypto, handling this case as corrupted data.
This commit makes sure to consume such extra data, avoiding a failure
when decompression the entire stream.  This corner case was reproducible
with a data length of 16kB, and existed since its introduction in
e94dd6a.  A cheap regression test is added to cover this case.

Thanks to Jeff Janes for the extra investigation.

Reported-by: Frank Gagnepain
Author: Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/16476-692ef7b84e5fb893@postgresql.org
Backpatch-through: 9.5
2020-07-22 14:52:36 +09:00
Jeff Davis 926ecf83c0 Revert "Use CP_SMALL_TLIST for hash aggregate"
This reverts commit 4cad2534da due to a
performance regression. It will be replaced by a new approach in an
upcoming commit.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20200614181418.mx4bvljmfkkhoqzl@alap3.anarazel.de
Backpatch-through: 13
2020-07-12 23:08:16 -07:00
Peter Eisentraut ffb23488af doc: Spell checking 2020-07-05 15:38:14 +02:00
Joe Conway 0025c3a2c2 Read until EOF vice stat-reported size in read_binary_file
read_binary_file(), used by SQL functions pg_read_file() and friends,
uses stat to determine file length to read, when not passed an explicit
length as an argument. This is problematic, for example, if the file
being read is a virtual file with a stat-reported length of zero.
Arrange to read until EOF, or StringInfo data string lenth limit, is
reached instead.

Original complaint and patch by me, with significant review, corrections,
advice, and code optimizations by Tom Lane. Backpatched to v11. Prior to
that only paths relative to the data and log dirs were allowed for files,
so no "zero length" files were reachable anyway.

Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/flat/969b8d82-5bb2-5fa8-4eb1-f0e685c5d736%40joeconway.com
Backpatch-through: 11
2020-07-04 06:28:21 -04:00
Fujii Masao 8d459762b1 Change default of pg_stat_statements.track_planning to off.
Since v13 pg_stat_statements is allowed to track the planning time of
statements when track_planning option is enabled. Its default was on.

But this feature could cause more terrible spinlock contentions in
pg_stat_statements. As a result of this, Robins Tharakan reported that
v13 beta1 showed ~45% performance drop at high DB connection counts
(when compared with v12.3) during fully-cached SELECT-only test using
pgbench.

To avoid this performance regression by the default setting,
this commit changes default of pg_stat_statements.track_planning to off.

Back-patch to v13 where pg_stat_statements.track_planning was introduced.

Reported-by: Robins Tharakan
Author: Fujii Masao
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/2895b53b033c47ccb22972b589050dd9@EX13D05UWC001.ant.amazon.com
2020-07-03 11:37:24 +09:00
Peter Eisentraut a5202889b4 Spelling adjustments
similar to 0fd2a79a63
2020-06-09 10:42:28 +02:00
Tom Lane 044c99bc56 Use query collation, not column's collation, while examining statistics.
Commit 5e0928005 changed the planner so that, instead of blindly using
DEFAULT_COLLATION_OID when invoking operators for selectivity estimation,
it would use the collation of the column whose statistics we're
considering.  This was recognized as still being not quite the right
thing, but it seemed like a good incremental improvement.  However,
shortly thereafter we introduced nondeterministic collations, and that
creates cases where operators can fail if they're passed the wrong
collation.  We don't want planning to fail in cases where the query itself
would work, so this means that we *must* use the query's collation when
invoking operators for estimation purposes.

The only real problem this creates is in ineq_histogram_selectivity, where
the binary search might produce a garbage answer if we perform comparisons
using a different collation than the column's histogram is ordered with.
However, when the query's collation is significantly different from the
column's default collation, the estimate we previously generated would be
pretty irrelevant anyway; so it's not clear that this will result in
noticeably worse estimates in practice.  (A follow-on patch will improve
this situation in HEAD, but it seems too invasive for back-patch.)

The patch requires changing the signatures of mcv_selectivity and allied
functions, which are exported and very possibly are used by extensions.
In HEAD, I just did that, but an API/ABI break of this sort isn't
acceptable in stable branches.  Therefore, in v12 the patch introduces
"mcv_selectivity_ext" and so on, with signatures matching HEAD, and makes
the old functions into wrappers that assume DEFAULT_COLLATION_OID should
be used.  That does not match the prior behavior, but it should avoid risk
of failure in most cases.  (In practice, I think most extension datatypes
aren't collation-aware, so the change probably doesn't matter to them.)

Per report from James Lucas.  Back-patch to v12 where the problem was
introduced.

Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
2020-06-05 16:18:50 -04:00
Tomas Vondra 4cad2534da Use CP_SMALL_TLIST for hash aggregate
Commit 1f39bce021 added disk-based hash aggregation, which may spill
incoming tuples to disk. It however did not request projection to make
the tuples as narrow as possible, which may mean having to spill much
more data than necessary (increasing I/O, pushing other stuff from page
cache, etc.).

This adds CP_SMALL_TLIST in places that may use hash aggregation - we do
that only for AGG_HASHED. It's unnecessary for AGG_SORTED, because that
either uses explicit Sort (which already does projection) or pre-sorted
input (which does not need spilling to disk).

Author: Tomas Vondra
Reviewed-by: Jeff Davis
Discussion: https://postgr.es/m/20200519151202.u2p2gpiawoaznsv2%40development
2020-05-31 14:43:13 +02:00
Joe Conway 9003b76e16 Initialize dblink remoteConn struct in all cases
Two of the members of rconn were left uninitialized. When
dblink_open() is called without an outer transaction it
handles the initialization for us, but with an outer
transaction it does not. Arrange for initialization
in all cases. Backpatch to all supported versions.

Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/flat/9bd0744f-5f04-c778-c5b3-809efe9c30c7%40joeconway.com#c545909a41664991aca60c4d70a10ce7
2020-05-28 13:44:54 -04:00
Noah Misch 3350fb5d1f Clear some style deviations. 2020-05-21 08:31:16 -07:00
Tom Lane 5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Alexander Korotkov 34dae902ca Fix amcheck for page checks concurrent to replay of btree page deletion
amcheck expects at least hikey to always exist on leaf page even if it is
deleted page.  But replica reinitializes page during replay of page deletion,
causing deleted page to have no items.  Thus, replay of page deletion can
cause an error in concurrent amcheck run.

This commit relaxes amcheck expectation making it tolerate deleted page with
no items.

Reported-by: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAPpHfdt_OTyQpXaPJcWzV2N-LNeNJseNB-K_A66qG%3DL518VTFw%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Peter Geoghegan
Backpatch-through: 11
2020-05-14 12:44:44 +03:00
Alvaro Herrera 17cc133f01
Dial back -Wimplicit-fallthrough to level 3
The additional pain from level 4 is excessive for the gain.

Also revert all the source annotation changes to their original
wordings, to avoid back-patching pain.

Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us
2020-05-13 15:31:14 -04:00
Alvaro Herrera 87c291e29d
Fix straggler
contrib/pgcrypto did contain an unedited fall-through marker after all.
2020-05-12 16:15:55 -04:00
Peter Eisentraut 501e41dd3c Propagate ALTER TABLE ... SET STORAGE to indexes
When creating a new index, the attstorage setting of the table column
is copied to regular (non-expression) index columns.  But a later
ALTER TABLE ... SET STORAGE is not propagated to indexes, thus
creating an inconsistent and undumpable state.

Discussion: https://www.postgresql.org/message-id/flat/9765d72b-37c0-06f5-e349-2a580aafd989%402ndquadrant.com
2020-05-08 08:39:17 +02:00
Peter Geoghegan 18c117cc56 Remove obsolete amcheck comment.
Oversight in commit d114cc53.
2020-05-05 14:36:54 -07:00
Amit Kapila 69bfaf2e1d Change the display of WAL usage statistics in Explain.
In commit 33e05f89c5, we have added the option to display WAL usage
statistics in Explain and auto_explain.  The display format used two spaces
between each field which is inconsistent with Buffer usage statistics which
is using one space between each field.  Change the format to make WAL usage
statistics consistent with Buffer usage statistics.

This commit also changed the usage of "full page writes" to
"full page images" for WAL usage statistics to make it consistent with
other parts of code and docs.

Author: Julien Rouhaud, Amit Kapila
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-05-05 08:00:53 +05:30
Peter Geoghegan 20c6905dee Add posting list tuple amcheck test case.
Add a test case to contrib/amcheck that creates coverage of code paths
that are used to verify posting list tuples (tuples created when
deduplication merges together existing tuples to avoid a leaf page
split).
2020-05-04 11:23:44 -07:00
Tom Lane 0da06d9faf Get rid of trailing semicolons in C macro definitions.
Writing a trailing semicolon in a macro is almost never the right thing,
because you almost always want to write a semicolon after each macro
call instead.  (Even if there was some reason to prefer not to, pgindent
would probably make a hash of code formatted that way; so within PG the
rule should basically be "don't do it".)  Thus, if we have a semi inside
the macro, the compiler sees "something;;".  Much of the time the extra
empty statement is harmless, but it could lead to mysterious syntax
errors at call sites.  In perhaps an overabundance of neatnik-ism, let's
run around and get rid of the excess semicolons whereever possible.

The only thing worse than a mysterious syntax error is a mysterious
syntax error that only happens in the back branches; therefore,
backpatch these changes where relevant, which is most of them because
most of these mistakes are old.  (The lack of reported problems shows
that this is largely a hypothetical issue, but still, it could bite
us in some future patch.)

John Naylor and Tom Lane

Discussion: https://postgr.es/m/CACPNZCs0qWTqJ2QUSGJ07B7uvAvzMb-KbG2q+oo+J3tsWN5cqw@mail.gmail.com
2020-05-01 17:28:00 -04:00
Michael Paquier 401aad6704 Rename connection parameters to control min/max SSL protocol version in libpq
The libpq parameters ssl{max|min}protocolversion are renamed to use
underscores, to become ssl_{max|min}_protocol_version.  The related
environment variables still use the names introduced in commit ff8ca5f
that added the feature.

Per complaint from Peter Eisentraut (this was also mentioned by me in
the original patch review but the issue got discarded).

Author: Daniel Gustafsson
Reviewed-by: Peter Eisentraut, Michael Paquier
Discussion: https://postgr.es/m/b319e449-318d-e691-4997-1327e166fcc4@2ndquadrant.com
2020-04-30 13:39:10 +09:00
Alvaro Herrera d0abe78d84
Check slot->restart_lsn validity in a few more places
Lack of these checks could cause visible misbehavior, including
assertion failures.  This was missed in commit c655077639, whereby
restart_lsn becomes invalid when the size limit is exceeded.

Also reword some existing error messages, and add errdetail(), so that
the reported errors all match in spirit.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200408.093710.447591748588426656.horikyota.ntt@gmail.com
2020-04-28 20:39:04 -04:00
Peter Eisentraut 3a89615776 Update Unicode data to Unicode 13.0.0 and CLDR 37 2020-04-24 09:52:59 +02:00
Tom Lane fc576b7c4f Fix cache reference leak in contrib/sepgsql.
fixup_whole_row_references() did the wrong thing with a dropped column,
resulting in a commit-time warning about a cache reference leak.

I (tgl) added a test case exercising this, but back-patched the test
only as far as v10; the patch didn't apply cleanly to 9.6 and it
didn't seem worth the trouble to adapt it.  The bug is pretty old
though, so apply the code change all the way back.

Michael Luo, with cosmetic improvements by me

Discussion: https://postgr.es/m/BYAPR08MB5606D1453D7F50E2AF4D2FD29AD80@BYAPR08MB5606.namprd08.prod.outlook.com
2020-04-16 14:45:54 -04:00
Peter Geoghegan bc3087b626 Harmonize nbtree page split point code.
An nbtree split point can be thought of as a point between two adjoining
tuples from an imaginary version of the page being split that includes
the incoming/new item (in addition to the items that really are on the
page).  These adjoining tuples are called the lastleft and firstright
tuples.

The variables that represent split points contained a field called
firstright, which is an offset number of the first data item from the
original page that goes on the new right page.  The corresponding tuple
from origpage was usually the same thing as the actual firstright tuple,
but not always: the firstright tuple is sometimes the new/incoming item
instead.  This situation seems unnecessarily confusing.

Make things clearer by renaming the origpage offset returned by
_bt_findsplitloc() to "firstrightoff".  We now have a firstright tuple
and a firstrightoff offset number which are comparable to the
newitem/lastleft tuples and the newitemoff/lastleftoff offset numbers
respectively.  Also make sure that we are consistent about how we
describe nbtree page split point state.

Push the responsibility for dealing with pg_upgrade'd !heapkeyspace
indexes down to lower level code, relieving _bt_split() from dealing
with it directly.  This means that we always have a palloc'd left page
high key on the leaf level, no matter what.  This enables simplifying
some of the code (and code comments) within _bt_split().

Finally, restructure the page split code to make it clearer why suffix
truncation (which only takes place during leaf page splits) is
completely different to the first data item truncation that takes place
during internal page splits.  Tuples are marked as having fewer
attributes stored in both cases, and the firstright tuple is truncated
in both cases, so it's easy to imagine somebody missing the distinction.
2020-04-13 16:39:55 -07:00
Andrew Dunstan 7be5d8df1f Use perl warnings pragma consistently
We've had a mixture of the warnings pragma, the -w switch on the shebang
line, and no warnings at all. This patch removes the -w swicth and add
the warnings pragma to all perl sources missing it. It raises the
severity of the TestingAndDebugging::RequireUseWarnings  perlcritic
policy to level 5, so that we catch any future violations.

Discussion: https://postgr.es/m/20200412074245.GB623763@rfd.leadboat.com
2020-04-13 11:55:45 -04:00
Amit Kapila ef08ca113f Cosmetic fixups for WAL usage work.
Reported-by: Justin Pryzby and Euler Taveira
Author: Justin Pryzby and Julien Rouhaud
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-13 15:31:16 +05:30
Peter Geoghegan 20fbb711ef Add contrib/amcheck debug message.
Add a DEBUG1 message indicating that verification of the index structure
is underway.  Also reduce the severity level of the existing "tree
level" debug message to DEBUG1.  It should never have been made DEBUG2.
Any B-Tree index with more than a couple of levels will generally also
have so many pages that the per-page DEBUG2 messages will become
completely unmanageable.

In passing, add a new "Tip" to the docs that advises users that run into
corruption that the debug messages might provide useful additional
context.
2020-04-10 17:44:08 -07:00
Tomas Vondra ba3e76cc57 Consider Incremental Sort paths at additional places
Commit d2d8a229bc introduced Incremental Sort, but it was considered
only in create_ordered_paths() as an alternative to regular Sort. There
are many other places that require sorted input and might benefit from
considering Incremental Sort too.

This patch modifies a number of those places, but not all. The concern
is that just adding Incremental Sort to any place that already adds
Sort may increase the number of paths considered, negatively affecting
planning time, without any benefit. So we've taken a more conservative
approach, based on analysis of which places do affect a set of queries
that did seem practical. This means some less common queries may not
benefit from Incremental Sort yet.

Author: Tomas Vondra
Reviewed-by: James Coleman
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-07 16:43:22 +02:00
Thomas Munro 4c04be9b05 Introduce xid8-based functions to replace txid_XXX.
The txid_XXX family of fmgr functions exposes 64 bit transaction IDs to
users as int8.  Now that we have an SQL type xid8 for FullTransactionId,
define a new set of functions including pg_current_xact_id() and
pg_current_snapshot() based on that.  Keep the old functions around too,
for now.

It's a bit sneaky to use the same C functions for both, but since the
binary representation is identical except for the signedness of the
type, and since older functions are the ones using the wrong signedness,
and since we'll presumably drop the older ones after a reasonable period
of time, it seems reasonable to switch to FullTransactionId internally
and share the code for both.

Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com>
Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com>
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
2020-04-07 12:04:32 +12:00
Amit Kapila 33e05f89c5 Add the option to report WAL usage in EXPLAIN and auto_explain.
This commit adds a new option WAL similar to existing option BUFFERS in the
EXPLAIN command.  This option allows to include information on WAL record
generation added by commit df3b181499 in EXPLAIN output.

This also allows the WAL usage information to be displayed via
the auto_explain module.  A new parameter auto_explain.log_wal controls
whether WAL usage statistics are printed when an execution plan is logged.
This parameter has no effect unless auto_explain.log_analyze is enabled.

Author: Julien Rouhaud
Reviewed-by: Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-06 08:02:15 +05:30
Amit Kapila 6b466bf5f2 Allow pg_stat_statements to track WAL usage statistics.
This commit adds three new columns in pg_stat_statements output to
display WAL usage statistics added by commit df3b181499.

This commit doesn't bump the version of pg_stat_statements as the
same is done for this release in commit 17e0328224.

Author: Kirill Bychik and Julien Rouhaud
Reviewed-by: Julien Rouhaud, Fujii Masao, Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-05 07:34:04 +05:30
Noah Misch c6b92041d3 Skip WAL for new relfilenodes, under wal_level=minimal.
Until now, only selected bulk operations (e.g. COPY) did this.  If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY.  See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules.  Maintainers of table access
methods should examine that section.

To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL.  A new GUC,
wal_skip_threshold, guides that choice.  If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold.  Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.

Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode.  Introduce rd_firstRelfilenodeSubid.  Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node.  Make relcache.c retain entries for certain
dropped relations until end of transaction.

Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN.
Future servers accept older WAL, so this bump is discretionary.

Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas.  Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem.  Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs.  Reported by Martijn van Oosterhout.

Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
2020-04-04 12:25:34 -07:00
Tom Lane 6dd9f35779 Fix bogus CALLED_AS_TRIGGER() defenses.
contrib/lo's lo_manage() thought it could use
trigdata->tg_trigger->tgname in its error message about
not being called as a trigger.  That naturally led to a core dump.

unique_key_recheck() figured it could Assert that fcinfo->context
is a TriggerData node in advance of having checked that it's
being called as a trigger.  That's harmless in production builds,
and perhaps not that easy to reach in any case, but it's logically
wrong.

The first of these per bug #16340 from William Crowell;
the second from manual inspection of other CALLED_AS_TRIGGER
call sites.

Back-patch the lo.c change to all supported branches, the
other to v10 where the thinko crept in.

Discussion: https://postgr.es/m/16340-591c7449dc7c8c47@postgresql.org
2020-04-03 11:24:56 -04:00
Peter Eisentraut 070c3d3937 Fix whitespace 2020-04-02 08:56:12 +02:00
Fujii Masao 17e0328224 Allow pg_stat_statements to track planning statistics.
This commit makes pg_stat_statements support new GUC
pg_stat_statements.track_planning. If this option is enabled,
pg_stat_statements tracks the planning statistics of the statements,
e.g., the number of times the statement was planned, the total time
spent planning the statement, etc. This feature is useful to check
the statements that it takes a long time to plan. Previously since
pg_stat_statements tracked only the execution statistics, we could
not use that for the purpose.

The planning and execution statistics are stored at the end of
each phase separately. So there are not always one-to-one relationship
between them. For example, if the statement is successfully planned
but fails in the execution phase, only its planning statistics are stored.
This may cause the users to be able to see different pg_stat_statements
results from the previous version. To avoid this,
pg_stat_statements.track_planning needs to be disabled.

This commit bumps the version of pg_stat_statements to 1.8
since it changes the definition of pg_stat_statements function.

Author: Julien Rouhaud, Pascal Legrand, Thomas Munro, Fujii Masao
Reviewed-by: Sergei Kornilov, Tomas Vondra, Yoshikazu Imai, Haribabu Kommi, Tom Lane
Discussion: https://postgr.es/m/CAHGQGwFx_=DO-Gu-MfPW3VQ4qC7TfVdH2zHmvZfrGv6fQ3D-Tw@mail.gmail.com
Discussion: https://postgr.es/m/CAEepm=0e59Y_6Q_YXYCTHZkqOc6H2pJ54C_Xe=VFu50Aqqp_sA@mail.gmail.com
Discussion: https://postgr.es/m/DB6PR0301MB21352F6210E3B11934B0DCC790B00@DB6PR0301MB2135.eurprd03.prod.outlook.com
2020-04-02 11:20:19 +09:00
Tom Lane 17ca067995 Clean up parsing of ltree and lquery some more.
Fix lquery parsing to handle repeated flag characters correctly,
and to enforce the max label length correctly in some cases where
it did not before, and to detect empty labels in some cases where
it did not before.

In a more cosmetic vein, use a switch rather than if-then chains to
handle the different states, and avoid unnecessary checks on charlen
when looking for ASCII characters, and factor out multiple copies of
the label length checking code.

Tom Lane and Dmitry Belyavsky

Discussion: https://postgr.es/m/CADqLbzLVkBuPX0812o+z=c3i6honszsZZ6VQOSKR3VPbB56P3w@mail.gmail.com
2020-04-01 19:44:17 -04:00
Tom Lane 949a9f043e Add support for binary I/O of ltree, lquery, and ltxtquery types.
Not much to say here --- does what it says on the tin.  The "binary"
representation in each case is really just the same as the text format,
though we prefix a version-number byte in case anyone ever feels
motivated to change that.  Thus, there's not any expectation of improved
speed or reduced space; the point here is just to allow clients to use
binary format for all columns of a query result or COPY data.

This makes use of the recently added ALTER TYPE support to add binary
I/O functions to an existing data type.  As in commit a80818605,
we can piggy-back on there already being a new-for-v13 version of the
ltree extension, so we don't need a new update script file.

Nino Floris, reviewed by Alexander Korotkov and myself

Discussion: https://postgr.es/m/CANmj9Vxx50jOo1L7iSRxd142NyTz6Bdcgg7u9P3Z8o0=HGkYyQ@mail.gmail.com
2020-04-01 17:31:29 -04:00
Tom Lane a80818605e Improve selectivity estimation for assorted match-style operators.
Quite a few matching operators such as JSONB's @> used "contsel" and
"contjoinsel" as their selectivity estimators.  That was a bad idea,
because (a) contsel is only a stub, yielding a fixed default estimate,
and (b) that default is 0.001, meaning we estimate these operators as
five times more selective than equality, which is surely pretty silly.

There's a good model for improving this in ltree's ltreeparentsel():
for any "var OP constant" query, we can try applying the operator
to all of the column's MCV and histogram values, taking the latter
as being a random sample of the non-MCV values.  That code is
actually 100% generic, except for the question of exactly what
default selectivity ought to be plugged in when we don't have stats.

Hence, migrate the guts of ltreeparentsel() into the core code, provide
wrappers "matchingsel" and "matchingjoinsel" with a more-appropriate
default estimate, and use those for the non-geometric operators that
formerly used contsel (mostly JSONB containment operators and tsquery
matching).

Also apply this code to some match-like operators in hstore, ltree, and
pg_trgm, including the former users of ltreeparentsel as well as ones
that improperly used contsel.  Since commit 911e70207 just created new
versions of those extensions that we haven't released yet, we can sneak
this change into those new versions instead of having to create an
additional generation of update scripts.

Patch by me, reviewed by Alexey Bashtanov

Discussion: https://postgr.es/m/12237.1582833074@sss.pgh.pa.us
2020-04-01 10:32:33 -04:00
Alvaro Herrera 69360b3458
Remove header noise from test_decoding test
Use psql's expanded output to avoid a pointless header.

Kyotaro Horiguchi, after an idea of Michael Paquier

Discussion: https://postgr.es/m/20181120050744.GJ4400@paquier.xyz
2020-03-31 16:43:14 -03:00
Tom Lane 70dc4c509b Fix lquery's NOT handling, and add ability to quantify non-'*' items.
The existing implementation of the ltree ~ lquery match operator is
sufficiently complex and undocumented that it's hard to tell exactly
what it does.  But one thing it clearly gets wrong is the combination
of NOT symbols (!) and '*' symbols.  A pattern such as '*.!foo.*'
should, by any ordinary understanding of regular expression behavior,
match any ltree that has at least one label that's not "foo".  As best
we can tell by experimentation, what it's actually matching is any
ltree in which *no* label is "foo".  That's surprising, and not at all
what the documentation says.

Now, that's arguably a useful behavior, so if we rewrite to fix the
bug we should provide some other way to get it.  To do so, add the
ability to attach lquery quantifiers to non-'*' items as well as '*'s.
Then the pattern '!foo{,}' expresses "any ltree in which no label is
foo".  For backwards compatibility, the default quantifier for non-'*'
items has to be "{1}", although the default for '*' items is '{,}'.
I wouldn't have done it like that in a green field, but it's not
totally horrible.

Armed with that, rewrite checkCond() from scratch.  Treating '*' and
non-'*' items alike makes it simpler, not more complicated, so that
the function actually gets a lot shorter than it was.

Filip Rembiałkowski, Tom Lane, Nikita Glukhov, per a very
ancient bug report from M. Palm

Discussion: https://postgr.es/m/CAP_rww=waX2Oo6q+MbMSiZ9ktdj6eaJj0cQzNu=Ry2cCDij5fw@mail.gmail.com
2020-03-31 11:14:42 -04:00
Tom Lane e07e2a40bd Improve error messages in ltree_in and lquery_in.
Ensure that the type name is mentioned in all cases (bare "syntax error"
isn't that helpful).  Avoid using the term "level", since that's not
used in the documentation.  Phrase error position reports as "at
character N" not "in position N"; the latter seems ambiguous, and it's
certainly not how we say it elsewhere.  For the same reason, make the
character position values be 1-based not 0-based.  Provide a position
in more cases.  (I continued to leave that out of messages complaining
about end-of-input, where it seemed pointless, as well as messages
complaining about overall input complexity, where fingering any one part
of the input would be arbitrary.)

Discussion: https://postgr.es/m/15582.1585529626@sss.pgh.pa.us
2020-03-31 11:14:42 -04:00
Alexander Korotkov 911e702077 Implement operator class parameters
PostgreSQL provides set of template index access methods, where opclasses have
much freedom in the semantics of indexing.  These index AMs are GiST, GIN,
SP-GiST and BRIN.  There opclasses define representation of keys, operations on
them and supported search strategies.  So, it's natural that opclasses may be
faced some tradeoffs, which require user-side decision.  This commit implements
opclass parameters allowing users to set some values, which tell opclass how to
index the particular dataset.

This commit doesn't introduce new storage in system catalog.  Instead it uses
pg_attribute.attoptions, which is used for table column storage options but
unused for index attributes.

In order to evade changing signature of each opclass support function, we
implement unified way to pass options to opclass support functions.  Options
are set to fn_expr as the constant bytea expression.  It's possible due to the
fact that opclass support functions are executed outside of expressions, so
fn_expr is unused for them.

This commit comes with some examples of opclass options usage.  We parametrize
signature length in GiST.  That applies to multiple opclasses: tsvector_ops,
gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and
gist_hstore_ops.  Also we parametrize maximum number of integer ranges for
gist__int_ops.  However, the main future usage of this feature is expected
to be json, where users would be able to specify which way to index particular
json parts.

Catversion is bumped.

Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-03-30 19:17:23 +03:00
Fujii Masao 4a539a25eb Expose BufferUsageAccumDiff().
Previously pg_stat_statements calculated the difference of buffer counters
by its own code even while BufferUsageAccumDiff() had the same code.
This commit expose BufferUsageAccumDiff() and makes pg_stat_statements
use it for the calculation, in order to simply the code.

This change also would be useful for the upcoming patch for the planning
counters in pg_stat_statements because the patch will add one more code
for the calculation of difference of buffer counters and that can easily be
done by using BufferUsageAccumDiff().

Author: Julien Rouhaud
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/bdfee4e0-a304-2498-8da5-3cb52c0a193e@oss.nttdata.com
2020-03-30 12:15:26 +09:00
Tom Lane 2743d9ae4a Cosmetic improvements in ltree code.
Add more comments in ltree.h, and correct a misstatement or two.

Use a symbol, rather than hardwired constants, for the maximum length
of an ltree label.  The max length is still hardwired in the associated
error messages, but I want to clean that up as part of a separate patch
to improve the error messages.
2020-03-29 19:14:15 -04:00
Tom Lane 9950c8aadf Fix lquery's behavior for consecutive '*' items.
Something like "*{2}.*{3}" should presumably mean the same as
"*{5}", but it didn't.  Improve that.

Get rid of an undocumented and remarkably ugly (though not, as far as
I can tell, actually unsafe) static variable in favor of passing more
arguments to checkCond().

Reverse-engineer some commentary.  This function, like all of ltree,
is still far short of what I would consider the minimum acceptable
level of internal documentation, but at least now it has more than
zero comments.

Although this certainly seems like a bug fix, people might not
thank us for changing query behavior in stable branches, so
no back-patch.

Nikita Glukhov, with cosmetic improvements by me

Discussion: https://postgr.es/m/CAP_rww=waX2Oo6q+MbMSiZ9ktdj6eaJj0cQzNu=Ry2cCDij5fw@mail.gmail.com
2020-03-28 18:31:05 -04:00
Tom Lane 95f7ddfdad Protect against overflow of ltree.numlevel and lquery.numlevel.
These uint16 fields could be overflowed by excessively long input,
producing strange results.  Complain for invalid input.

Likewise check for out-of-range values of the repeat counts in lquery.
(We don't try too hard on that one, notably not bothering to detect
if atoi's result has overflowed.)

Also detect length overflow in ltree_concat.

In passing, be more consistent about whether "syntax error" messages
include the type name.  Also, clarify the documentation about what
the size limit is.

This has been broken for a long time, so back-patch to all supported
branches.

Nikita Glukhov, reviewed by Benjie Gillam and Tomas Vondra

Discussion: https://postgr.es/m/CAP_rww=waX2Oo6q+MbMSiZ9ktdj6eaJj0cQzNu=Ry2cCDij5fw@mail.gmail.com
2020-03-28 17:09:51 -04:00
Noah Misch de9396326e Revert "Skip WAL for new relfilenodes, under wal_level=minimal."
This reverts commit cb2fd7eac2.  Per
numerous buildfarm members, it was incompatible with parallel query, and
a test case assumed LP64.  Back-patch to 9.5 (all supported versions).

Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
2020-03-22 09:24:09 -07:00
Noah Misch cb2fd7eac2 Skip WAL for new relfilenodes, under wal_level=minimal.
Until now, only selected bulk operations (e.g. COPY) did this.  If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY.  See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules.  Maintainers of table access
methods should examine that section.

To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL.  A new GUC,
wal_skip_threshold, guides that choice.  If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold.  Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.

Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode.  Introduce rd_firstRelfilenodeSubid.  Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node.  Make relcache.c retain entries for certain
dropped relations until end of transaction.

Back-patch to 9.5 (all supported versions).  This introduces a new WAL
record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC.  As
always, update standby systems before master systems.  This changes
sizeof(RelationData) and sizeof(IndexStmt), breaking binary
compatibility for affected extensions.  (The most recent commit to
affect the same class of extensions was
089e4d405d0f3b94c74a2c6a54357a84a681754b.)

Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas.  Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem.  Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs.  Reported by Martijn van Oosterhout.

Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
2020-03-21 09:38:26 -07:00
Amit Kapila b4f140869f Add missing errcode() in a few ereport calls.
This will allow to specifying SQLSTATE error code for the errors in the
missing places.

Reported-by: Sawada Masahiko
Author: Sawada Masahiko
Backpatch-through: 9.5
Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
2020-03-18 09:27:14 +05:30
Tom Lane 41b45576d5 Remove useless pfree()s at the ends of various ValuePerCall SRFs.
We don't need to manually clean up allocations in a SRF's
multi_call_memory_ctx, because the SRF_RETURN_DONE infrastructure
takes care of that (and also ensures that it will happen even if the
function never gets a final call, which simple manual cleanup cannot
do).

Hence, the code removed by this patch is a waste of code and cycles.
Worse, it gives the impression that cleaning up manually is a thing,
which can lead to more serious errors such as those fixed in
commits 085b6b667 and b4570d33a.  So we should get rid of it.

These are not quite actual bugs though, so I couldn't muster the
enthusiasm to back-patch.  Fix in HEAD only.

Justin Pryzby

Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
2020-03-16 21:36:53 -04:00
Tom Lane b4570d33aa Avoid holding a directory FD open across assorted SRF calls.
This extends the fixes made in commit 085b6b667 to other SRFs with the
same bug, namely pg_logdir_ls(), pgrowlocks(), pg_timezone_names(),
pg_ls_dir(), and pg_tablespace_databases().

Also adjust various comments and documentation to warn against
expecting to clean up resources during a ValuePerCall SRF's final
call.

Back-patch to all supported branches, since these functions were
all born broken.

Justin Pryzby, with cosmetic tweaks by me

Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
2020-03-16 21:05:52 -04:00
Alexander Korotkov d114cc5387 Improve checking of child pages in contrib/amcheck.
This commit eliminates lossiness in check for missing parent downlinks in
B-tree.  Instead of collecting lossy bitmap, we check for missing downlinks
while visiting child pages referenced by downlinks of target level.  We
traverse from previous child page to the subsequent child page by right links.
Intermediate pages are candidates to have lost parent downlinks.

Also this commit introduces matching of child high key to the pivot key of
it's parent.

Discussion: https://postgr.es/m/CAPpHfduoF-c4RhOyOm%3D4-Y367%2B8txq9Q6iM_ty0OYc8si1Abww%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Peter Geoghegan
2020-03-11 12:00:31 +03:00
Tom Lane d01f03a495 Preserve integer and float values accurately in (de)serialize_deflist.
Previously, this code just smashed all types of DefElem values to
strings, cavalierly reasoning that nobody would care.  But in point of
fact, most of the defGetFoo functions do distinguish among different
input syntaxes; for instance defGetBoolean will accept 1 as an integer
but not "1" as a string.  This led to CREATE/ALTER TEXT SEARCH
DICTIONARY accepting 0 and 1 as values for boolean dictionary
properties, only to have the dictionary fail at runtime.

We can upgrade this behavior by teaching serialize_deflist that it
does not need to quote T_Integer or T_Float nodes' values on output,
and then teaching deserialize_deflist to restore unquoted integer or
float values as the appropriate node type.  This should not break
anything using pg_ts_dict.dictinitoption, since that field is just
defined as being something valid to include in CREATE TEXT SEARCH
DICTIONARY.

deserialize_deflist is also used to parse the options arguments
for the ts_headline family of functions, but so far as I can see
this won't cause any problems there either: the only consumer of
that output is prsd_headline which always uses defGetString.
(Really that's a bad idea, but I won't risk changing it here.)

This is surely a bug fix, but given the lack of field complaints
I don't think it's necessary to back-patch.

Discussion: https://postgr.es/m/CAMkU=1xRcs_BUPzR0+V3WndaCAv0E_m3h6aUEJ8NF-sY1nnHsw@mail.gmail.com
2020-03-10 12:30:02 -04:00
Peter Eisentraut 3c173a53a8 Remove utils/acl.h from catalog/objectaddress.h
The need for this was removed by
8b9e9644dc.

A number of files now need to include utils/acl.h or
parser/parse_node.h explicitly where they previously got it indirectly
somehow.

Since parser/parse_node.h already includes nodes/parsenodes.h, the
latter is then removed where the former was added.  Also, remove
nodes/pg_list.h from objectaddress.h, since that's included via
nodes/parsenodes.h.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/7601e258-26b2-8481-36d0-dc9dca6f28f1%402ndquadrant.com
2020-03-10 10:27:00 +01:00
Peter Eisentraut 71d60e2aa0 Add tg_updatedcols to TriggerData
This allows a trigger function to determine for an UPDATE trigger
which columns were actually updated.  This allows some optimizations
in generic trigger functions such as lo_manage and
tsvector_update_trigger.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/11c5f156-67a9-0fb5-8200-2a8018eb2e0c@2ndquadrant.com
2020-03-09 09:34:55 +01:00
Tom Lane 806eb92c01 Add an "absval" parameter to allow contrib/dict_int to ignore signs.
Jeff Janes

Discussion: https://postgr.es/m/CAMkU=1xRcs_BUPzR0+V3WndaCAv0E_m3h6aUEJ8NF-sY1nnHsw@mail.gmail.com
2020-03-08 18:35:06 -04:00
Tom Lane 38ce06c37e Add an explicit test to catch changes in checksumming calculations.
Seems like a good idea in view of 006517432 and addd034ae.

Michael Paquier, Tom Lane

Discussion: https://postgr.es/m/20200306075230.GA118430@paquier.xyz
2020-03-08 15:09:14 -04:00
Peter Geoghegan 691e8b2e18 pageinspect: Fix types used for bt_metap() columns.
The data types that contrib/pageinspect's bt_metap() function were
declared to return as OUT arguments were wrong in some cases.  For
example, the oldest_xact column (a TransactionId/xid field) was declared
integer/int4 within the pageinspect extension's sql file.  This led to
errors when an oldest_xact value that exceeded 2^31-1 was encountered.
Some of the other columns were defined incorrectly ever since
pageinspect was first introduced, though they were far less likely to
produce problems in practice.

Fix these issues by changing the declaration of bt_metap() to
consistently use data types that can reliably represent all possible
values.  This fixes things on HEAD only.  No backpatch, since it doesn't
seem like there is a safe way to fix the issue without including a new
version of the pageinspect extension (HEAD/Postgres 13 already
introduced a new version of the extension).  Besides, the oldest_xact
issue has been around since the release of Postgres 11, and we haven't
heard any complaints about it before now.

Also, throw an error when we detect a bt_metap() declaration that must
be from an old version of the pageinspect extension by examining the
number of attributes from the tuple descriptor for the return tuples.
It seems better to throw an error in a reliable and obvious way
following a Postgres upgrade, rather than letting bt_metap() fail
unpredictably.  The problem is fundamentally with the CREATE FUNCTION
declared data types themselves, so I see no sensible alternative.

Reported-By: Victor Yegorov
Bug: #16285
Discussion: https://postgr.es/m/16285-df8fc1000ab3d5fc@postgresql.org
2020-03-07 16:44:53 -08:00
Tom Lane 36058a3c55 Create contrib/bool_plperl to provide a bool transform for PL/Perl[U].
plperl's default handling of bool arguments or results is not terribly
satisfactory, since Perl doesn't consider the string 'f' to be false.
Ideally we'd just fix that, but the backwards-compatibility hazard
would be substantial.  Instead, build a TRANSFORM module that can
be optionally applied to provide saner semantics.

Perhaps usefully, this is also about the minimum possible skeletal
example of a plperl transform module; so it might be a better starting
point for user-written transform modules than hstore_plperl or
jsonb_plperl.

Ivan Panchenko

Discussion: https://postgr.es/m/1583013317.881182688@f390.i.mail.ru
2020-03-06 17:11:23 -05:00
Tom Lane 3ed2005ff5 Introduce macros for typalign and typstorage constants.
Our usual practice for "poor man's enum" catalog columns is to define
macros for the possible values and use those, not literal constants,
in C code.  But for some reason lost in the mists of time, this was
never done for typalign/attalign or typstorage/attstorage.  It's never
too late to make it better though, so let's do that.

The reason I got interested in this right now is the need to duplicate
some uses of the TYPSTORAGE constants in an upcoming ALTER TYPE patch.
But in general, this sort of change aids greppability and readability,
so it's a good idea even without any specific motivation.

I may have missed a few places that could be converted, and it's even
more likely that pending patches will re-introduce some hard-coded
references.  But that's not fatal --- there's no expectation that
we'd actually change any of these values.  We can clean up stragglers
over time.

Discussion: https://postgr.es/m/16457.1583189537@sss.pgh.pa.us
2020-03-04 10:34:25 -05:00
Peter Eisentraut 1810ca18bf pg_standby: Don't use HAVE_WORKING_LINK
HAVE_WORKING_LINK is meant to indicate support for hard links, mainly
for Windows.  Here it is used for soft links on Unix, and the
functionality is optional anyway, so we can just make it error out
normally if needed.

Discussion: https://www.postgresql.org/message-id/flat/72fff73f-dc9c-4ef4-83e8-d2e60c98df48%402ndquadrant.com
2020-03-03 08:54:44 +01:00
Tom Lane 91f3bd732c Fix possibly-uninitialized variable.
Thinko in 2f9661311.  Per buildfarm, as well as warning seen locally.
2020-03-02 18:41:50 -05:00
Alvaro Herrera 2f9661311b
Represent command completion tags as structs
The backend was using strings to represent command tags and doing string
comparisons in multiple places, but that's slow and unhelpful.  Create a
new command list with a supporting structure to use instead; this is
stored in a tag-list-file that can be tailored to specific purposes with
a caller-definable C macro, similar to what we do for WAL resource
managers.  The first first such uses are a new CommandTag enum and a
CommandTagBehavior struct.

Replace numerous occurrences of char *completionTag with a
QueryCompletion struct so that the code no longer stores information
about completed queries in a cstring.  Only at the last moment, in
EndCommand(), does this get converted to a string.

EventTriggerCacheItem no longer holds an array of palloc’d tag strings
in sorted order, but rather just a Bitmapset over the CommandTags.

Author: Mark Dilger, with unsolicited help from Álvaro Herrera
Reviewed-by: John Naylor, Tom Lane
Discussion: https://postgr.es/m/981A9DB4-3F0C-4DA5-88AD-CB9CFF4D6CAD@enterprisedb.com
2020-03-02 18:19:51 -03:00
Peter Geoghegan 93ee38eade Teach pageinspect about nbtree deduplication.
Add a new bt_metap() column to display the metapage's allequalimage
field.  Also add three new columns to contrib/pageinspect's
bt_page_items() function:

* Add a boolean column ("dead") that displays the LP_DEAD bit value for
each non-pivot tuple.

* Add a TID column ("htid") that displays a single heap TID value for
each tuple.  This is the TID that is returned by BTreeTupleGetHeapTID(),
so comparable values are shown for pivot tuples, plain non-pivot tuples,
and posting list tuples.

* Add a TID array column ("tids") that displays TIDs from each tuple's
posting list, if any.  This works just like the "tids" column from
pageinspect's gin_leafpage_items() function.

No version bump for the pageinspect extension, since there hasn't been a
stable Postgres release since the last version bump (the last bump was
part of commit 58b4cb30).

Author: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-WzmSMmU2eNvY9+a4MNP+z02h6sa-uxZvN3un6jY02ZVBSw@mail.gmail.com
2020-02-29 12:10:17 -08:00
Peter Eisentraut 1933ae629e Add PostgreSQL home page to --help output
Per emerging standard in GNU programs and elsewhere.  Autoconf already
has support for specifying a home page, so we can just that.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/8d389c5f-7fb5-8e48-9a4a-68cec44786fa%402ndquadrant.com
2020-02-28 13:12:21 +01:00
Peter Eisentraut 864934131e Refer to bug report address by symbol rather than hardcoding
Use the PACKAGE_BUGREPORT macro that is created by Autoconf for
referring to the bug reporting address rather than hardcoding it
everywhere.  This makes it easier to change the address and it reduces
translation work.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/8d389c5f-7fb5-8e48-9a4a-68cec44786fa%402ndquadrant.com
2020-02-28 13:12:21 +01:00
Robert Haas 05d8449e73 Move src/backend/utils/hash/hashfn.c to src/common
This also involves renaming src/include/utils/hashutils.h, which
becomes src/include/common/hashfn.h. Perhaps an argument can be
made for keeping the hashutils.h name, but it seemed more
consistent to make it match the name of the file, and also more
descriptive of what is actually going on here.

Patch by me, reviewed by Suraj Kharage and Mark Dilger. Off-list
advice on how not to break the Windows build from Davinder Singh
and Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
2020-02-27 09:25:41 +05:30
Peter Geoghegan 0d861bbb70 Add deduplication to nbtree.
Deduplication reduces the storage overhead of duplicates in indexes that
use the standard nbtree index access method.  The deduplication process
is applied lazily, after the point where opportunistic deletion of
LP_DEAD-marked index tuples occurs.  Deduplication is only applied at
the point where a leaf page split would otherwise be required.  New
posting list tuples are formed by merging together existing duplicate
tuples.  The physical representation of the items on an nbtree leaf page
is made more space efficient by deduplication, but the logical contents
of the page are not changed.  Even unique indexes make use of
deduplication as a way of controlling bloat from duplicates whose TIDs
point to different versions of the same logical table row.

The lazy approach taken by nbtree has significant advantages over a GIN
style eager approach.  Most individual inserts of index tuples have
exactly the same overhead as before.  The extra overhead of
deduplication is amortized across insertions, just like the overhead of
page splits.  The key space of indexes works in the same way as it has
since commit dd299df8 (the commit that made heap TID a tiebreaker
column).

Testing has shown that nbtree deduplication can generally make indexes
with about 10 or 15 tuples for each distinct key value about 2.5X - 4X
smaller, even with single column integer indexes (e.g., an index on a
referencing column that accompanies a foreign key).  The final size of
single column nbtree indexes comes close to the final size of a similar
contrib/btree_gin index, at least in cases where GIN's posting list
compression isn't very effective.  This can significantly improve
transaction throughput, and significantly reduce the cost of vacuuming
indexes.

A new index storage parameter (deduplicate_items) controls the use of
deduplication.  The default setting is 'on', so all new B-Tree indexes
automatically use deduplication where possible.  This decision will be
reviewed at the end of the Postgres 13 beta period.

There is a regression of approximately 2% of transaction throughput with
synthetic workloads that consist of append-only inserts into a table
with several non-unique indexes, where all indexes have few or no
repeated values.  The underlying issue is that cycles are wasted on
unsuccessful attempts at deduplicating items in non-unique indexes.
There doesn't seem to be a way around it short of disabling
deduplication entirely.  Note that deduplication of items in unique
indexes is fairly well targeted in general, which avoids the problem
there (we can use a special heuristic to trigger deduplication passes in
unique indexes, since we're specifically targeting "version bloat").

Bump XLOG_PAGE_MAGIC because xl_btree_vacuum changed.

No bump in BTREE_VERSION, since the representation of posting list
tuples works in a way that's backwards compatible with version 4 indexes
(i.e. indexes built on PostgreSQL 12).  However, users must still
REINDEX a pg_upgrade'd index to use deduplication, regardless of the
Postgres version they've upgraded from.  This is the only way to set the
new nbtree metapage flag indicating that deduplication is generally
safe.

Author: Anastasia Lubennikova, Peter Geoghegan
Reviewed-By: Peter Geoghegan, Heikki Linnakangas
Discussion:
    https://postgr.es/m/55E4051B.7020209@postgrespro.ru
    https://postgr.es/m/4ab6e2db-bcee-f4cf-0916-3a06e6ccbb55@postgrespro.ru
2020-02-26 13:05:30 -08:00
Tom Lane 36390713a6 Fix compile failure.
I forgot that some compilers won't handle #if constructs within
ereport() calls.  Duplicating most of the call is annoying but simple.
Per buildfarm.
2020-02-24 18:43:40 -05:00
Tom Lane 3d475515a1 Account explicitly for long-lived FDs that are allocated outside fd.c.
The comments in fd.c have long claimed that all file allocations should
go through that module, but in reality that's not always practical.
fd.c doesn't supply APIs for invoking some FD-producing syscalls like
pipe() or epoll_create(); and the APIs it does supply for non-virtual
FDs are mostly insistent on releasing those FDs at transaction end;
and in some cases the actual open() call is in code that can't be made
to use fd.c, such as libpq.

This has led to a situation where, in a modern server, there are likely
to be seven or so long-lived FDs per backend process that are not known
to fd.c.  Since NUM_RESERVED_FDS is only 10, that meant we had *very*
few spare FDs if max_files_per_process is >= the system ulimit and
fd.c had opened all the files it thought it safely could.  The
contrib/postgres_fdw regression test, in particular, could easily be
made to fall over by running it under a restrictive ulimit.

To improve matters, invent functions Acquire/Reserve/ReleaseExternalFD
that allow outside callers to tell fd.c that they have or want to allocate
a FD that's not directly managed by fd.c.  Add calls to track all the
fixed FDs in a standard backend session, so that we are honestly
guaranteeing that NUM_RESERVED_FDS FDs remain unused below the EMFILE
limit in a backend's idle state.  The coding rules for these functions say
that there's no need to call them in code that just allocates one FD over
a fairly short interval; we can dip into NUM_RESERVED_FDS for such cases.
That means that there aren't all that many places where we need to worry.
But postgres_fdw and dblink must use this facility to account for
long-lived FDs consumed by libpq connections.  There may be other places
where it's worth doing such accounting, too, but this seems like enough
to solve the immediate problem.

Internally to fd.c, "external" FDs are limited to max_safe_fds/3 FDs.
(Callers can choose to ignore this limit, but of course it's unwise
to do so except for fixed file allocations.)  I also reduced the limit
on "allocated" files to max_safe_fds/3 FDs (it had been max_safe_fds/2).
Conceivably a smarter rule could be used here --- but in practice,
on reasonable systems, max_safe_fds should be large enough that this
isn't much of an issue, so KISS for now.  To avoid possible regression
in the number of external or allocated files that can be opened,
increase FD_MINFREE and the lower limit on max_files_per_process a
little bit; we now insist that the effective "ulimit -n" be at least 64.

This seems like pretty clearly a bug fix, but in view of the lack of
field complaints, I'll refrain from risking a back-patch.

Discussion: https://postgr.es/m/E1izCmM-0005pV-Co@gemulon.postgresql.org
2020-02-24 17:28:33 -05:00
Tom Lane 70a7732007 Remove support for upgrading extensions from "unpackaged" state.
Andres Freund pointed out that allowing non-superusers to run
"CREATE EXTENSION ... FROM unpackaged" has security risks, since
the unpackaged-to-1.0 scripts don't try to verify that the existing
objects they're modifying are what they expect.  Just attaching such
objects to an extension doesn't seem too dangerous, but some of them
do more than that.

We could have resolved this, perhaps, by still requiring superuser
privilege to use the FROM option.  However, it's fair to ask just what
we're accomplishing by continuing to lug the unpackaged-to-1.0 scripts
forward.  None of them have received any real testing since 9.1 days,
so they may not even work anymore (even assuming that one could still
load the previous "loose" object definitions into a v13 database).
And an installation that's trying to go from pre-9.1 to v13 or later
in one jump is going to have worse compatibility problems than whether
there's a trivial way to convert their contrib modules into extension
style.

Hence, let's just drop both those scripts and the core-code support
for "CREATE EXTENSION ... FROM".

Discussion: https://postgr.es/m/20200213233015.r6rnubcvl4egdh5r@alap3.anarazel.de
2020-02-19 16:59:14 -05:00
Amit Kapila e3ff789acf Stop demanding that top xact must be seen before subxact in decoding.
Manifested as

ERROR:  subtransaction logged without previous top-level txn record

this check forbids legit behaviours like
 - First xl_xact_assignment record is beyond reading, i.e. earlier
   restart_lsn.
 - After restart_lsn there is some change of a subxact.
 - After that, there is second xl_xact_assignment (for another subxact)
   revealing the relationship between top and first subxact.

Such a transaction won't be streamed anyway because we hadn't seen it in
full.  Saying for sure whether xact of some record encountered after
the snapshot was deserialized can be streamed or not requires to know
whether it wrote something before deserialization point --if yes, it
hasn't been seen in full and can't be decoded. Snapshot doesn't have such
info, so there is no easy way to relax the check.

Reported-by: Hsu, John
Diagnosed-by: Arseny Sher
Author: Arseny Sher, Amit Kapila
Reviewed-by: Amit Kapila, Dilip Kumar
Backpatch-through: 9.5
Discussion: https://postgr.es/m/AB5978B2-1772-4FEE-A245-74C91704ECB0@amazon.com
2020-02-19 08:15:49 +05:30
Michael Paquier 11f063b0a9 Remove some dead code in contrib/adminpack/
Since its introduction in fe59e56, the code in charge of validating and
converting a file path includes some extra handling for absolute paths
pointing to an external log_directory, but this has never been used.

Author: Antonin Houska
Reviewed-by: Julien Rouhaud, Michael Paquier
Discussion: https://postgr.es/m/32663.1581592539@antos
2020-02-14 12:38:44 +09:00
Tom Lane eb67623c96 Mark some contrib modules as "trusted".
This allows these modules to be installed into a database without
superuser privileges (assuming that the DBA or sysadmin has installed
the module's files in the expected place).  You only need CREATE
privilege on the current database, which by default would be
available to the database owner.

The following modules are marked trusted:

btree_gin
btree_gist
citext
cube
dict_int
earthdistance
fuzzystrmatch
hstore
hstore_plperl
intarray
isn
jsonb_plperl
lo
ltree
pg_trgm
pgcrypto
seg
tablefunc
tcn
tsm_system_rows
tsm_system_time
unaccent
uuid-ossp

In the future we might mark some more modules trusted, but there
seems to be no debate about these, and on the whole it seems wise
to be conservative with use of this feature to start out with.

Discussion: https://postgr.es/m/32315.1580326876@sss.pgh.pa.us
2020-02-13 15:02:35 -05:00
Thomas Munro 701a51fd4e Use pg_pwrite() in more places.
This removes some lseek() system calls.

Author: Thomas Munro
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CA%2BhUKGJ%2BoHhnvqjn3%3DHro7xu-YDR8FPr0FL6LF35kHRX%3D_bUzg%40mail.gmail.com
2020-02-11 17:50:22 +13:00
Tom Lane a069218163 In jsonb_plpython.c, suppress warning message from gcc 10.
Very recent gcc complains that PLyObject_ToJsonbValue could return
a pointer to a local variable.  I think it's wrong; but the coding
is fragile enough, and the savings of one palloc() minimal enough,
that it seems better to just do a palloc() all the time.  (My other
idea of tweaking the if-condition doesn't suppress the warning.)

Back-patch to v11 where this code was introduced.

Discussion: https://postgr.es/m/21547.1580170366@sss.pgh.pa.us
2020-01-30 18:26:12 -05:00
Alvaro Herrera c9d2977519 Clean up newlines following left parentheses
We used to strategically place newlines after some function call left
parentheses to make pgindent move the argument list a few chars to the
left, so that the whole line would fit under 80 chars.  However,
pgindent no longer does that, so the newlines just made the code
vertically longer for no reason.  Remove those newlines, and reflow some
of those lines for some extra naturality.

Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
2020-01-30 13:42:14 -03:00
Alvaro Herrera 4e89c79a52 Remove excess parens in ereport() calls
Cosmetic cleanup, not worth backpatching.

Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
Reviewed-by: Tom Lane, Michael Paquier
2020-01-30 13:32:04 -03:00