The Windows documentation insists that every WSAStartup call should
have a matching WSACleanup call. However, if that ever had actual
relevance, it wasn't in this century. Every remotely-modern Windows
kernel is capable of cleaning up when a process exits without doing
that, and must be so to avoid resource leaks in case of a process
crash. Moreover, Postgres backends have done WSAStartup without
WSACleanup since commit 4cdf51e64 in 2004, and we've never seen any
indication of a problem with that.
libpq's habit of doing WSAStartup during connection start and
WSACleanup during shutdown is also rather inefficient, since a
series of non-overlapping connection requests leads to repeated,
quite expensive DLL unload/reload cycles. We document a workaround
for that (having the application call WSAStartup for itself), but
that's just a kluge. It's also worth noting that it's far from
uncommon for applications to exit without doing PQfinish, and
we've not heard reports of trouble from that either.
However, the real reason for acting on this is that recent
experiments by Alexander Lakhin show that calling WSACleanup
during PQfinish is triggering the symptom we occasionally see
that a process using libpq fails to emit expected stdio output.
Therefore, let's change libpq so that it calls WSAStartup only
once per process, during the first connection attempt, and never
calls WSACleanup at all.
While at it, get rid of the only other WSACleanup call in our code
tree, in pg_dump/parallel.c; that presumably is equally useless.
Back-patch of HEAD commit 7d00a6b2d.
Discussion: https://postgr.es/m/ac976d8c-03df-d6b8-025c-15a2de8d9af1@postgrespro.ru
The rules to choose the number of parallel workers to perform parallel
vacuum operation were not clearly specified.
Reported-by: Peter Eisentraut
Author: Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/36aa8aea-61b7-eb3c-263b-648e0cb117b7@2ndquadrant.com
Section 8.5.1.4, which defines these literals, made only a vague
reference to the fact that they might be evaluated too soon to be
safe in non-interactive contexts. Provide a more explicit caution
against misuse. Also, generalize the wording in the related tip in
section 9.9.4: while it clearly described this problem, it implied
(or really, stated outright) that the problem only applies to table
DEFAULT clauses.
Per gripe from Tijs van Dam. Back-patch to all supported branches.
Discussion: https://postgr.es/m/c2LuRv9BiRT3bqIo5mMQiVraEXey_25B4vUn0kDqVqilwOEu_iVF1tbtvLnyQK7yDG3PFaz_GxLLPil2SDkj1MCObNRVaac-7j1dVdFERk8=@thalex.com
Commit a97e85f2b caused "exceed the available area" warnings in PDF
builds. Fine-tune colwidth values to avoid that.
Back-patch to 9.6, like the prior patch. (This is of dubious value
before v13, since we were far from free of such warnings in older
branches. But we might as well keep the SGML looking the same in all
branches.)
Per buildfarm.
Previously it was documented that toast_tuple_target affected column
marked as only External or Extended. But this description is not correct
and toast_tuple_target affects also column marked as Main.
Back-patch to v11 where toast_tuple_target reloption was introduced.
Author: Shinya Okano
Reviewed-by: Tatsuhito Kasahara, Fujii Masao
Discussion: https://postgr.es/m/93f46e311a67422e89e770d236059817@oss.nttdata.com
I removed the duplicate command tags for START_REPLICATION inadvertently
in commit 07082b08cc, but the replication protocol requires them. The
fact that the replication protocol was broken was not noticed because
all our test cases use an optimized code path that exits early, failing
to verify that the behavior is correct for non-optimized cases. Put
them back.
Also document this protocol quirk.
Add a test case that shows the failure. It might still succeed even
without the patch when run on a fast enough server, but it suffices to
show the bug in enough cases that it would be noticed in buildfarm.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Henry Hinze <henry.hinze@gmail.com>
Reviewed-by: Petr Jelínek <petr.jelinek@2ndquadrant.com>
Discussion: https://postgr.es/m/16643-eaadeb2a1a58d28c@postgresql.org
The descriptions of make_interval() and pg_options_to_table()
were randomly different from the reality embedded in pg_proc.
(These are not all the discrepancies I found in a quick search,
but the others perhaps require more discussion, since there's
at least a case to be made for changing pg_proc not the docs.)
make_interval issue noted by Thomas Kellerer.
Discussion: https://postgr.es/m/7b154ef0-9f22-90b9-7734-4bf23686695b@gmx.net
Previously, a conversion such as
to_date('-44-02-01','YYYY-MM-DD')
would result in '0045-02-01 BC', as the code attempted to interpret
the negative year as BC, but failed to apply the correction needed
for our internal handling of BC years. Fix the off-by-one problem.
Also, arrange for the combination of a negative year and an
explicit "BC" marker to cancel out and produce AD. This is how
the negative-century case works, so it seems sane to do likewise.
Continue to read "year 0000" as 1 BC. Oracle would throw an error,
but we've accepted that case for a long time so I'm hesitant to
change it in a back-patch.
Per bug #16419 from Saeed Hubaishan. Back-patch to all supported
branches.
Dar Alathar-Yemen and Tom Lane
Discussion: https://postgr.es/m/16419-d8d9db0a7553f01b@postgresql.org
Explicitly mention that primary key constraints are also included in the
limitation that the constraint columns must be a superset of the partition key
columns.
Wording suggestion from Tom Lane.
Discussion: https://postgr.es/m/64062533.78364.1601415362244@mail.yahoo.com
Backpatch-through: 11, where unique constraints on partitioned tables were added
Previously the standby server didn't archive timeline history files
streamed from the primary even when archive_mode is set to "always",
while it archives the streamed WAL files. This could cause the PITR to
fail because there was no required timeline history file in the archive.
The cause of this issue was that walreceiver didn't mark those files as
ready for archiving.
This commit makes walreceiver mark those streamed timeline history
files as ready for archiving if archive_mode=always. Then the archiver
process archives the marked timeline history files.
Back-patch to all supported versions.
Reported-by: Grigory Smolkin
Author: Grigory Smolkin, Fujii Masao
Reviewed-by: David Zhang, Anastasia Lubennikova
Discussion: https://postgr.es/m/54b059d4-2b48-13a4-6f43-95a087c92367@postgrespro.ru
We have had multiple reports that point to the
'@colReorder=latn-digit' collation customization being buggy. We have
reported this to ICU and are waiting for a fix. In the meantime,
remove references to this from the documentation and replace it by
another reordering example. Apparently, many users have been picking
up this example specifically from the documentation.
Author: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Discussion: https://www.postgresql.org/message-id/flat/153201618542.1404.3611626898935613264%40wrigleys.postgresql.org
Commit 15cb2bd27 neglected to make the running text match the
tables, leaving the reader with the strong impression that
we cannot count. Also, don't drop an unrelated para between
a table and the para describing it.
The previous version of the docs mentioned that files are rewritten,
implying that a second copy of each file gets created, but each file is
updated in-place.
Author: Michael Banck
Reviewed-by: Daniel Gustafsson, Michael Paquier
Discussion: https://postgr.es/m/858086b6a42fb7d17995b6175856f7e7ec44d0a2.camel@credativ.de
Backpatch-through: 12
The majority of our audience is probably using a pre-packaged Postgres
build rather than raw sources. For them, much of runtime.sgml is not
too relevant, and they should be reading the packager's docs instead.
Add some notes pointing that way in appropriate places.
Text by me; thanks to Daniel Gustafsson for review and discussion,
and to Laurenz Albe for an earlier version.
Discussion: https://postgr.es/m/159430831443.16535.11360317280100947016@wrigleys.postgresql.org
We were already raising an error for DROP INDEX CONCURRENTLY on a
partitioned table, albeit a different and confusing one:
ERROR: DROP INDEX CONCURRENTLY must be first action in transaction
Change that to throw a more comprehensible error:
ERROR: cannot drop partitioned index \"%s\" concurrently
Michael Paquier authored the test case for indexes on temporary
partitioned tables.
Backpatch to 11, where indexes on partitioned tables were added.
Reported-by: Jan Mussler <jan.mussler@zalando.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/16594-d2956ca909585067@postgresql.org
Per discussion, we're planning to remove parser support for postfix
operators in order to simplify the grammar. So it behooves us to
put out a deprecation notice at least one release before that.
There is only one built-in postfix operator, ! for factorial.
Label it deprecated in the docs and in pg_description, and adjust
some examples that formerly relied on it. (The sister prefix
operator !! is also deprecated. We don't really have to remove
that one, but since we're suggesting that people use factorial()
instead, it seems better to remove both operators.)
Also state in the CREATE OPERATOR ref page that postfix operators
in general are going away.
Although this changes the initial contents of pg_description,
I did not force a catversion bump; it doesn't seem essential.
In v13, also back-patch 4c5cf5431, so that there's someplace for
the <link>s to point to.
Mark Dilger and John Naylor, with some adjustments by me
Discussion: https://postgr.es/m/BE2DF53D-251A-4E26-972F-930E523580E9@enterprisedb.com
Commit ce77abe63c allowed EXPLAIN (BUFFERS) to report the information
on buffer usage during planning phase. However three issues were
reported regarding this feature.
(1) Previously, EXPLAIN option BUFFERS required ANALYZE. So the query
had to be actually executed by specifying ANALYZE even when we
want to see only the planner's buffer usage. This was inconvenient
especially when the query was write one like DELETE.
(2) EXPLAIN included the planner's buffer usage in summary
information. So SUMMARY option had to be enabled to report that.
Also this format was confusing.
(3) The output structure for planning information was not consistent
between TEXT format and the others. For example, "Planning" tag
was output in JSON format, but not in TEXT format.
For (1), this commit allows us to perform EXPLAIN (BUFFERS) without
ANALYZE to report the planner's buffer usage.
For (2), this commit changed EXPLAIN output so that the planner's
buffer usage is reported before summary information.
For (3), this commit made the output structure for planning
information more consistent between the formats.
Back-patch to v13 where the planner's buffer usage was allowed to
be reported in EXPLAIN.
Reported-by: Pierre Giraud, David Rowley
Author: Fujii Masao
Reviewed-by: David Rowley, Julien Rouhaud, Pierre Giraud
Discussion: https://postgr.es/m/07b226e6-fa49-687f-b110-b7c37572f69e@dalibo.com
When the leftover invalid index is "ccold", there's no need to re-run
the command. Reword the instructions to make that explicit.
Backpatch to 12, where REINDEX CONCURRENTLY appeared.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://postgr.es/m/20200819211312.GA15497@alvherre.pgsql
Put the -r option in the right section (it certainly isn't an
option controlling "the location and format of the output").
Clarify the behavior of the tablespace and waldir options
(that part per gripe from robert@interactive.co.uk).
Make a large number of small copy-editing fixes in text that
visibly wasn't written by native speakers, and try to avoid
grammatical inconsistencies between the descriptions of
the different options.
Back-patch to v13, since HEAD hasn't meaningfully diverged yet.
Discussion: https://postgr.es/m/159749418850.14322.216503677134569752@wrigleys.postgresql.org
The SimpleLruTruncate() header comment states the new coding rule. To
achieve this, add locktype "frozenid" and two LWLocks. This closes a
rare opportunity for data loss, which manifested as "apparent
wraparound" or "could not access status of transaction" errors. Data
loss is more likely in pg_multixact, due to released branches' thin
margin between multiStopLimit and multiWrapLimit. If a user's physical
replication primary logged ": apparent wraparound" messages, the user
should rebuild standbys of that primary regardless of symptoms. At less
risk is a cluster having emitted "not accepting commands" errors or
"must be vacuumed" warnings at some point. One can test a cluster for
this data loss by running VACUUM FREEZE in every database. Back-patch
to 9.5 (all supported versions).
Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the
postmaster has immediately killed all "optional" background processes,
and subsequently refused to launch new ones while it's waiting for
foreground client processes to exit. No doubt this seemed like an OK
policy at some point; but it's a pretty bad one now, because it makes
for a seriously degraded environment for the remaining clients:
* Parallel queries are killed, and new ones fail to launch. (And our
parallel-query infrastructure utterly fails to deal with the case
in a reasonable way --- it just hangs waiting for workers that are
not going to arrive. There is more work needed in that area IMO.)
* Autovacuum ceases to function. We can tolerate that for awhile,
but if bulk-update queries continue to run in the surviving client
sessions, there's eventually going to be a mess. In the worst case
the system could reach a forced shutdown to prevent XID wraparound.
* The bgwriter and walwriter are also stopped immediately, likely
resulting in performance degradation.
Hence, let's rearrange things so that the only immediate change in
behavior is refusing to let in new normal connections. Once the last
normal connection is gone, shut everything down as though we'd received
a "fast" shutdown. To implement this, remove the PM_WAIT_BACKUP and
PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY
while normal connections remain. A subsidiary state variable tracks
whether or not we're letting in new connections in those states.
This also allows having just one copy of the logic for killing child
processes in smart and fast shutdown modes. I moved that logic into
PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.
Back-patch to 9.6 where parallel query was added. In principle
this'd be a good idea in 9.5 as well, but the risk/reward ratio
is not as good there, since lack of autovacuum is not a problem
during typical uses of smart shutdown.
Per report from Bharath Rupireddy.
Patch by me, reviewed by Thomas Munro
Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com
Make these examples self-contained by providing declarations of the
user-defined row types they rely on. There wasn't room to do this
in the old doc format, but now there is, and I think it makes the
examples a good bit less confusing.
Hostile objects located within the installation-time search_path could
capture references in an extension's installation or upgrade script.
If the extension is being installed with superuser privileges, this
opens the door to privilege escalation. While such hazards have existed
all along, their urgency increases with the v13 "trusted extensions"
feature, because that lets a non-superuser control the installation path
for a superuser-privileged script. Therefore, make a number of changes
to make such situations more secure:
* Tweak the construction of the installation-time search_path to ensure
that references to objects in pg_catalog can't be subverted; and
explicitly add pg_temp to the end of the path to prevent attacks using
temporary objects.
* Disable check_function_bodies within installation/upgrade scripts,
so that any security gaps in SQL-language or PL-language function bodies
cannot create a risk of unwanted installation-time code execution.
* Adjust lookup of type input/receive functions and join estimator
functions to complain if there are multiple candidate functions. This
prevents capture of references to functions whose signature is not the
first one checked; and it's arguably more user-friendly anyway.
* Modify various contrib upgrade scripts to ensure that catalog
modification queries are executed with secure search paths. (These
are in-place modifications with no extension version changes, since
it is the update process itself that is at issue, not the end result.)
Extensions that depend on other extensions cannot be made fully secure
by these methods alone; therefore, revert the "trusted" marking that
commit eb67623c9 applied to earthdistance and hstore_plperl, pending
some better solution to that set of issues.
Also add documentation around these issues, to help extension authors
write secure installation scripts.
Patch by me, following an observation by Andres Freund; thanks
to Noah Misch for review.
Security: CVE-2020-14350
This ought to work much like C's "#elif defined(name)"; but the code
implemented it in a way equivalent to endif followed by ifdef, so that
it didn't matter whether any previous branch of the IF construct had
succeeded. Fix that; add some test cases covering elif and nested IFs;
and improve the documentation, which also seemed a bit confused.
AFAICS the code has been like this since the feature was added in 1999
(commit b57b0e044). So while it's surely wrong, there might be code
out there relying on the current behavior. Hence, don't back-patch
into stable branches. It seems all right to fix it in v13 though.
Per report from Ashutosh Sharma. Reviewed by Ashutosh Sharma and
Michael Meskes.
Discussion: https://postgr.es/m/CAE9k0P=dQk9X0cU2tN49S7a9tv733-e1pVdpB1P-pWJ5PdTktg@mail.gmail.com
The type-name pattern in \dAc and \dAf was matched only to the actual
pg_type.typname string, which is fairly user-unfriendly in cases where
that is not what's shown to the user by format_type (compare "_int4"
and "integer[]"). Make this code match what \dT does, i.e. match the
pattern against either typname or format_type() output. Also fix its
broken handling of schema-name restrictions. (IOW, make these
processSQLNamePattern calls match \dT's.) While here, adjust
whitespace to make the query a little prettier in -E output, too.
Also improve some inaccuracies and shaky grammar in the related
documentation.
Noted while working on a patch for intarray's opclasses; I wondered
why I couldn't get a match to "integer*" for the input type name.
In "High Availability, Load Balancing, and Replication" chapter,
certain descriptions of Pgpool-II were not correct at this point. It
does not need conflict resolution. Also "Multiple-Server Parallel
Query Execution" is not supported anymore.
Discussion: https://postgr.es/m/20200726.230128.53842489850344110.t-ishii%40sraoss.co.jp
Author: Tatsuo Ishii
Reviewed-by: Bruce Momjian
Backpatch-through: 9.5
Partitioned indexes are also registered in pg_inherits, but the
description of this catalog did not reflect that.
Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/87k0ynj35y.fsf@wibble.ilmari.org
Backpatch-through: 11
Add a GUC that acts as a multiplier on work_mem. It gets applied when
sizing executor node hash tables that were previously size constrained
using work_mem alone.
The new GUC can be used to preferentially give hash-based nodes more
memory than the generic work_mem limit. It is intended to enable admin
tuning of the executor's memory usage. Overall system throughput and
system responsiveness can be improved by giving hash-based executor
nodes more memory (especially over sort-based alternatives, which are
often much less sensitive to being memory constrained).
The default value for hash_mem_multiplier is 1.0, which is also the
minimum valid value. This means that hash-based nodes continue to apply
work_mem in the traditional way by default.
hash_mem_multiplier is generally useful. However, it is being added now
due to concerns about hash aggregate performance stability for users
that upgrade to Postgres 13 (which added disk-based hash aggregation in
commit 1f39bce0). While the old hash aggregate behavior risked
out-of-memory errors, it is nevertheless likely that many users actually
benefited. Hash agg's previous indifference to work_mem during query
execution was not just faster; it also accidentally made aggregation
resilient to grouping estimate problems (at least in cases where this
didn't create destabilizing memory pressure).
hash_mem_multiplier can provide a certain kind of continuity with the
behavior of Postgres 12 hash aggregates in cases where the planner
incorrectly estimates that all groups (plus related allocations) will
fit in work_mem/hash_mem. This seems necessary because hash-based
aggregation is usually much slower when only a small fraction of all
groups can fit. Even when it isn't possible to totally avoid hash
aggregates that spill, giving hash aggregation more memory will reliably
improve performance (the same cannot be said for external sort
operations, which appear to be almost unaffected by memory availability
provided it's at least possible to get a single merge pass).
The PostgreSQL 13 release notes should advise users that increasing
hash_mem_multiplier can help with performance regressions associated
with hash aggregation. That can be taken care of by a later commit.
Author: Peter Geoghegan
Reviewed-By: Álvaro Herrera, Jeff Davis
Discussion: https://postgr.es/m/20200625203629.7m6yvut7eqblgmfo@alap3.anarazel.de
Discussion: https://postgr.es/m/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com
Backpatch: 13-, where disk-based hash aggregation was introduced.
The planner is in fact willing to use hash aggregation when work_mem is
not set high enough for everything to fit in memory. This has been the
case since commit 1f39bce0, which added disk-based hash aggregation.
There are a few remaining cases in which hash aggregation is avoided as
a matter of policy when the planner surmises that spilling will be
necessary. For example, callers of choose_hashed_setop() still
conservatively avoid hash aggregation when spilling is anticipated.
That doesn't seem like a good enough reason to mention hash aggregation
in this context.
Backpatch: 13-, where disk-based hash aggregation was introduced.
Note: This GUC was originally named enable_hashagg_disk when it appeared
in commit 1f39bce0, which added disk-based hash aggregation. It was
subsequently renamed in commit 92c58fd9.
Author: Peter Geoghegan
Reviewed-By: Jeff Davis, Álvaro Herrera
Discussion: https://postgr.es/m/9d9d1e1252a52ea1bad84ea40dbebfd54e672a0f.camel%40j-davis.com
Backpatch: 13-, where disk-based hash aggregation was introduced.
The initial implementation of leader_pid in pg_stat_activity added by
b025f32 took the approach to strictly print what a PGPROC entry
includes. In short, if a backend has been involved in parallel query at
least once, leader_pid would remain set as long as the backend is alive.
For a parallel group leader, this means that the field would always be
set after it participated at least once in parallel query, and after
more discussions this could be confusing if using for example a
connection pooler.
This commit changes the data printed so as leader_pid becomes always
NULL for a parallel group leader, showing up a non-NULL value only for
the parallel workers, and actually as long as a parallel query is
running as workers are shut down once the query has completed.
This does not change the definition of any catalog, so no catalog bump
is needed. Per discussion with Justin Pryzby, Álvaro Herrera, Julien
Rouhaud and me.
Discussion: https://postgr.es/m/20200721035145.GB17300@paquier.xyz
Backpatch-through: 13
TLS 1.3 uses a different way of specifying ciphers and a different
OpenSSL API. PostgreSQL currently does not support setting those
ciphers. For now, just document this. In the future, support for
this might be added somehow.
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
The executor checks for this error, and so does the bootstrap catalog
loader, but we never checked for it in retail catalog manipulations.
The folly of that has now been exposed, so let's add assertions
checking it. Checking in CatalogTupleInsert[WithInfo] and
CatalogTupleUpdate[WithInfo] should be enough to cover this.
Back-patch to v10; the aforesaid functions didn't exist before that,
and it didn't seem worth adapting the patch to the oldest branches.
But given the risk of JIT crashes, I think we certainly need this
as far back as v11.
Pre-v13, we have to explicitly exclude pg_subscription.subslotname
and pg_subscription_rel.srsublsn from the checks, since they are
mismarked. (Even if we change our mind about applying BKI_FORCE_NULL
in the branch tips, it doesn't seem wise to have assertions that
would fire in existing databases.)
Discussion: https://postgr.es/m/298837.1595196283@sss.pgh.pa.us
The code has always set this column to NULL when it's not valid,
but the catalog header's description failed to reflect that,
as did the SGML docs, as did some of the code. To prevent future
coding errors of the same ilk, let's hide the field from C code
as though it were variable-length (which, in a sense, it is).
As with commit 72eab84a5, we can only fix this cleanly in HEAD
and v13; the problem extends further back but we'll need some
klugery in the released branches.
Discussion: https://postgr.es/m/367660.1595202498@sss.pgh.pa.us
max_slot_wal_keep_size that was added in v13 and wal_keep_segments are
the GUC parameters to specify how much WAL files to retain for
the standby servers. While max_slot_wal_keep_size accepts the number of
bytes of WAL files, wal_keep_segments accepts the number of WAL files.
This difference of setting units between those similar parameters could
be confusing to users.
To alleviate this situation, this commit renames wal_keep_segments to
wal_keep_size, and make users specify the WAL size in it instead of
the number of WAL files.
There was also the idea to rename max_slot_wal_keep_size to
max_slot_wal_keep_segments, in the discussion. But we have been moving
away from measuring in segments, for example, checkpoint_segments was
replaced by max_wal_size. So we concluded to rename wal_keep_segments
to wal_keep_size.
Back-patch to v13 where max_slot_wal_keep_size was added.
Author: Fujii Masao
Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, David Steele
Discussion: https://postgr.es/m/574b4ea3-e0f9-b175-ead2-ebea7faea855@oss.nttdata.com
Due to the layout of this catalog, subslotname has to be explicitly
marked BKI_FORCE_NULL, else initdb will default to the assumption
that it's non-nullable. Since, in fact, CREATE/ALTER SUBSCRIPTION
will store null values there, the existing marking is just wrong,
and has been since this catalog was invented.
We haven't noticed because not much in the system actually depends
on attnotnull being truthful. However, JIT'ed tuple deconstruction
does depend on that in some cases, allowing crashes or wrong answers
in queries that inspect pg_subscription. Commit 9de77b545 quite
accidentally exposed this on the buildfarm members that force JIT
activation.
Back-patch to v13. The problem goes further back, but we cannot
force initdb in released branches, so some klugier solution will
be needed there. Before working on that, push this simple fix
to try to get the buildfarm back to green.
Discussion: https://postgr.es/m/4118109.1595096139@sss.pgh.pa.us
pg_dump produces custom-format archive files that lack data offsets
when it is unable to seek its output. Up to now that's been a hazard
for pg_restore. But if pg_restore is able to seek in the archive
file, there is no reason to throw up our hands when asked to restore
data blocks out of order. Instead, whenever we are searching for a
data block, record the locations of the blocks we passed over (that
is, fill in the missing data-offset fields in our in-memory copy of
the TOC data). Then, when we hit a case that requires going
backwards, we can just seek back.
Also track the furthest point that we've searched to, and seek back
to there when beginning a search for a new data block. This avoids
possible O(N^2) time consumption, by ensuring that each data block
is examined at most twice. (On Unix systems, that's at most twice
per parallel-restore job; but since Windows uses threads here, the
threads can share block location knowledge, reducing the amount of
duplicated work.)
We can also improve the code a bit by using fseeko() to skip over
data blocks during the search.
This is all of some use even in simple restores, but it's really
significant for parallel pg_restore. In that case, we require
seekability of the input already, and we will very probably need
to do out-of-order restores.
Back-patch to v12, as this fixes a regression introduced by commit
548e50976. Before that, parallel restore avoided requesting
out-of-order restores, so it would work on a data-offset-less
archive. Now it will again.
Ideally this patch would include some test coverage, but there are
other open bugs that need to be fixed before we can extend our
coverage of parallel restore very much. Plan to revisit that later.
David Gilman and Tom Lane; reviewed by Justin Pryzby
Discussion: https://postgr.es/m/CALBH9DDuJ+scZc4MEvw5uO-=vRyR2=QF9+Yh=3hPEnKHWfS81A@mail.gmail.com
The stats with this commit was available only for WALSenders, however,
users might want to see for backends doing logical decoding via SQL API.
Then, users might want to reset and access these stats across server
restart which was not possible with the current patch.
List of commits reverted:
caa3c4242c Don't call elog() while holding spinlock.
e641b2a995 Doc: Update the documentation for spilled transaction
statistics.
5883f5fe27 Fix unportable printf format introduced in commit 9290ad198.
9290ad198b Track statistics for spilling of changes from ReorderBuffer.
Additionaly, remove the release notes entry for this feature.
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/CA+fd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg@mail.gmail.com
Re-point comp.ai.genetic FAQ link to a more stable address.
Remove stale links to AIX documentation; we don't really need to
tell AIX users how to use their systems.
Remove stale links to HP documentation about SSL. We've had to
update those twice before, making it increasingly obvious that
HP does not intend them to be stable landing points. They're
not particularly authoritative, either. (This change effectively
reverts bbd3bdba3.)
Daniel Gustafsson and Álvaro Herrera, per a gripe from
Kyotaro Horiguchi. Back-patch, since these links are
just as dead in the back branches.
Discussion: https://postgr.es/m/20200709.161226.204639179120026914.horikyota.ntt@gmail.com
pg_stat_activity.query text is truncated at 1024 bytes. But previously
the document described that it's truncated at 1024 characters.
This was not accurate when considering multibyte characters.
Back-patch to v10 where this inaccurate description was added.
Author: Atsushi Torikoshi
Reviewed-by: Daniel Gustafsson, Fujii Masao
Discussion: https://postgr.es/m/cd5b49a5a14e887542f5f569c1c6bde2@oss.nttdata.com
The previous definition of the column was almost universally disliked,
so provide this updated definition which is more useful for monitoring
purposes: a large positive value is good, while zero or a negative value
means danger. This should be operationally more convenient.
Backpatch to 13, where the new column to pg_replication_slots (and the
feature it represents) were added.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://postgr.es/m/9ddfbf8c-2f67-904d-44ed-cf8bc5916228@oss.nttdata.com
Enabling pg_stat_statements.track_plaanning may incur a noticeable
performance penalty, especially when a fewer kinds of queries are executed
on many concurrent connections. This commit documents this note.
Back-patch to v13 where pg_stat_statements.track_plaanning was added.
Suggested-by: Pavel Stehule
Author: Fujii Masao
Reviewed-by: Pavel Stehule
Discussion: https://postgr.es/m/CAFj8pRC9Jxa8r5i0TNBWLb8mzuaYzEoLq3QOvip0jVpHPOLbVA@mail.gmail.com
After running GetForeignRelSize for a foreign table, adjust rel->tuples
to be at least as large as rel->rows. This prevents bizarre behavior
in estimate_num_groups() and perhaps other places, especially in the
scenario where rel->tuples is zero because pg_class.reltuples is
(suggesting that ANALYZE has never been run for the table). As things
stood, we'd end up estimating one group out of any GROUP BY on such a
table, whereas the default group-count estimate is more likely to result
in a sane plan.
Also, clarify in the documentation that GetForeignRelSize has the option
to override the rel->tuples value if it has a better idea of what to use
than what is in pg_class.reltuples.
Per report from Jeff Janes. Back-patch to all supported branches.
Patch by me; thanks to Etsuro Fujita for review
Discussion: https://postgr.es/m/CAMkU=1xNo9cnan+Npxgz0eK7394xmjmKg-QEm8wYG9P5-CcaqQ@mail.gmail.com
Previously the document explained that restart_lsn indicates the LSN of
oldest WAL won't be automatically removed during checkpoints. But
since v13 this was no longer true thanks to max_slot_wal_keep_size.
Back-patch to v13 where max_slot_wal_keep_size was added.
Author: Fujii Masao
Discussion: https://postgr.es/m/6497f1e9-3148-c5da-7e49-b2fddad9a42f@oss.nttdata.com
Since v13 pg_stat_statements is allowed to track the planning time of
statements when track_planning option is enabled. Its default was on.
But this feature could cause more terrible spinlock contentions in
pg_stat_statements. As a result of this, Robins Tharakan reported that
v13 beta1 showed ~45% performance drop at high DB connection counts
(when compared with v12.3) during fully-cached SELECT-only test using
pgbench.
To avoid this performance regression by the default setting,
this commit changes default of pg_stat_statements.track_planning to off.
Back-patch to v13 where pg_stat_statements.track_planning was introduced.
Reported-by: Robins Tharakan
Author: Fujii Masao
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/2895b53b033c47ccb22972b589050dd9@EX13D05UWC001.ant.amazon.com
The IANA tzcode library has a feature to read a time zone file named
"posixrules" and apply the daylight-savings transition dates and times
therein, when it is given a POSIX-style time zone specification that
lacks an explicit transition rule. However, there's a problem with
that code: it doesn't work for dates past the Y2038 time_t rollover.
(Effectively, all times beyond that point are treated as standard
time.) The IANA crew regard this feature as legacy, so their plan is
to remove it not fix it. The time frame in which that will happen
is unclear, but presumably it'll happen well before 2038.
Moreover, effective with the next IANA data update (probably this
fall), the recommended default will be to not install a "posixrules"
file in the first place. The time frame in which tzdata packagers
might adopt that suggestion is likewise unclear, but at least some
platforms will probably do it in the next year or so. While we could
ignore that recommendation so far as PG-supplied tzdata trees are
concerned, builds using --with-system-tzdata will be subject to
whatever the platform's tzdata packager decides to do.
Thus, whether or not we do anything, some increasing fraction of
Postgres users will be exposed to the behavior observed when there
is no "posixrules" file; and if we do nothing, we'll have essentially
no control over the timing of that change.
The best thing to do to ameliorate the uncertainty seems to be to
proactively remove the posixrules-reading feature. If we do that in
a scheduled release then at least we can release-note the behavioral
change, rather than having users be surprised by it after a routine
tzdata update.
The change in question is fairly minor anyway: to be affected,
you have to be using a POSIX-style timezone spec, it has to not
have an explicit rule, and it has to not be one of the four traditional
continental-USA zone names (EST5EDT, CST6CDT, MST7MDT, or PST8PDT),
as those are special-cased. Since the default "posixrules" file
provides USA DST rules, the number of people who are likely to find
such a zone spec useful is probably quite small. Moreover, the
fallback behavior with no explicit rule and no "posixrules" file is to
apply current USA rules, so the only thing that really breaks is the
DST transitions in years before 2007 (and you get the countervailing
fix that transitions after 2038 will be applied).
Now, some installations might have replaced the "posixrules" file,
allowing e.g. EU rules to be applied to a POSIX-style timezone spec.
That won't work anymore. But it's not exactly clear why this solution
would be preferable to using a regular named zone. In any case, given
the Y2038 issue, we need to be pushing users to stop depending on this.
Back-patch into v13; it hasn't been released yet, so it seems OK to
change its behavior. (Personally I think we ought to back-patch
further, but I've been outvoted.)
Discussion: https://postgr.es/m/1390.1562258309@sss.pgh.pa.us
Discussion: https://postgr.es/m/20200621211855.6211-1-eggert@cs.ucla.edu
Warnings start 10M transactions before xidStopLimit, which is 11M
transactions before wraparound. The sample WARNING output showed a
value greater than 11M, and its HINT message predated commit
25ec228ef7. Hence, the sample was
impossible. Back-patch to 9.5 (all supported versions).
When we initially created this parameter, in commit ff8ca5fad, we left
the default as "allow any protocol version" on grounds of backwards
compatibility. However, that's inconsistent with the backend's default
since b1abfec82; protocol versions prior to 1.2 are not considered very
secure; and OpenSSL has had TLSv1.2 support since 2012, so the number
of PG servers that need a lesser minimum is probably quite small.
On top of those things, it emerges that some popular distros (including
Debian and RHEL) set MinProtocol=TLSv1.2 in openssl.cnf. Thus, far
from having "allow any protocol version" behavior in practice, what
we actually have as things stand is a platform-dependent lower limit.
So, change our minds and set the min version to TLSv1.2. Anybody
wanting to connect with a new libpq to a pre-2012 server can either
set ssl_min_protocol_version=TLSv1 or accept the fallback to non-SSL.
Back-patch to v13 where the aforementioned patches appeared.
Patch by me, reviewed by Daniel Gustafsson
Discussion: https://postgr.es/m/a9408304-4381-a5af-d259-e55d349ae4ce@2ndquadrant.com
In pg_replication_slot, change output from normal/reserved/lost to
reserved/extended/unreserved/ lost, which better expresses the possible
states particularly near the time where segments are no longer safe but
checkpoint has not run yet.
Under the new definition, reserved means the slot is consuming WAL
that's still under the normal WAL size constraints; extended means it's
consuming WAL that's being protected by wal_keep_segments or the slot
itself, whose size is below max_slot_wal_keep_size; unreserved means the
WAL is no longer safe, but checkpoint has not yet removed those files.
Such as slot is in imminent danger, but can still continue for a little
while and may catch up to the reserved WAL space.
Also, there were some bugs in the calculations used to report the
status; fixed those.
Backpatch to 13.
Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200616.120236.1809496990963386593.horikyota.ntt@gmail.com
Defining duplicates as "close by" to each other was unclear. Simplify
the definition.
Backpatch: 13-, where deduplication was introduced (by commit 0d861bbb)
911e702077 added opclass options and adjusted documentation for each
particular affected opclass. However, documentation for extendability was
not adjusted. This commit adjusts documentation for interfaces of index AMs
and opclasses.
Discussion: https://postgr.es/m/CAH2-WzmQnW6%2Bz5F9AW%2BSz%2BzEcEvXofTwh_A9J3%3D_WA-FBP0wYg%40mail.gmail.com
Author: Alexander Korotkov
Reported-by: Peter Geoghegan
Reviewed-by: Peter Geoghegan
The IANA time zone folk have deprecated use of a "posixrules" file in
the tz database. While for now it's our choice whether to keep
supplying one in our own builds, installations built with
--with-system-tzdata will soon be needing to cope with that file not
being present, at least on some platforms.
This causes a problem for the horology test, which expected the
nonstandard POSIX zone spec "CST7CDT" to apply pre-2007 US daylight
savings rules. That does happen if the posixrules file supplies such
information, but otherwise the test produces undesired results.
To fix, add an explicit transition date rule that matches 2005 practice.
(We could alternatively have switched the test to use some real time
zone, but it seems useful to have coverage of this type of zone spec.)
While at it, update a documentation example that also relied on
"CST7CDT"; use a real-world zone name instead. Also, document why
the zone names EST5EDT, CST6CDT, MST7MDT, PST8PDT aren't subject to
similar failures when "posixrules" is missing.
Back-patch to all supported branches, since the hazard is the same
for all.
Discussion: https://postgr.es/m/1665379.1592581287@sss.pgh.pa.us
Mostly in response to Jürgen Purtz critique of previous definitions,
though I added many other changes.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Jürgen Purtz <juergen@purtz.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Erik Rijkers <er@xs4all.nl>
Discussion: https://postgr.es/m/c1e06008-2132-30f4-9b38-877e8683d418@purtz.de
We'd glossed over most of this complexity for years, but it's hard
to avoid writing it all down now, so that we can explain what happens
when there's no "posixrules" file in the IANA time zone database.
That was at best a tiny minority situation till now, but it's likely
to become quite common in the future, so we'd better explain it.
Nonetheless, we don't really encourage people to use POSIX zone specs;
picking a named zone is almost always what you really want, unless
perhaps you're stuck with an out-of-date zone database. Therefore,
let's shove all this detail into an appendix.
Patch by me; thanks to Robert Haas for help with some awkward wording.
Discussion: https://postgr.es/m/1390.1562258309@sss.pgh.pa.us
Our documentation failed to point out that REPEATABLE READ is really
snapshot isolation, which might be important to some users. Point to
the standard reference paper for this complicated topic.
Likewise, add a reference to the VLDB paper about PostgreSQL SSI, for
technical information about our SSI implementation and how it compares
to S2PL.
While here, add a note about catalog access using a lower isolation
level, per recent user complaint.
Back-patch to all releases.
Reported-by: Kyle Kingsbury <aphyr@jepsen.io>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Tatsuo Ishii <ishii@sraoss.co.jp>
Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443%40jepsen.io
Discussion: https://postgr.es/m/16454-9408996bb1750faf%40postgresql.org
Eliminate enable_groupingsets_hash_disk, which was primarily useful
for testing grouping sets that use HashAgg and spill. Instead, hack
the table stats to convince the planner to choose hashed aggregation
for grouping sets that will spill to disk. Suggested by Melanie
Plageman.
Rename enable_hashagg_disk to hashagg_avoid_disk_plan, and invert the
meaning of on/off. The new name indicates more strongly that it only
affects the planner. Also, the word "avoid" is less definite, which
should avoid surprises when HashAgg still needs to use the
disk. Change suggested by Justin Pryzby, though I chose a different
GUC name.
Discussion: https://postgr.es/m/CAAKRu_aisiENMsPM2gC4oUY1hHG3yrCwY-fXUg22C6_MJUwQdA%40mail.gmail.com
Discussion: https://postgr.es/m/20200610021544.GA14879@telsasoft.com
Backpatch-through: 13
In PostgreSQL 10, we stopped using System V semaphores on Linux
systems. Update the example we give of an error message from a
misconfigured system to show what people are most likely to see these
days.
Back-patch to 10, where PREFERRED_SEMAPHORES=UNNAMED_POSIX arrived.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGLmJUSwybaPQv39rB8ABpqJq84im2UjZvyUY4feYhpWMw%40mail.gmail.com
Whitespace between tags is significant, and in some cases it creates
extra vertical space in man pages. The fix is either to remove some
newlines or in some cases to reword slightly to avoid the awkward
markup layout.
Remove obsolete instructions for old operating system versions, and
update the text to reflect the defaults on modern systems.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Magnus Hagander <magnus@hagander.net>
Discussion: https://postgr.es/m/CA%2BhUKGLmJUSwybaPQv39rB8ABpqJq84im2UjZvyUY4feYhpWMw%40mail.gmail.com
The preferred terminology has been support "function", not procedure,
for some time, so change that over. The command stays \dAp, since
\dAf is already something else.
The group of wal_init_zero and wal_recycle is WAL_SETTINGS in guc.c,
but previously their documents were located in
"Replication"/"Sending Servers" section. This commit moves them to
the proper section "Write Ahead Log"/"Settings".
Back-patch to v12 where wal_init_zero and wal_recycle parameters
were introduced.
Author: Fujii Masao
Discussion: https://postgr.es/m/b5190ab4-a169-6a42-0e49-aed0807c8976@oss.nttdata.com
The documentation of REINDEX includes a complete description of
CONCURRENTLY and its advantages as well as its disadvantages, but
reindexdb was not really clear about all that.
From discussion with Tom Lane, based on a report from Andrey Klychkov.
Discussion: https://postgr.es/m/1590486572.205117372@f500.i.mail.ru
Backpatch-through: 12
This commit updates the "Viewing Statistics" section more like
the existing catalogs chapter.
- Change its layout so that an introductory paragrap is put above
the table for each statistics view. Previously the explanations
were below the tables.
- Separate each view to different section and add index terms for them.
Author: Fujii Masao
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/6f8a482c-b3fa-4ed9-21c3-6d222a2cb87d@oss.nttdata.com
These were missed when these were added to pg_hba.conf in PG 12;
updates docs and pg_hba.conf.sample.
Reported-by: Arthur Nascimento
Bug: 16380
Discussion: https://postgr.es/m/20200421182736.GG19613@momjian.us
Backpatch-through: 12
Explain that the followings are tracked only when track_io_timing GUC
is enabled.
- blk_read_time and blk_write_time in pg_stat_database
- time spent reading and writing data file blocks in EXPLAIN output
with BUFFERS option
Whther track_io_timing is enabled affects also blk_read_time and
blk_write_time in pg_stat_statements, but which was already documented.
Author: Atsushi Torikoshi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CACZ0uYHo_NwbxpLH76OGF-O=13tkR0ZM0zeyGEhZ+JEXZVRyCA@mail.gmail.com
The B-Tree index deduplication strategy used during CREATE INDEX and
REINDEX differs from the lazy strategy used by retail inserts. Make
that clear by adding a new paragraph to the B-Tree implementation
section of the documentation.
In passing, do some copy-editing of nearby deduplication documentation.
The description missed a comma and lacked an explanation of what happens
with REPLICA IDENTITY USING INDEX when the dependent index is dropped.
Author: Marina Polyakova
Reviewed-by: Daniel Gustafsson, Michael Paquier
Discussion: https://postgr.es/m/ad1a0badc32658b1bbb07aa312346a1d@postgrespro.ru
Backpatch-through: 9.5
Synchronize the event names for parallel hash join waits with other
event names, by getting rid of the slashes and dropping "-ing"
suffixes. Rename ClogGroupUpdate to XactGroupUpdate, to match the
new SLRU name. Move the ProcSignalBarrier event to the IPC category;
it doesn't belong under IO.
Also a bit more wordsmithing in the wait event documentation tables.
Discussion: https://postgr.es/m/4505.1589640417@sss.pgh.pa.us
d140f2f3 has renamed receivedUpto to flushedUpto, and has added
writtenUpto to the WAL receiver's shared memory information, but
pg_stat_wal_receiver was not consistent with that. This commit renames
received_lsn to flushed_lsn, and adds a new column called written_lsn.
Bump catalog version.
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20200515090817.GA212736@paquier.xyz
libpq's exports.txt was overlooked in commit 36d108761, which the
buildfarm is quite unhappy about.
Also, I'd gathered that the plan included renaming PQgetSSLKeyPassHook
to PQgetSSLKeyPassHook_OpenSSL, but that didn't happen in the patch
as committed. I'm taking it on my own authority to do so now, since
the window before beta1 is closing fast.
4dc6355210 provided a way for libraries and clients to modify how libpq
handles client certificate passphrases, by installing a hook. However,
these routines are quite specific to how OpenSSL works, so it's
misleading and not future-proof to have these names not refer to OpenSSL.
Change all the names to add "_OpenSSL" after "Hook", and fix the docs
accordingly.
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/981DE552-E399-45C2-9F60-3F0E3770CC61@yesql.se
It's just weird that this name wasn't chosen to look like an
identifier. The suspicion that it wasn't thought about too
hard is reinforced by the fact that it wasn't documented in
the pg_locks view (until I did so, a day or two back).
Update, and add a comment reminding future adjusters of this
array to fix the docs too.
Do some desultory wordsmithing on various entries in the wait
events tables.
Discussion: https://postgr.es/m/24595.1589326879@sss.pgh.pa.us
This was mostly confusing, especially since some wait events in
this class had the suffix and some did not.
While at it, stop exposing MainLWLockNames[] as a globally visible
name; any code using that directly is almost certainly wrong, as
its name has been misleading for some time.
(GetLWLockIdentifier() is what to use instead.)
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
Choose names that fit into the conventions for wait event names
(particularly, that multi-word names are in the style MultiWordName)
and hopefully convey more information to non-hacker users than the
previous names did.
Also rename SerializablePredicateLockListLock to
SerializablePredicateListLock; the old name was long enough to cause
table formatting problems, plus the double occurrence of "Lock" seems
confusing/error-prone.
Also change a couple of particularly opaque LWLock field names.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
Add some more terms, clarify some definitions, remove redundant terms,
move a couple of terms to keep alphabetical order.
Co-authored-by: Jürgen Purtz <juergen@purtz.de>
Co-authored-by: Erik Rijkers <er@xs4all.nl>
Co-authored-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/7b9b469e804777ac9df4d37716db935e@xs4all.nl
I abbreviated the heck out of the column headings, and made a few
small wording changes, to get it to build warning-free. I can't
say that the result is pretty, but it's probably better than
removing this table entirely.
As of this commit, we have zero "exceed the available area" warnings
in a US-letter PDF build, and one such warning (about an 863-millipoint
overrun) in an A4 build. I expect to get rid of that one by renaming
wait events, so I'm not doing anything about it at the formatting
level.
Discussion: https://postgr.es/m/6916.1589146280@sss.pgh.pa.us
In one or two places it seemed reasonable to modify the example so as
to shorten its output slightly; but for the most part I just added a
&zwsp; after 67 characters, which is the most we can fit on a line
of monospace text in A4 format.
Discussion: https://postgr.es/m/6916.1589146280@sss.pgh.pa.us
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.
Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
The previous design for this table didn't really work in narrow views,
such as PDF output; besides which its reliance on large morerows
values made it a pain to maintain (cf ab3e4fbd5, for example).
I experimented with a couple of ways to fix it, but the best and
simplest is to split it up into a separate table for each event
type category.
I also rearranged the event ordering to be strictly alphabetical,
as nobody would ever be able to find entries otherwise.
There is work afoot to revise the set of event names described
in this table, but this commit just changes the layout, not the
contents.
In passing, add a missing entry to pg_locks.locktype,
and cross-reference that to the related wait event list.
Discussion: https://postgr.es/m/6916.1589146280@sss.pgh.pa.us
This changes our catalog and view descriptions to use a style inspired
by the new format for function/operator tables: each table entry is
formatted roughly like a <varlistentry>, with the column name and type
on the first line and then an indented description. This provides much
more room for expansive descriptions than we had before, and thereby
eliminates a passel of PDF build warnings.
Discussion: https://postgr.es/m/12984.1588643549@sss.pgh.pa.us
I can't see any way to make this table fit in PDF column width
without either a fundamental redesign or abbreviating EXCLUSIVE.
So I did the latter.
It'd be nicer if the abbreviating didn't leak into the HTML output
as well; but the hackery required to make the output different
seems like more trouble than it's really worth.
Discussion: https://postgr.es/m/6916.1589146280@sss.pgh.pa.us
Even after the tweaking I did in commit 5545b69ae, some of the
longer keywords mentioned in the SQL standard don't fit the
available space in PDF output.
I experimented with various solutions like putting such keywords
on their own table lines, but everything looked ugly or confusing
or both; worse, the weirdness also appeared in the HTML version,
which (normally) doesn't need it.
The best answer seems to be to insert &zwsp; into long keywords
so that they can be broken into two lines when, and only when,
needed. It doesn't look too awful if the break happens after
an underscore --- and fortunately, all the problematic keywords
have underscores.
Discussion: https://postgr.es/m/6916.1589146280@sss.pgh.pa.us
Use xreflabel attributes instead of endterm attributes to control the
appearance of links to subsections of SQL command reference pages.
This is simpler, it matches what we do elsewhere (e.g. for GUC variables),
and it doesn't draw "Unresolved ID reference" warnings from the PDF
toolchain.
Fix some places where the text was absolutely dependent on an <xref>
rendering exactly so, by using a <link> around the required text
instead. At least one of those spots had already been turned into
bad grammar by subsequent changes, and the whole idea is just too
fragile for my taste. <xref> does NOT have fixed output, don't write
as if it does.
Consistently include a page-level link in cross-man-page references,
because otherwise they are useless/nonsensical in man-page output.
Likewise, be consistent about mentioning "below" or "above" in same-page
references; we were doing that in about 90% of the cases, but now it's
100%.
Also get rid of another nonfunctional-in-PDF idea, of making
cross-references to functions by sticking ID tags on <row> constructs.
We can put the IDs on <indexterm>s instead --- which is probably not any
more sensible in abstract terms, but it works where the other doesn't.
(There is talk of attaching cross-reference IDs to most or all of
the docs' function descriptions, but for now I just fixed the two
that exist.)
Discussion: https://postgr.es/m/14480.1589154358@sss.pgh.pa.us
This patch eliminates a few more "exceed the available area" warnings
whose causes aren't particularly connected to anything else.
The only one really worthy of comment is that I increased the space
allowed for an <orderedlist>'s numbers, because the default of 1em
doesn't quite work for more than one digit. The rest are one-off
insertions of &zwsp; and suchlike tweaks, in places where they
shouldn't do any damage to the material. (In particular, although
I split some long identifiers with zwsp's, there are other nearby
occurrences of each one; so those changes shouldn't hurt greppability
of the document sources.)
I made up a very crude hack to compare the docs with reality (as
embodied in the system catalogs) ... and indeed they don't match
everywhere. Missing oid columns, wrong data types, wrong "references"
links, columns listed in the wrong order. None of this seems quite
important enough to back-patch.
This converts the contrib documentation to the new style, and mops up
a couple of function tables that were outside chapter 9 in the main
docs.
A few contrib modules choose not to present their functions in the
standard tabular format. There might be room to rethink those decisions
now that the standard format is more friendly to verbose descriptions.
But I have not undertaken to do that here; I just converted existing
tables.
In the wake of commit f21599311, we don't need to set table columns'
align specs retail. Undo a few such settings I'd added in commit
5545b69ae. (The column width adjustments stay, though.)
I concluded that we really just ought to force all tables in PDF output
to default to "left" alignment (instead of "justify"); that is what the
HTML toolchain does and that's what most people have been designing the
tables to look good with. There are few if any places where "justify"
produces better-looking output, and there are many where it looks
horrible. So change stylesheet-fo.xsl to make that true.
Also tweak column widths in a few more tables to make them look better
and avoid "exceed the available area" warnings. This commit fixes
basically everything that can be fixed through that approach. The
remaining tables that give warnings either are scheduled for redesign
as per recent discussions, or need a fundamental rethink because they
Just Don't Work in a narrow view.
Moving this setting into the main configuration file was ill-considered,
perhaps, because that typically causes it to be set before
timezone_abbreviations has been set. Which in turn means that zone
abbreviations don't work, only full zone names.
We could imagine hacking things so that such cases do work, but the
stability of the hack would be questionable, and the value isn't really
that high. Instead just document that you should use a numeric zone
offset or a full zone name.
Per bug #16404 from Reijo Suhonen.
Back-patch to v12 where this was changed.
Discussion: https://postgr.es/m/16404-4603a99603fbd04c@postgresql.org
The following docs are updated:
- High-availaility section
- pg_basebackup
- pg_receivewal
Per the principle of least privilege, we want to encourage users to
interact with those areas using roles that have replication rights, but
superusers were mentioned first.
Author: Daniel Gustafsson
Reviewed-by: Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/ECEBD212-7101-41EB-84F3-2F356E4B6401@yesql.se
In commit 33e05f89c5, we have added the option to display WAL usage
statistics in Explain and auto_explain. The display format used two spaces
between each field which is inconsistent with Buffer usage statistics which
is using one space between each field. Change the format to make WAL usage
statistics consistent with Buffer usage statistics.
This commit also changed the usage of "full page writes" to
"full page images" for WAL usage statistics to make it consistent with
other parts of code and docs.
Author: Julien Rouhaud, Amit Kapila
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
The PDF toolchain defaults to laying out all columns of a table with
equal widths, in contrast to the HTML rendering which automatically
varies the column widths to fit the data. In many places, this
results in very badly laid-out tables, with lots of useless whitespace
in some places and text that overruns its cell in other places.
For tables that have reasonably static content, we can improve
matters by adding <colspec> entries to hand-assign the column widths.
This commit does that for a few of the tables that were worst off;
it eliminates close to 200 "contents ... exceed the available area"
warnings in an A4 PDF build.
I also forced align="left" in these tables, overriding the PDF
toolchain's default which is evidently "justify". (The HTML toolchain
seems to default to that already.) Anyplace where things are tight
enough that we need to worry about this, forced justification tends to
look truly awful.
We had a mishmash of <replaceable>, <replaceable class="parameter">,
and <parameter> markup for operator/function arguments. Use <parameter>
consistently for things that are in fact names of parameters (including
OUT parameters), reserving <replaceable> for things that aren't. The
latter class includes some made-up-by-the-docs type class names, like
"numeric_type", as well as placeholders for arguments that don't have
well-defined types. Possibly we could do better with those categories
as well, but for the moment I'm content not to have parameter names
marked up in different ways in different places.
(This commit aligns the earlier sections of chapter 9 with a policy
that I'd arrived at while working on commit 1ad23335f, which is why
the last few sections need no changes.)
The pg_rewind docs currently assert that the state of the target's
data directory after rewind is equivalent to the source's data
directory. This clarifies the documentation to describe that the base
state is further back in time and that the target's data directory will
include the current state from the source of any copied blocks since the
point of divergence.
This commit also improves the section "How It Works":
- Describe the update of the pg_control file.
- Reorganize the list of files and directories ignored during the
rewind.
Author: James Coleman
Discussion: https://postgr.es/m/CAAaqYe-sgqCos7MXF4XiY8rUPy3CEmaCY9EvfhX-DhPhPBF5_A@mail.gmail.com
The libpq parameters ssl{max|min}protocolversion are renamed to use
underscores, to become ssl_{max|min}_protocol_version. The related
environment variables still use the names introduced in commit ff8ca5f
that added the feature.
Per complaint from Peter Eisentraut (this was also mentioned by me in
the original patch review but the issue got discarded).
Author: Daniel Gustafsson
Reviewed-by: Peter Eisentraut, Michael Paquier
Discussion: https://postgr.es/m/b319e449-318d-e691-4997-1327e166fcc4@2ndquadrant.com
Make the markup a bit less ad-hoc. A function-table cell now contains
several <para> units, and we label the ones that contain function
signatures with role="func_signature". The CSS or FO stylesheets then
key off of that to decide how to set the indentation. A very useful
win from this approach is that we can have more than one signature
entry per table cell, simplifying the documentation of closely-related
operators and functions.
This patch mostly just replaces the markup in the tables I converted so
far. But I did alter a couple of places where multiple signatures were
helpful.
Discussion: https://postgr.es/m/5561.1587922854@sss.pgh.pa.us
This includes the usual amount of editorial cleanup, such as
correcting wrong or less-helpful-than-they-could-be examples.
I moved the two tsvector-updating triggers into "9.28 Trigger
Functions", which seems like a better home for them. (I believe
that section didn't exist when this text was originally written.)
Also rearrange that page a bit for more consistency and less
duplication.
In passing, fix erroneous examples of the results of abbrev(cidr)
in datatype.sgml, and do a bit of copy-editing there.
David Johnston reminded me that the per-point calculations being done
by these operators are equivalent to complex multiplication/division.
(Once I would've recognized that immediately, but it's been too long
since I did any of that sort of math.)
Also put in a footnote mentioning that "rotation" of a box doesn't do
what you might expect, as I'd griped about in the referenced thread.
Discussion: https://postgr.es/m/158110996889.1089.4224139874633222837@wrigleys.postgresql.org
This also makes an attempt to flesh out the docs for some of the more
severely underdocumented geometric operators and functions.
This effort exposed that the point <^ point (point_below) and
point >^ point (point_above) operators are misnamed; they should be
<<| and |>>, because they act like the other operators named that
way and not like the other operators named <^ and >^. But I just
documented them that way; fixing it is matter for another patch.
The haphazard datatype coverage of many of the operators is also
now depressingly obvious.
Discussion: https://postgr.es/m/158110996889.1089.4224139874633222837@wrigleys.postgresql.org
Add a couple of lines to make it explicit that indexes, constraints,
triggers are added, removed, or left alone.
Backpatch to pg11.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200421162038.GA18628@alvherre.pgsql
When a partition is detached, any triggers that had been cloned from its
parent were not properly disentangled from its parent triggers.
This resulted in triggers that could not be dropped because they
depended on the trigger in the trigger in the no-longer-parent table:
ALTER TABLE t DETACH PARTITION t1;
DROP TRIGGER trig ON t1;
ERROR: cannot drop trigger trig on table t1 because trigger trig on table t requires it
HINT: You can drop trigger trig on table t instead.
Moreover the table can no longer be re-attached to its parent, because
the trigger name is already taken:
ALTER TABLE t ATTACH PARTITION t1 FOR VALUES FROM (1)TO(2);
ERROR: trigger "trig" for relation "t1" already exists
The former is a bug introduced in commit 86f575948c. (The latter is
not necessarily a bug, but it makes the bug more uncomfortable.)
To avoid the complexity that would be needed to tell whether the trigger
has a local definition that has to be merged with the one coming from
the parent table, establish the behavior that the trigger is removed
when the table is detached.
Backpatch to pg11.
Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/20200408152412.GZ2228@telsasoft.com
Previously in the "Standby Server Operation" section, pg_ctl promote and
protmote_trigger_file were documented as a method to trigger standby
promotion, but pg_promote() function not.
This commit also adds parentheses into <function>pg_promote</function>
in some docs to make it clearer that a function is being referred to.
Author: Masahiro Ikeda
Reviewed-by: Michael Paquier, Laurenz Albe, Tom Lane, Fujii Masao
Discussion: https://postgr.es/m/de0068417a9f4046bac693cbcc00bdc9@oss.nttdata.com
Commit f2fcad27d5 (9.6 era) added the ability to mark objects as
dependent an extension, but forgot to add a way for such dependencies to
be removed. This commit fixes that oversight.
Strictly speaking this should be backpatched to 9.6, but due to lack of
demand we're not doing so at this time.
Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql
Reviewed-by: ahsan hadi <ahsan.hadi@gmail.com>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Along the way, update the older examples for bytea to use "hex"
output format. That lets us get rid of the lame disclaimer about
how the examples assume bytea_output = escape, which was only half
true anyway because none of the more-recently-added examples had
paid any attention to that.
I took the opportunity to do some copy-editing in this area as well,
and to add some new material such as a note about BETWEEN's syntactical
peculiarities.
Of note is that quite a few of the examples of transcendental functions
needed to be updated, because the displayed output no longer matched
what you get on a modern server. I believe some of these cases are
side-effects of the new Ryu algorithm in float8out. Others appear to be
because the examples predate the addition of type numeric, and were
expecting that float8 calculations would be done although the given
syntax would actually lead to calling the numeric function nowadays.
Jonathan Katz felt that slightly different indentation settings made
for a better-looking result, so sync stylesheet-fo.xsl (for PDF) and
stylesheet.css (for non-website-style HTML) with those choices.
Discussion: https://postgr.es/m/31464.1587156281@sss.pgh.pa.us
The table layout ideas proposed in commit e894c6183 were not as widely
popular as I'd hoped. After discussion, we've settled on a layout
that's effectively a single-column table with cell contents much like a
<varlistentry> description of the function or operator; though we're not
actually using <varlistentry>, because it'd add way too much vertical
space. Instead the effect is accomplished using line-break processing
instructions to separate the description and example(s), plus CSS or FO
customizations to produce indentation of all but the first line in each
cell. While technically this is a bit grotty, it does have the
advantage that we won't need to write nearly as much boilerplate markup.
This patch updates tables 9.30, 9.31, and 9.33 (which were touched by
the previous patch) to the revised style, and additionally converts
table 9.10. A lot of work still remains to do, but hopefully it won't
be too controversial.
Thanks to Andrew Dunstan, Pierre Giraud, Robert Haas, Alvaro Herrera,
David Johnston, Jonathan Katz, Isaac Morland for valuable ideas.
Discussion: https://postgr.es/m/8691.1586798003@sss.pgh.pa.us
Earlier we were inconsistent in allowing the usage of parallel and
full options. Change it such that we disallow them only when they are
combined in a way that we don't support.
In passing, improve the comments in some of the existing tests of parallel
vacuum.
Reported-by: Tushar Ahuja
Author: Justin Pryzby, Amit Kapila
Reviewed-by: Sawada Masahiko, Michael Paquier, Mahendra Singh Thalor and
Amit Kapila
Discussion: https://postgr.es/m/58c8d171-e665-6fa3-a9d3-d9423b694dae%40enterprisedb.com
This commit prevents pg_basebackup from receiving backup_manifest file
when --no-manifest is specified. Previously, when pg_basebackup was
writing a tarfile to stdout, it tried to receive backup_manifest file even
when --no-manifest was specified, and reported an error.
Also remove unused -m option from pg_basebackup.
Also fix typo in BASE_BACKUP command documentation.
Author: Fujii Masao
Reviewed-by: Michael Paquier, Robert Haas
Discussion: https://postgr.es/m/01e3ed3a-8729-5aaa-ca84-e60e3ca59db8@oss.nttdata.com
Now that warnings are enabled across the board, this code that tries to
print an undef variable emits one. Silently printing the empty string
achieves the previous behavior.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Discussion: https://postgr.es/m/E1jO1VT-0008Qk-TM@gemulon.postgresql.org
We've had a mixture of the warnings pragma, the -w switch on the shebang
line, and no warnings at all. This patch removes the -w swicth and add
the warnings pragma to all perl sources missing it. It raises the
severity of the TestingAndDebugging::RequireUseWarnings perlcritic
policy to level 5, so that we catch any future violations.
Discussion: https://postgr.es/m/20200412074245.GB623763@rfd.leadboat.com
We've long fought with the draconian space limitations of our
traditional table layout for describing SQL functions and operators.
This commit introduces a new approach, though so far I've only applied
it to a few of those tables. The new way makes use of DocBook's support
for different layouts in different rows of a table, and allows the
descriptions and examples for a function or operator to run to several
lines without as much ugliness and wasted space as before.
The core layout concept is now
Name Signature
Description
Example Example Result
so that a function or operator really has three table rows not one,
but we group them to look like one row by having the name column
have only one entry for all three rows. (Actually, there could be
four or more rows if you wanted to have more than one example, which
is another thing that was painful before but works easily now.)
This is handled by a "morerows" annotation on the name entry, which
isn't perfect (notably, the toolchain is not smart enough to avoid
breaking these row groups across PDF pages) but there seems no better
solution in DocBook. The name column is normally fairly narrow,
allowing plenty of space for the other column(s), and not wasting too
much space when one of the other components runs to multiple lines.
The varying row layout is managed by defining named "spans" and then
tagging entries with a "spanname" of "name", "sig", "desc", "example",
or "exresult". This provides a bit of semantic annotation to go with
the formatting improvement, which seems like a good thing. (It seems
that we have to re-define these spans afresh for each table, which is
annoying, but it's not any worse than the duplication involved in
the table headers. At least that gives us an opportunity to vary the
relative column widths per-table, which is handy since function tables
sometimes need much wider name columns than operator tables.)
Signature entries should be written in the style
<function>fname</function>(<type>typename</type> ...)
<returnvalue>typename</returnvalue>
The <returnvalue> tag produces a right arrow before the result type
name. (I'll document that convention in a user-visible place later.)
While this provides significantly more horizontal space than before
for examples, it's still true that PDF output is a lot narrower than
typical webpage viewing windows, so some examples need to be broken
in places where there is no whitespace. I've added &zwsp; markers in
suitable places to allow the tables to render warning-free in PDF.
I've so far converted only the date/time operator, date/time function,
and enum function tables in sections 9.9 and 9.10; these were chosen
to provide a reasonable sample of the formatting problems that need
to be solved. Assuming that this looks good on the website and doesn't
provoke howls of anguish, I'll work on the other similar tables in the
near future.
There's a moderate amount of new editorial content in this patch along
with the raw formatting changes; for instance I had to write text
descriptions for operators that lacked them. I failed to resist the
temptation to improve some other descriptions and examples, too.
Patch by me, with thanks to Alexander Lakhin for assistance with
figuring out some formatting issues.
Discussion: https://postgr.es/m/9326.1581457869@sss.pgh.pa.us
We already had a couple of places using zero-width spaces for formatting
hackery, and we're going to need more if we ever want the PDF manuals to
look decent. But please let's not write hard-coded Unicode escapes.
We can avoid that by using a custom entity, which also provides a place
to put a teeny bit of documentation about what it is and how to use it.
I'd previously posted a patch using "&break;" for this, but on reflection
that would be horrible to grep for. Instead let's use "&zwsp;", based
on the name of the Unicode symbol ("zero width space").
Discussion: https://postgr.es/m/9326.1581457869@sss.pgh.pa.us
Commit ac8623760 "fixed" a typo in an example of what would happen in
the event of a typo. Restore the original typo and add a comment about
its intentionality. Backpatch to 12 where the error was introduced.
Per report from irc user Nicolás Alvarez.
Add a DEBUG1 message indicating that verification of the index structure
is underway. Also reduce the severity level of the existing "tree
level" debug message to DEBUG1. It should never have been made DEBUG2.
Any B-Tree index with more than a couple of levels will generally also
have so many pages that the per-page DEBUG2 messages will become
completely unmanageable.
In passing, add a new "Tip" to the docs that advises users that run into
corruption that the debug messages might provide useful additional
context.
The docs explained that a SHARE ROW EXCLUSIVE lock is needed on the
referenced table, but failed to say the same about the table being
altered. Since the page says that ACCESS EXCLUSIVE lock is taken
unless otherwise stated, this left readers with the wrong conclusion.
Discussion: https://postgr.es/m/834603375.3470346.1586482852542@mail.yahoo.com
CREATE GROUP is an exact alias for CREATE ROLE, and CREATE USER is
almost an exact alias, as can easily be confirmed by checking the
code. So the man page syntax descriptions ought to match up. The
last few additions of role options seem to have forgotten to update
create_group.sgml, though. Fix that, and add a naggy reminder to
create_role.sgml in hopes of not forgetting again.
Discussion: https://postgr.es/m/158647836143.655.9853963229391401576@wrigleys.postgresql.org
The added note explains that the numbers of planning and execution in
the statement are not always expected to match because their statistics are
updated at their respective end phase, and only for successful operations.
Author: Pascal Legrand, Julien Rouhaud, tweaked a bit by Fujii Masao
Discussion: https://postgr.es/m/1585857868967-0.post@n3.nabble.com
To control whether partition changes are replicated using their own
identity and schema or an ancestor's, add a new parameter that can be
set per publication named 'publish_via_partition_root'.
This allows replicating a partitioned table into a different partition
structure on the subscriber.
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
0f5ca02f53 introduces 3 new keywords. It appears to be too much for relatively
small feature. Given now we past feature freeze, it's already late for
discussion of the new syntax. So, revert.
Discussion: https://postgr.es/m/28209.1586294824%40sss.pgh.pa.us
Previously, the partitionwise join technique only allowed partitionwise
join when input partitioned tables had exactly the same partition
bounds. This commit extends the technique to some cases when the tables
have different partition bounds, by using an advanced partition-matching
algorithm introduced by this commit. For both the input partitioned
tables, the algorithm checks whether every partition of one input
partitioned table only matches one partition of the other input
partitioned table at most, and vice versa. In such a case the join
between the tables can be broken down into joins between the matching
partitions, so the algorithm produces the pairs of the matching
partitions, plus the partition bounds for the join relation, to allow
partitionwise join for computing the join. Currently, the algorithm
works for list-partitioned and range-partitioned tables, but not
hash-partitioned tables. See comments in partition_bounds_merge().
Ashutosh Bapat and Etsuro Fujita, most of regression tests by Rajkumar
Raghuwanshi, some of the tests by Mark Dilger and Amul Sul, reviewed by
Dmitry Dolgov and Amul Sul, with additional review at various points by
Ashutosh Bapat, Mark Dilger, Robert Haas, Antonin Houska, Amit Langote,
Justin Pryzby, and Tomas Vondra
Discussion: https://postgr.es/m/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com
Replication slots are useful to retain data that may be needed by a
replication system. But experience has shown that allowing them to
retain excessive data can lead to the primary failing because of running
out of space. This new feature allows the user to configure a maximum
amount of space to be reserved using the new option
max_slot_wal_keep_size. Slots that overrun that space are invalidated
at checkpoint time, enabling the storage to be released.
Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp
We invented \gx to allow the "\pset expanded" flag to be forced on
for the duration of one command output, but that turns out to not
be nearly enough to satisfy the demand for variant output formats.
Hence, make it possible to change any pset option(s) for the duration
of a single command output, by writing "option=value ..." inside
parentheses, for example
\g (format=csv csv_fieldsep='\t') somefile
\gx can now be understood as a shorthand for including expanded=on
inside the parentheses.
Patch by me, expanding on a proposal by Pavel Stehule
Discussion: https://postgr.es/m/CAFj8pRBx9OnBPRJVtfA5ycUpySge-XootAXAsv_4rrkHxJ8eRg@mail.gmail.com
This commit adds following optional clause to BEGIN and START TRANSACTION
commands.
WAIT FOR LSN lsn [ TIMEOUT timeout ]
New clause pospones transaction start till given lsn is applied on standby.
This clause allows user be sure, that changes previously made on primary would
be visible on standby.
New shared memory struct is used to track awaited lsn per backend. Recovery
process wakes up backend once required lsn is applied.
Author: Ivan Kartyshov, Anna Akenteva
Reviewed-by: Craig Ringer, Thomas Munro, Robert Haas, Kyotaro Horiguchi
Reviewed-by: Masahiko Sawada, Ants Aasma, Dmitry Ivanov, Simon Riggs
Reviewed-by: Amit Kapila, Alexander Korotkov
Discussion: https://postgr.es/m/0240c26c-9f84-30ea-fca9-93ab2df5f305%40postgrespro.ru
WITH TIES is an option to the FETCH FIRST N ROWS clause (the SQL
standard's spelling of LIMIT), where you additionally get rows that
compare equal to the last of those N rows by the columns in the
mandatory ORDER BY clause.
There was a proposal by Andrew Gierth to implement this functionality in
a more powerful way that would yield more features, but the other patch
had not been finished at this time, so we decided to use this one for
now in the spirit of incremental development.
Author: Surafel Temesgen <surafel3000@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Discussion: https://postgr.es/m/CALAY4q9ky7rD_A4vf=FVQvCGngm3LOes-ky0J6euMrg=_Se+ag@mail.gmail.com
Discussion: https://postgr.es/m/87o8wvz253.fsf@news-spur.riddles.org.uk
Since the existing bit number argument can't exceed INT32_MAX, it's
not possible for these functions to manipulate bits beyond the first
256MB of a bytea value. Lift that restriction by redeclaring the
bit number arguments as int8 (which requires a catversion bump,
hence is not back-patchable).
The similarly-named functions for bit/varbit don't really have a
problem because we restrict those types to at most VARBITMAXLEN bits;
hence leave them alone.
While here, extend the encode/decode functions in utils/adt/encode.c
to allow dealing with values wider than 1GB. This is not a live bug
or restriction in current usage, because no input could be more than
1GB, and since none of the encoders can expand a string more than 4X,
the result size couldn't overflow uint32. But it might be desirable
to support more in future, so make the input length values size_t
and the potential-output-length values uint64.
Also add some test cases to improve the miserable code coverage
of these functions.
Movead Li, editorialized some by me; also reviewed by Ashutosh Bapat
Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
The txid_XXX family of fmgr functions exposes 64 bit transaction IDs to
users as int8. Now that we have an SQL type xid8 for FullTransactionId,
define a new set of functions including pg_current_xact_id() and
pg_current_snapshot() based on that. Keep the old functions around too,
for now.
It's a bit sneaky to use the same C functions for both, but since the
binary representation is identical except for the signedness of the
type, and since older functions are the ones using the wrong signedness,
and since we'll presumably drop the older ones after a reasonable period
of time, it seems reasonable to switch to FullTransactionId internally
and share the code for both.
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com>
Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com>
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
Similar to xid, but 64 bits wide. This new type is suitable for use in
various system views and administration functions.
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com>
Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com>
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
Incremental Sort is an optimized variant of multikey sort for cases when
the input is already sorted by a prefix of the requested sort keys. For
example when the relation is already sorted by (key1, key2) and we need
to sort it by (key1, key2, key3) we can simply split the input rows into
groups having equal values in (key1, key2), and only sort/compare the
remaining column key3.
This has a number of benefits:
- Reduced memory consumption, because only a single group (determined by
values in the sorted prefix) needs to be kept in memory. This may also
eliminate the need to spill to disk.
- Lower startup cost, because Incremental Sort produce results after each
prefix group, which is beneficial for plans where startup cost matters
(like for example queries with LIMIT clause).
We consider both Sort and Incremental Sort, and decide based on costing.
The implemented algorithm operates in two different modes:
- Fetching a minimum number of tuples without check of equality on the
prefix keys, and sorting on all columns when safe.
- Fetching all tuples for a single prefix group and then sorting by
comparing only the remaining (non-prefix) keys.
We always start in the first mode, and employ a heuristic to switch into
the second mode if we believe it's beneficial - the goal is to minimize
the number of unnecessary comparions while keeping memory consumption
below work_mem.
This is a very old patch series. The idea was originally proposed by
Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the
patch was taken over by James Coleman, who wrote and rewrote most of the
current code.
There were many reviewers/contributors since 2013 - I've done my best to
pick the most active ones, and listed them in this commit message.
Author: James Coleman, Alexander Korotkov
Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov
Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
Mainly, this adds support code in logical/worker.c for applying
replicated operations whose target is a partitioned table to its
relevant partitions.
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
This commit adds a new option WAL similar to existing option BUFFERS in the
EXPLAIN command. This option allows to include information on WAL record
generation added by commit df3b181499 in EXPLAIN output.
This also allows the WAL usage information to be displayed via
the auto_explain module. A new parameter auto_explain.log_wal controls
whether WAL usage statistics are printed when an execution plan is logged.
This parameter has no effect unless auto_explain.log_analyze is enabled.
Author: Julien Rouhaud
Reviewed-by: Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
This commit adds three new columns in pg_stat_statements output to
display WAL usage statistics added by commit df3b181499.
This commit doesn't bump the version of pg_stat_statements as the
same is done for this release in commit 17e0328224.
Author: Kirill Bychik and Julien Rouhaud
Reviewed-by: Julien Rouhaud, Fujii Masao, Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
Until now, only selected bulk operations (e.g. COPY) did this. If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY. See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules. Maintainers of table access
methods should examine that section.
To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL. A new GUC,
wal_skip_threshold, guides that choice. If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold. Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.
Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node. Make relcache.c retain entries for certain
dropped relations until end of transaction.
Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN.
Future servers accept older WAL, so this bump is discretionary.
Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas. Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem. Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout.
Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.
Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.
Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
More work is still needed, but this is a good start.
Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Jürgen Purtz <juergen@purtz.de>
Co-authored-by: Roger Harkavy <rogerharkavy@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CADkLM=eP6HOeqDjn0FdXuGRusQu4oWH_LFsKjjafmhvWD=aSpQ@mail.gmail.com
This commit introduces new wait events RecoveryConflictSnapshot and
RecoveryConflictTablespace. The former is reported while waiting for
recovery conflict resolution on a vacuum cleanup. The latter is reported
while waiting for recovery conflict resolution on dropping tablespace.
Also this commit changes the code so that the wait event Lock is reported
while waiting in ResolveRecoveryConflictWithVirtualXIDs() for recovery
conflict resolution on a lock. Basically the wait event Lock is reported
during that wait, but previously was not reported only when that wait
happened in ResolveRecoveryConflictWithVirtualXIDs().
Author: Masahiko Sawada
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com
This option is similar to \gset, except that it is able to store all
results from combined SQL queries into separate variables. If a query
returns multiple rows, the last result is stored and if a query returns
no rows, nothing is stored.
While on it, add a TAP test for \gset to check for a failure when a
query returns multiple rows.
Author: Fabien Coelho
Reviewed-by: Ibrar Ahmed, Michael Paquier
Discussion: https://postgr.es/m/alpine.DEB.2.21.1904081914200.2529@lancre
The primary motivation for this change is that it will be used by the
upcoming patch to add backup manifests, but it also seems to have some
potential more general use.
Andres Freund and Robert Haas
Discussion: http://postgr.es/m/20200330020814.nspra4mvby42yoa4@alap3.anarazel.de
This patch replaces the boolean GUC log_parameters_on_error introduced
by commit ba79cb5dc with an integer log_parameter_max_length_on_error,
adding the ability to specify how many bytes to trim each logged
parameter value to. (The previous coding hard-wired that choice at
64 bytes.)
In addition, add a new parameter log_parameter_max_length that provides
similar control over truncation of query parameters that are logged in
response to statement-logging options, as opposed to errors. Previous
releases always logged such parameters in full, possibly causing log
bloat.
For backwards compatibility with prior releases,
log_parameter_max_length defaults to -1 (log in full), while
log_parameter_max_length_on_error defaults to 0 (no logging).
Per discussion, log_parameter_max_length is SUSET since the DBA should
control routine logging behavior, but log_parameter_max_length_on_error
is USERSET because it also affects errcontext data sent back to the
client.
Alexey Bashtanov, editorialized a little by me
Discussion: https://postgr.es/m/b10493cc-a399-a03a-67c7-068f2791ee50@imap.cc
This adds SQL expressions NORMALIZE() and IS NORMALIZED to convert and
check Unicode normal forms, per SQL standard.
To support fast IS NORMALIZED tests, we pull in a new data file
DerivedNormalizationProps.txt from Unicode and build a lookup table
from that, using techniques similar to ones already used for other
Unicode data. make update-unicode will keep it up to date. We only
build and use these tables for the NFC and NFKC forms, because they
are too big for NFD and NFKD and the improvement is not significant
enough there.
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com
This commit makes pg_stat_statements support new GUC
pg_stat_statements.track_planning. If this option is enabled,
pg_stat_statements tracks the planning statistics of the statements,
e.g., the number of times the statement was planned, the total time
spent planning the statement, etc. This feature is useful to check
the statements that it takes a long time to plan. Previously since
pg_stat_statements tracked only the execution statistics, we could
not use that for the purpose.
The planning and execution statistics are stored at the end of
each phase separately. So there are not always one-to-one relationship
between them. For example, if the statement is successfully planned
but fails in the execution phase, only its planning statistics are stored.
This may cause the users to be able to see different pg_stat_statements
results from the previous version. To avoid this,
pg_stat_statements.track_planning needs to be disabled.
This commit bumps the version of pg_stat_statements to 1.8
since it changes the definition of pg_stat_statements function.
Author: Julien Rouhaud, Pascal Legrand, Thomas Munro, Fujii Masao
Reviewed-by: Sergei Kornilov, Tomas Vondra, Yoshikazu Imai, Haribabu Kommi, Tom Lane
Discussion: https://postgr.es/m/CAHGQGwFx_=DO-Gu-MfPW3VQ4qC7TfVdH2zHmvZfrGv6fQ3D-Tw@mail.gmail.com
Discussion: https://postgr.es/m/CAEepm=0e59Y_6Q_YXYCTHZkqOc6H2pJ54C_Xe=VFu50Aqqp_sA@mail.gmail.com
Discussion: https://postgr.es/m/DB6PR0301MB21352F6210E3B11934B0DCC790B00@DB6PR0301MB2135.eurprd03.prod.outlook.com
There's a number of SLRU caches used to access important data like clog,
commit timestamps, multixact, asynchronous notifications, etc. Until now
we had no easy way to monitor these shared caches, compute hit ratios,
number of reads/writes etc.
This commit extends the statistics collector to track this information
for a predefined list of SLRUs, and also introduces a new system view
pg_stat_slru displaying the data.
The list of built-in SLRUs is fixed, but additional SLRUs may be defined
in extensions. Unfortunately, there's no suitable registry of SLRUs, so
this patch simply defines a fixed list of SLRUs with entries for the
built-in ones and one entry for all additional SLRUs. Extensions adding
their own SLRU are fairly rare, so this seems acceptable.
This patch only allows monitoring of SLRUs, not tuning. The SLRU sizes
are still fixed (hard-coded in the code) and it's not entirely clear
which of the SLRUs might need a GUC to tune size. In a way, allowing us
to determine that is one of the goals of this patch.
Bump catversion as the patch introduces new functions and system view.
Author: Tomas Vondra
Reviewed-by: Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/flat/20200119143707.gyinppnigokesjok@development
Quite a few matching operators such as JSONB's @> used "contsel" and
"contjoinsel" as their selectivity estimators. That was a bad idea,
because (a) contsel is only a stub, yielding a fixed default estimate,
and (b) that default is 0.001, meaning we estimate these operators as
five times more selective than equality, which is surely pretty silly.
There's a good model for improving this in ltree's ltreeparentsel():
for any "var OP constant" query, we can try applying the operator
to all of the column's MCV and histogram values, taking the latter
as being a random sample of the non-MCV values. That code is
actually 100% generic, except for the question of exactly what
default selectivity ought to be plugged in when we don't have stats.
Hence, migrate the guts of ltreeparentsel() into the core code, provide
wrappers "matchingsel" and "matchingjoinsel" with a more-appropriate
default estimate, and use those for the non-geometric operators that
formerly used contsel (mostly JSONB containment operators and tsquery
matching).
Also apply this code to some match-like operators in hstore, ltree, and
pg_trgm, including the former users of ltreeparentsel as well as ones
that improperly used contsel. Since commit 911e70207 just created new
versions of those extensions that we haven't released yet, we can sneak
this change into those new versions instead of having to create an
additional generation of update scripts.
Patch by me, reviewed by Alexey Bashtanov
Discussion: https://postgr.es/m/12237.1582833074@sss.pgh.pa.us
The previously listed total of 179 does not appear to be correct for
SQL:2016 anymore. (Previous SQL versions had slightly different
feature sets, so it's plausible that it was once correct.) The
currently correct count is the number of rows in the respective tables
in appendix F in SQL parts 2 and 11, minus 2 features that are listed
twice. Thus the correct count is currently 177. This also matches
the number of Core entries the built documentation currently shows, so
it's internally consistent.
Old versions of opclass parameters patch supported ability to specify DEFAULT
as the opclass name in CREATE INDEX command. This ability was removed in the
final version, but 911e702077 still mentions that in the documentation.
pg_rewind needs to copy from the source cluster to the target cluster a
set of relation blocks changed from the previous checkpoint where WAL
forked up to the end of WAL on the target. Building this list of
relation blocks requires a range of WAL segments that may not be present
anymore on the target's pg_wal, causing pg_rewind to fail. It is
possible to work around this issue by copying manually the WAL segments
needed but this may lead to some extra and actually useless work.
This commit introduces a new option allowing pg_rewind to use a
restore_command while doing the rewind by grabbing the parameter value
of restore_command from the target cluster configuration. This allows
the rewind operation to be more reliable, so as only the WAL segments
needed by the rewind are restored from the archives.
In order to be able to do that, a new routine is added to src/common/ to
allow frontend tools to restore files from archives using an
already-built restore command. This version is more simple than the
backend equivalent as there is no need to handle the non-recovery case.
Author: Alexey Kondratov
Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander
Korotkov, Michael Paquier
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
The existing implementation of the ltree ~ lquery match operator is
sufficiently complex and undocumented that it's hard to tell exactly
what it does. But one thing it clearly gets wrong is the combination
of NOT symbols (!) and '*' symbols. A pattern such as '*.!foo.*'
should, by any ordinary understanding of regular expression behavior,
match any ltree that has at least one label that's not "foo". As best
we can tell by experimentation, what it's actually matching is any
ltree in which *no* label is "foo". That's surprising, and not at all
what the documentation says.
Now, that's arguably a useful behavior, so if we rewrite to fix the
bug we should provide some other way to get it. To do so, add the
ability to attach lquery quantifiers to non-'*' items as well as '*'s.
Then the pattern '!foo{,}' expresses "any ltree in which no label is
foo". For backwards compatibility, the default quantifier for non-'*'
items has to be "{1}", although the default for '*' items is '{,}'.
I wouldn't have done it like that in a green field, but it's not
totally horrible.
Armed with that, rewrite checkCond() from scratch. Treating '*' and
non-'*' items alike makes it simpler, not more complicated, so that
the function actually gets a lot shorter than it was.
Filip Rembiałkowski, Tom Lane, Nikita Glukhov, per a very
ancient bug report from M. Palm
Discussion: https://postgr.es/m/CAP_rww=waX2Oo6q+MbMSiZ9ktdj6eaJj0cQzNu=Ry2cCDij5fw@mail.gmail.com
The original implementation disallowed using OVERRIDING USER VALUE on
identity columns defined as GENERATED ALWAYS, which is not per
standard. So allow that now.
Expand documentation and tests around this.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Discussion: https://www.postgresql.org/message-id/flat/CAEZATCVrh2ufCwmzzM%3Dk_OfuLhTTPBJCdFkimst2kry4oHepuQ%40mail.gmail.com
PostgreSQL provides set of template index access methods, where opclasses have
much freedom in the semantics of indexing. These index AMs are GiST, GIN,
SP-GiST and BRIN. There opclasses define representation of keys, operations on
them and supported search strategies. So, it's natural that opclasses may be
faced some tradeoffs, which require user-side decision. This commit implements
opclass parameters allowing users to set some values, which tell opclass how to
index the particular dataset.
This commit doesn't introduce new storage in system catalog. Instead it uses
pg_attribute.attoptions, which is used for table column storage options but
unused for index attributes.
In order to evade changing signature of each opclass support function, we
implement unified way to pass options to opclass support functions. Options
are set to fn_expr as the constant bytea expression. It's possible due to the
fact that opclass support functions are executed outside of expressions, so
fn_expr is unused for them.
This commit comes with some examples of opclass options usage. We parametrize
signature length in GiST. That applies to multiple opclasses: tsvector_ops,
gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and
gist_hstore_ops. Also we parametrize maximum number of integer ranges for
gist__int_ops. However, the main future usage of this feature is expected
to be json, where users would be able to specify which way to index particular
json parts.
Catversion is bumped.
Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record. The standby then checks whether its own settings are at least
as big as the ones on the primary. If not, the standby shuts down
with a fatal error.
The correspondence of settings between primary and standby is required
because those settings influence certain shared memory sizings that
are required for processing WAL records that the primary might send.
For example, if the primary sends a prepared transaction, the standby
must have had max_prepared_transaction set appropriately or it won't
be able to process those WAL records.
However, fatally shutting down the standby immediately upon receipt of
the parameter change record might be a bit of an overreaction. The
resources related to those settings are not required immediately at
that point, and might never be required if the activity on the primary
does not exhaust all those resources. If we just let the standby roll
on with recovery, it will eventually produce an appropriate error when
those resources are used.
So this patch relaxes this a bit. Upon receipt of
XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
warning and set a global flag if there is a problem. Then when we
actually hit the resource issue and the flag was set, we issue another
warning message with relevant information. At that point we pause
recovery, so a hot standby remains usable. We also repeat the last
warning message once a minute so it is harder to miss or ignore.
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
The documentation says that the max length is 255 bytes, but
code inspection says it's actually 255 characters; and relevant
lengths are stored as uint16 so that that works.
These uint16 fields could be overflowed by excessively long input,
producing strange results. Complain for invalid input.
Likewise check for out-of-range values of the repeat counts in lquery.
(We don't try too hard on that one, notably not bothering to detect
if atoi's result has overflowed.)
Also detect length overflow in ltree_concat.
In passing, be more consistent about whether "syntax error" messages
include the type name. Also, clarify the documentation about what
the size limit is.
This has been broken for a long time, so back-patch to all supported
branches.
Nikita Glukhov, reviewed by Benjie Gillam and Tomas Vondra
Discussion: https://postgr.es/m/CAP_rww=waX2Oo6q+MbMSiZ9ktdj6eaJj0cQzNu=Ry2cCDij5fw@mail.gmail.com
Traditionally autovacuum has only ever invoked a worker based on the
estimated number of dead tuples in a table and for anti-wraparound
purposes. For the latter, with certain classes of tables such as
insert-only tables, anti-wraparound vacuums could be the first vacuum that
the table ever receives. This could often lead to autovacuum workers being
busy for extended periods of time due to having to potentially freeze
every page in the table. This could be particularly bad for very large
tables. New clusters, or recently pg_restored clusters could suffer even
more as many large tables may have the same relfrozenxid, which could
result in large numbers of tables requiring an anti-wraparound vacuum all
at once.
Here we aim to reduce the work required by anti-wraparound and aggressive
vacuums in general, by triggering autovacuum when the table has received
enough INSERTs. This is controlled by adding two new GUCs and reloptions;
autovacuum_vacuum_insert_threshold and
autovacuum_vacuum_insert_scale_factor. These work exactly the same as the
existing scale factor and threshold controls, only base themselves off the
number of inserts since the last vacuum, rather than the number of dead
tuples. New controls were added rather than reusing the existing
controls, to allow these new vacuums to be tuned independently and perhaps
even completely disabled altogether, which can be done by setting
autovacuum_vacuum_insert_threshold to -1.
We make no attempt to skip index cleanup operations on these vacuums as
they may trigger for an insert-mostly table which continually doesn't have
enough dead tuples to trigger an autovacuum for the purpose of removing
those dead tuples. If we were to skip cleaning the indexes in this case,
then it is possible for the index(es) to become bloated over time.
There are additional benefits to triggering autovacuums based on inserts,
as tables which never contain enough dead tuples to trigger an autovacuum
are now more likely to receive a vacuum, which can mark more of the table
as "allvisible" and encourage the query planner to make use of Index Only
Scans.
Currently, we still obey vacuum_freeze_min_age when triggering these new
autovacuums based on INSERTs. For large insert-only tables, it may be
beneficial to lower the table's autovacuum_freeze_min_age so that tuples
are eligible to be frozen sooner. Here we've opted not to zero that for
these types of vacuums, since the table may just be insert-mostly and we
may otherwise freeze tuples that are still destined to be updated or
removed in the near future.
There was some debate to what exactly the new scale factor and threshold
should default to. For now, these are set to 0.2 and 1000, respectively.
There may be some motivation to adjust these before the release.
Author: Laurenz Albe, Darafei Praliaskouski
Reviewed-by: Alvaro Herrera, Masahiko Sawada, Chris Travers, Andres Freund, Justin Pryzby
Discussion: https://postgr.es/m/CAC8Q8t%2Bj36G_bLF%3D%2B0iMo6jGNWnLnWb1tujXuJr-%2Bx8ZCCTqoQ%40mail.gmail.com
The parameters primary_conninfo, primary_slot_name and
wal_receiver_create_temp_slot can now be changed with a simple "reload"
signal, no longer requiring a server restart. This is achieved by
signalling the walreceiver process to terminate and having it start
again with the new values.
Thanks to Andres Freund, Kyotaro Horiguchi, Fujii Masao for discussion.
Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/19513901543181143@sas1-19a94364928d.qloud-c.yandex.net
Commit 3297308278 gave walreceiver the ability to create and use a
temporary replication slot, and made it controllable by a GUC (enabled
by default) that can be changed with SIGHUP. That's useful but has two
problems: one, it's possible to cause the origin server to fill its disk
if the slot doesn't advance in time; and also there's a disconnect
between state passed down via the startup process and GUCs that
walreceiver reads directly.
We handle the first problem by setting the option to disabled by
default. If the user enables it, its on their head to make sure that
disk doesn't fill up.
We handle the second problem by passing the flag via startup rather than
having walreceiver acquire it directly, and making it PGC_POSTMASTER
(which ensures a walreceiver always has the fresh value). A future
commit can relax this (to PGC_SIGHUP again) by having the startup
process signal walreceiver to shutdown whenever the value changes.
Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200122055510.GH174860@paquier.xyz
The new command-line switch --include-foreign-data=PATTERN lets the user
specify foreign servers from which to dump foreign table data. This can
be refined by further inclusion/exclusion switches, so that the user has
full control over which tables to dump.
A limitation is that this doesn't work in combination with parallel
dumps, for implementation reasons. This might be lifted in the future,
but requires shuffling some code around.
Author: Luis Carril <luis.carril@swarm64.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Surafel Temesgen <surafel3000@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@2ndQuadrant.com>
Discussion: https://postgr.es/m/LEJPR01MB0185483C0079D2F651B16231E7FC0@LEJPR01MB0185.DEUPRD01.PROD.OUTLOOK.DE
Now that we require C99, we can depend on __VA_ARGS__ to work, and
revising ereport() to use it has several significant benefits:
* The extra parentheses around the auxiliary function calls are now
optional. Aside from being a bit less ugly, this removes a common
gotcha for new contributors, because in some cases the compiler errors
you got from forgetting them were unintelligible.
* The auxiliary function calls are now evaluated as a comma expression
list rather than as extra arguments to errfinish(). This means that
compilers can be expected to warn about no-op expressions in the list,
allowing detection of several other common mistakes such as forgetting
to add errmsg(...) when converting an elog() call to ereport().
* Unlike the situation with extra function arguments, comma expressions
are guaranteed to be evaluated left-to-right, so this removes platform
dependency in the order of the auxiliary function calls. While that
dependency hasn't caused us big problems in the past, this change does
allow dropping some rather shaky assumptions around errcontext() domain
handling.
There's no intention to make wholesale changes of existing ereport
calls, but as proof-of-concept this patch removes the extra parens
from a couple of calls in postgres.c.
While new code can be written either way, code intended to be
back-patched will need to use extra parens for awhile yet. It seems
worth back-patching this change into v12, so as to reduce the window
where we have to be careful about that by one year. Hence, this patch
is careful to preserve ABI compatibility; a followup HEAD-only patch
will make some additional simplifications.
Andres Freund and Tom Lane
Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
Previously if a promotion was triggered while recovery was paused,
the paused state continued. Also recovery could be paused by executing
pg_wal_replay_pause() even while a promotion was ongoing. That is,
recovery pause had higher priority over a standby promotion.
But this behavior was not desirable because most users basically wanted
the recovery to complete as soon as possible and the server to become
the master when they requested a promotion.
This commit changes recovery so that it prefers a promotion over
recovery pause. That is, if a promotion is triggered while recovery
is paused, the paused state ends and a promotion continues. Also
this commit makes recovery pause functions like pg_wal_replay_pause()
throw an error if they are executed while a promotion is ongoing.
Internally, this commit adds new internal function PromoteIsTriggered()
that returns true if a promotion is triggered. Since the name of
this function and the existing function IsPromoteTriggered() are
confusingly similar, the commit changes the name of IsPromoteTriggered()
to IsPromoteSignaled, as more appropriate name.
Author: Fujii Masao
Reviewed-by: Atsushi Torikoshi, Sergei Kornilov
Discussion: https://postgr.es/m/00c194b2-dbbb-2e8a-5b39-13f14048ef0a@oss.nttdata.com
This commit introduces new wait events BackupWaitWalArchive and
RecoveryPause. The former is reported while waiting for the WAL files
required for the backup to be successfully archived. The latter is
reported while waiting for recovery in pause state to be resumed.
Author: Fujii Masao
Reviewed-by: Michael Paquier, Atsushi Torikoshi, Robert Haas
Discussion: https://postgr.es/m/f0651f8c-9c96-9f29-0ff9-80414a15308a@oss.nttdata.com
Previously 0 was reported in pg_stat_progress_basebackup.total_backup
if the total backup size was not estimated. Per discussion, our consensus
is that NULL is better choise as the value in total_backup in that case.
So this commit makes pg_stat_progress_basebackup view report NULL
in total_backup column if the estimation is disabled.
Bump catversion.
Author: Fujii Masao
Reviewed-by: Amit Langote, Magnus Hagander, Alvaro Herrera
Discussion: https://postgr.es/m/CABUevExnhOD89zBDuPvfAAh243RzNpwCPEWNLtMYpKHMB8gbAQ@mail.gmail.com
src/port/getopt_long.c failed on such an argument, always seeing it
as an unrecognized switch. This is unhelpful; better is to treat such
an item as a non-switch argument. That behavior is what we find in
GNU's getopt_long(); it's what src/port/getopt.c does; and it is
required by POSIX for getopt(), which getopt_long() ought to be
generally a superset of. Moreover, it's expected by ecpg, which
intends an argument of "-" to mean "read from stdin". So fix it.
Also add some documentation about ecpg's behavior in this area, since
that was miserably underdocumented. I had to reverse-engineer it
from the code.
Per bug #16304 from James Gray. Back-patch to all supported branches,
since this has been broken forever.
Discussion: https://postgr.es/m/16304-c662b00a1322db7f@postgresql.org
Until now, only selected bulk operations (e.g. COPY) did this. If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY. See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules. Maintainers of table access
methods should examine that section.
To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL. A new GUC,
wal_skip_threshold, guides that choice. If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold. Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.
Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node. Make relcache.c retain entries for certain
dropped relations until end of transaction.
Back-patch to 9.5 (all supported versions). This introduces a new WAL
record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As
always, update standby systems before master systems. This changes
sizeof(RelationData) and sizeof(IndexStmt), breaking binary
compatibility for affected extensions. (The most recent commit to
affect the same class of extensions was
089e4d405d0f3b94c74a2c6a54357a84a681754b.)
Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas. Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem. Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout.
Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
This patch adds the pseudo-types anycompatible, anycompatiblearray,
anycompatiblenonarray, and anycompatiblerange. They work much like
anyelement, anyarray, anynonarray, and anyrange respectively, except
that the actual input values need not match precisely in type.
Instead, if we can find a common supertype (using the same rules
as for UNION/CASE type resolution), then the parser automatically
promotes the input values to that type. For example,
"myfunc(anycompatible, anycompatible)" can match a call with one
integer and one bigint argument, with the integer automatically
promoted to bigint. With anyelement in the definition, the user
would have had to cast the integer explicitly.
The new types also provide a second, independent set of type variables
for function matching; thus with "myfunc(anyelement, anyelement,
anycompatible) returns anycompatible" the first two arguments are
constrained to be the same type, but the third can be some other
type, and the result has the type of the third argument. The need
for more than one set of type variables was foreseen back when we
first invented the polymorphic types, but we never did anything
about it.
Pavel Stehule, revised a bit by me
Discussion: https://postgr.es/m/CAFj8pRDna7VqNi8gR+Tt2Ktmz0cq5G93guc3Sbn_NVPLdXAkqA@mail.gmail.com
This commit changes pg_basebackup so that it specifies PROGRESS option in
BASE_BACKUP replication command whether --progress is specified or not.
This causes the server to estimate the total backup size and report it in
pg_stat_progress_basebackup.backup_total, by default. This is reasonable
default because the time required for the estimation would not be so large
in most cases.
Also this commit adds new option --no-estimate-size to pg_basebackup.
This option prevents the server from the estimation, and so is useful to
avoid such estimation time if it's too long.
Author: Fujii Masao
Reviewed-by: Magnus Hagander, Amit Langote
Discussion: https://postgr.es/m/CABUevEyDPPSjP7KRvfTXPdqOdY5aWNkqsB5aAXs3bco5ZwtGHg@mail.gmail.com
This commit renames RecoveryWalAll and RecoveryWalStream wait events to
RecoveryWalStream and RecoveryRetrieveRetryInterval, respectively,
in order to make the names and what they are more consistent. For example,
previously RecoveryWalAll was reported as a wait event while the recovery
was waiting for WAL from a stream, and which was confusing because the name
was very different from the situation where the wait actually could happen.
The names of macro variables for those wait events also are renamed
accordingly.
This commit also changes the category of RecoveryRetrieveRetryInterval to
Timeout from Activity because the wait event is reported while waiting based
on wal_retrieve_retry_interval.
Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Atsushi Torikoshi
Discussion: https://postgr.es/m/124997ee-096a-5d09-d8da-2c7a57d0816e@oss.nttdata.com
While performing hash aggregation, track memory usage when adding new
groups to a hash table. If the memory usage exceeds work_mem, enter
"spill mode".
In spill mode, new groups are not created in the hash table(s), but
existing groups continue to be advanced if input tuples match. Tuples
that would cause a new group to be created are instead spilled to a
logical tape to be processed later.
The tuples are spilled in a partitioned fashion. When all tuples from
the outer plan are processed (either by advancing the group or
spilling the tuple), finalize and emit the groups from the hash
table. Then, create new batches of work from the spilled partitions,
and select one of the saved batches and process it (possibly spilling
recursively).
Author: Jeff Davis
Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman
Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com
Commit d06215d03b added a new attribute to pg_statistic_ext catalog, but
failed to add it to document it properly.
Reported-by: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com>