In order to troubleshoot misbehaving or buggy event triggers, the
documented advice is to enter single-user mode. In an attempt to
reduce the number of situations where single-user mode is required
(or even recommended) for non-extraordinary maintenance, this GUC
allows to temporarily suspend event triggers.
This was originally extracted from a larger patchset which aimed
at supporting event triggers on login events.
Reviewed-by: Ted Yu <yuzhihong@gmail.com>
Reviewed-by: Mikhail Gribkov <youzhick@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/9140106E-F9BF-4D85-8FC8-F2D3C094A6D9@yesql.se
Discussion: https://postgr.es/m/0d46d29f-4558-3af9-9c85-7774e14a7709@postgrespro.ru
This commit allows specifying a --sync-method in several frontend
utilities that must synchronize many files to disk (initdb,
pg_basebackup, pg_checksums, pg_dump, pg_rewind, and pg_upgrade).
On Linux, users can specify "syncfs" to synchronize the relevant
file systems instead of calling fsync() for every single file. In
many cases, using syncfs() is much faster.
As with recovery_init_sync_method, this new option comes with some
caveats. The descriptions of these caveats have been moved to a
new appendix section in the documentation.
Co-authored-by: Justin Pryzby
Reviewed-by: Michael Paquier, Thomas Munro, Robert Haas, Justin Pryzby
Discussion: https://postgr.es/m/20210930004340.GM831%40telsasoft.com
The logical_replication_mode GUC is intended for testing and debugging
purposes, but its current name may be misleading and encourage users to make
unnecessary changes.
To avoid confusion, renaming the GUC to a less misleading name
debug_logical_replication_streaming that casual users are less likely to mistakenly
assume needs to be modified in a regular logical replication setup.
Author: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/d672d774-c44b-6fec-f993-793e744f169a%40eisentraut.org
This is useful to show the allocation state of huge pages when setting
up a server with "huge_pages = try", where allocating huge pages would
be attempted but the server would continue its startup sequence even if
the allocation fails. The effective status of huge pages is not easily
visible without OS-level tools (or for instance, a lookup at
/proc/N/smaps), and the environments where Postgres runs may not
authorize that. Like the other GUCs related to huge pages, this works
for Linux and Windows.
This GUC can report as values:
- "on", if huge pages were allocated.
- "off", if huge pages were not allocated.
- "unknown", a special state that could only be seen when using for
example postgres -C because it is only possible to know if the shared
memory allocation worked after we can check for the GUC values, even if
checking a runtime-computed GUC. This value should never be seen when
querying for the GUC on a running server. An assertion is added to
check that.
The discussion has also turned around having a new function to grab this
status, but this would have required more tricks for -DEXEC_BACKEND,
something that GUCs already handle.
Noriyoshi Shinoda has initiated the thread that has led to the result of
this commit.
Author: Justin Pryzby
Reviewed-by: Nathan Bossart, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/TU4PR8401MB1152EBB0D271F827E2E37A01EECC9@TU4PR8401MB1152.NAMPRD84.PROD.OUTLOOK.COM
The GUC settings lc_collate and lc_ctype are from a time when those
locale settings were cluster-global. When those locale settings were
made per-database (PG 8.4), the settings were kept as read-only. As
of PG 15, you can use ICU as the per-database locale provider, so
examining these settings is already less meaningful and possibly
confusing, since you need to look into pg_database to find out what is
really happening, and they would likely become fully obsolete in the
future anyway.
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://www.postgresql.org/message-id/696054d1-bc88-b6ab-129a-18b8bce6a6f0@enterprisedb.com
Give the new GUC introduced by d4e71df6 a name that is clearly not
intended for mainstream use quite yet.
Future proposals would drop the prefix only after adding infrastructure
to make it efficient. Having the switch in the tree sooner is good
because it might lead to new discoveries about the hazards awaiting us
on a wide range of systems, but that name was too enticing and could
lead to cross-version confusion in future, per complaints from Noah and
Justin.
Suggested-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (the idea, not the patch)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (ditto)
Discussion: https://postgr.es/m/20230430041106.GA2268796%40rfd.leadboat.com
An update of the GUC stats_fetch_consistency in a transaction would be
able to trigger an assertion when doing cache->snapshot. In this case,
when retrieving a pgstat entry after the switch, a new snapshot would be
rebuilt, confusing pgstat_build_snapshot() because a snapshot is already
cached with an unexpected mode ("cache").
In order to fix this problem, this commit adds a flag to force a
snapshot clear each time this GUC is changed. Some tests are added to
check, while on it.
Some optimizations in avoiding the snapshot clear should be possible
depending on what is cached and the current GUC value, I guess, but this
solution is simple, and ensures that the state of the cache is updated
each time a new pgstat entry is fetched, hence being consistent with the
level wanted by the client that has set the GUC.
Note that cache->none and snapshot->none would not cause issues, as
fetching a pgstat entry would be retrieved from shared memory on the
second attempt, however a snapshot would still be cached. Similarly,
none->snapshot and none->cache would build a new snapshot on the second
fetch attempt. Finally, snapshot->cache would cache a new snapshot on
the second attempt.
Reported-by: Alexander Lakhin
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/17804-2a118cd046f2d0e5@postgresql.org
backpatch-through: 15
vacuum_defer_cleanup_age was introduced before hot_standby_feedback and
replication slots existed. It is hard to use reasonably - commonly it will
either be set too low (not preventing recovery conflicts, while still causing
some bloat), or too high (causing a lot of bloat). The alternatives do not
have that issue.
That on its own might not be sufficient reason to remove
vacuum_defer_cleanup_age, but it also complicates computation of xid
horizons. See e.g. the bug fixed in be504a3e97. It also is untested.
This commit removes TransactionIdRetreatSafely(), as there are no users
anymore. There might be potential future users, hence noting that here.
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20230317230930.nhsgk3qfk7f4axls@awork3.anarazel.de
This reverts commit 3d03b24c3 (Revert Add support for Kerberos
credential delegation) which was committed on the grounds of concern
about portability, but on further review and discussion, it's clear that
we are better off explicitly requiring MIT Kerberos as that appears to
be the only GSSAPI library currently that's under proper maintenance
and ongoing development. The API used for storing credentials was added
to MIT Kerberos over a decade ago while for the other libraries which
appear to be mainly based on Heimdal, which exists explicitly to be a
re-implementation of MIT Kerberos, the API never made it to a released
version (even though it was added to the Heimdal git repo over 5 years
ago..).
This post-feature-freeze change was approved by the RMT.
Discussion: https://postgr.es/m/ZDDO6jaESKaBgej0%40tamriel.snowman.net
This reverts commit 3d4fa227bc.
Per discussion and buildfarm, this depends on APIs that seem to not
be available on at least one platform (NetBSD). Should be certainly
possible to rework to be optional on that platform if necessary but bit
late for that at this point.
Discussion: https://postgr.es/m/3286097.1680922218@sss.pgh.pa.us
Provide a way to ask the kernel to use O_DIRECT (or local equivalent)
where available for data and WAL files, to avoid or minimize kernel
caching. This hurts performance currently and is not intended for end
users yet. Later proposed work would introduce our own I/O clustering,
read-ahead, etc to replace the facilities the kernel disables with this
option.
The only user-visible change, if the developer-only GUC is not used, is
that this commit also removes the obscure logic that would activate
O_DIRECT for the WAL when wal_sync_method=open_[data]sync and
wal_level=minimal (which also requires max_wal_senders=0). Those are
non-default and unlikely settings, and this behavior wasn't (correctly)
documented. The same effect can be achieved with io_direct=wal.
Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg%40mail.gmail.com
Support GSSAPI/Kerberos credentials being delegated to the server by a
client. With this, a user authenticating to PostgreSQL using Kerberos
(GSSAPI) credentials can choose to delegate their credentials to the
PostgreSQL server (which can choose to accept them, or not), allowing
the server to then use those delegated credentials to connect to
another service, such as with postgres_fdw or dblink or theoretically
any other service which is able to be authenticated using Kerberos.
Both postgres_fdw and dblink are changed to allow non-superuser
password-less connections but only when GSSAPI credentials have been
delegated to the server by the client and GSSAPI is used to
authenticate to the remote system.
Authors: Stephen Frost, Peifeng Qiu
Reviewed-By: David Christensen
Discussion: https://postgr.es/m/CO1PR05MB8023CC2CB575E0FAAD7DF4F8A8E29@CO1PR05MB8023.namprd05.prod.outlook.com
The old wording described these as being multiplied by max_connections
plus max_prepared_transactions, which hasn't been exactly right for
some time thanks to the addition of various auxiliary processes.
Moreover, exactness here is a bit pointless given that the lock tables
can expand into the initially-unallocated "slop" space in shared
memory. Rather than trying to track exactly what the code is doing,
let's just use the term "server processes".
Likewise adjust these GUCs' description strings in guc_tables.c.
Wang Wei, reviewed by Nathan Bossart and myself
Discussion: https://postgr.es/m/OS3PR01MB6275BDD09C9B875C65FCC5AB9EA39@OS3PR01MB6275.jpnprd01.prod.outlook.com
1cbbee033 added BUFFER_USAGE_LIMIT to the VACUUM and ANALYZE commands, so
here we permit that option to be specified in vacuumdb.
In passing, adjust the documents for vacuum_buffer_usage_limit and the
BUFFER_USAGE_LIMIT VACUUM option to mention "kB" rather than "KB". Do the
same for the ERROR message in ExecVacuum() and
check_vacuum_buffer_usage_limit(). Without that we might tell a user that
the valid minimum value is 128 KB only to reject that because we accept
only "kB" and not "KB".
Also, add a small reminder comment in vacuum.h to try to trigger the
memory of anyone adding new fields to VacuumParams that they might want to
consider if vacuumdb needs to grow a new option too.
Author: Melanie Plageman
Reviewed-by: Justin Pryzby
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/ZAzTg3iEnubscvbf@telsasoft.com
Add new options to the VACUUM and ANALYZE commands called
BUFFER_USAGE_LIMIT to allow users more control over how large to make the
buffer access strategy that is used to limit the usage of buffers in
shared buffers. Larger rings can allow VACUUM to run more quickly but
have the drawback of VACUUM possibly evicting more buffers from shared
buffers that might be useful for other queries running on the database.
Here we also add a new GUC named vacuum_buffer_usage_limit which controls
how large to make the access strategy when it's not specified in the
VACUUM/ANALYZE command. This defaults to 256KB, which is the same size as
the access strategy was prior to this change. This setting also
controls how large to make the buffer access strategy for autovacuum.
Per idea by Andres Freund.
Author: Melanie Plageman
Reviewed-by: David Rowley
Reviewed-by: Andres Freund
Reviewed-by: Justin Pryzby
Reviewed-by: Bharath Rupireddy
Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb@awork3.anarazel.de
Traditionally, vacuum always makes use of a buffer access strategy 32
buffers in size. This means that running vacuums tend not to cause too
many shared buffers to become dirty, however, this can cause vacuums to
run much more slowly than they otherwise could as WAL flushes will occur
more frequently due to having to flush WAL out to the LSN of the dirty
page before that page can be written to disk.
When we are performing failsafe VACUUMs (as added in 1e55e7d17), we really
want to make the vacuum work go as quickly as possible, so here we disable
the buffer access strategy when entering failsafe mode while vacuuming a
relation.
Per idea and analyis from Andres Freund.
In passing, also include some changes I had intended for 32fbe0239.
Author: Melanie Plageman
Reviewed-by: Justin Pryzby, David Rowley
Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb%40awork3.anarazel.de
Commit ecb696527c added an XML ID attribute to one varlistentry in
create_subscription.sgml. Following 78ee60ed84, this commit adds XML ID
attributes to all varlistentries in create_subscription.sgml.
Additionally, links are added to refer to the subscription options,
enhancing the readability of documents.
Author: Kuroda Hayato
Reviewed-by: Peter Smith, Amit Kapila
Discussion: https://postgr.es/m/TYAPR01MB58667AE04D291924671E2051F5879@TYAPR01MB5866.jpnprd01.prod.outlook.com
For ICU collations, ensure that the locale's language exists in ICU,
and that the locale can be opened.
Basic validation helps avoid minor mistakes and misspellings, which
often fall back to the root locale instead of the intended
locale. It's even more important to avoid such mistakes in ICU
versions 54 and earlier, where the same (misspelled) locale string
could fall back to different locales depending on the environment.
Discussion: https://postgr.es/m/11b1eeb7e7667fdd4178497aeb796c48d26e69b9.camel@j-davis.com
Discussion: https://postgr.es/m/df2efad0cae7c65180df8e5ebb709e5eb4f2a82b.camel@j-davis.com
Reviewed-by: Peter Eisentraut
Replace the hardcoded value with a GUC such that the iteration
count can be raised in order to increase protection against
brute-force attacks. The hardcoded value for SCRAM iteration
count was defined to be 4096, which is taken from RFC 7677, so
set the default for the GUC to 4096 to match. In RFC 7677 the
recommendation is at least 15000 iterations but 4096 is listed
as a SHOULD requirement given that it's estimated to yield a
0.5s processing time on a mobile handset of the time of RFC
writing (late 2015).
Raising the iteration count of SCRAM will make stored passwords
more resilient to brute-force attacks at a higher computational
cost during connection establishment. Lowering the count will
reduce computational overhead during connections at the tradeoff
of reducing strength against brute-force attacks.
There are however platforms where even a modest iteration count
yields a too high computational overhead, with weaker password
encryption schemes chosen as a result. In these situations,
SCRAM with a very low iteration count still gives benefits over
weaker schemes like md5, so we allow the iteration count to be
set to one at the low end.
The new GUC is intentionally generically named such that it can
be made to support future SCRAM standards should they emerge.
At that point the value can be made into key:value pairs with
an undefined key as a default which will be backwards compatible
with this.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Discussion: https://postgr.es/m/F72E7BC7-189F-4B17-BF47-9735EB72C364@yesql.se
force_parallel_mode is meant to be used to allow us to exercise the
parallel query infrastructure to ensure that it's working as we expect.
It seems some users think this GUC is for forcing the query planner into
picking a parallel plan regardless of the costs. A quick look at the
documentation would have made them realize that they were wrong, but the
GUC is likely too conveniently named which, evidently, seems to often
result in users expecting that it forces the planner into usefully
parallelizing queries.
Here we rename the GUC to something which casual users are less likely to
mistakenly think is what they need to make their query run more quickly.
For now, the old name can still be used. We'll revisit if the old name
mapping can be removed once the buildfarm configs are all updated.
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAApHDvrsOi92_uA7PEaHZMH-S4Xv+MGhQWA+GrP8b1kjpS1HjQ@mail.gmail.com
An early release of AF_UNIX in Windows apparently supported Linux-style
"abstract" Unix sockets, but they do not seem to work in current Windows
versions and there is no mention of any of this in the Winsock
documentation. Remove the mention of Windows from the documentation.
Back-patch to 14, where commit c9f0624b landed.
Discussion: https://postgr.es/m/CA%2BhUKGKrYbSZhrk4NGfoQGT_3LQS5pC5KNE1g0tvE_pPBZ7uew%40mail.gmail.com
Extend the existing developer option 'logical_replication_mode' to help
test the parallel apply of large transactions on the subscriber.
When set to 'buffered', the leader sends changes to parallel apply workers
via a shared memory queue. When set to 'immediate', the leader serializes
all changes to files and notifies the parallel apply workers to read and
apply them at the end of the transaction.
This helps in adding tests to cover the serialization code path in
parallel streaming mode.
Author: Hou Zhijie
Reviewed-by: Peter Smith, Kuroda Hayato, Sawada Masahiko, Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
Rename the developer option 'logical_decoding_mode' to the more flexible
name 'logical_replication_mode' because doing so will make it easier to
extend this option in the future to help test other areas of logical
replication.
Currently, it is used on the publisher side to allow streaming or
serializing each change in logical decoding. In the upcoming patch, we are
planning to use it on the subscriber. On the subscriber, it will allow
serializing the changes to file and notifies the parallel apply workers to
read and apply them at the end of the transaction.
We discussed exposing this parameter as a subscription option but
it did not seem advisable since it is primarily used for testing/debugging
and there is no other such parameter. We also discussed having separate
GUCs for publisher and subscriber but for current testing/debugging
requirements, one GUC is sufficient.
Author: Hou Zhijie
Reviewed-by: Peter Smith, Kuroda Hayato, Sawada Masahiko, Amit Kapila
Discussion: https://postgr.es/m/CAD21AoAy2c=Mx=FTCs+EwUsf2kQL5MmU3N18X84k0EmCXntK4g@mail.gmail.com
Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
Eager freezing strategy avoids large build-ups of all-visible pages. It
makes VACUUM trigger page-level freezing whenever doing so will enable
the page to become all-frozen in the visibility map. This is useful for
tables that experience continual growth, particularly strict append-only
tables such as pgbench's history table. Eager freezing significantly
improves performance stability by spreading out the cost of freezing
over time, rather than doing most freezing during aggressive VACUUMs.
It complements the insert autovacuum mechanism added by commit b07642db.
VACUUM determines its freezing strategy based on the value of the new
vacuum_freeze_strategy_threshold GUC (or reloption) with logged tables.
Tables that exceed the size threshold use the eager freezing strategy.
Unlogged tables and temp tables always use eager freezing strategy,
since the added cost is negligible there. Non-permanent relations won't
incur any extra overhead in WAL written (for the obvious reason), nor in
pages dirtied (since any extra freezing will only take place on pages
whose PD_ALL_VISIBLE bit needed to be set either way).
VACUUM uses lazy freezing strategy for logged tables that fall under the
GUC size threshold. Page-level freezing triggers based on the criteria
established in commit 1de58df4, which added basic page-level freezing.
Eager freezing is strictly more aggressive than lazy freezing. Settings
like vacuum_freeze_min_age still get applied in just the same way in
every VACUUM, independent of the strategy in use. The only mechanical
difference between eager and lazy freezing strategies is that only the
former applies its own additional criteria to trigger freezing pages.
Note that even lazy freezing strategy will trigger freezing whenever a
page happens to have required that an FPI be written during pruning,
provided that the page will thereby become all-frozen in the visibility
map afterwards (due to the FPI optimization from commit 1de58df4).
The vacuum_freeze_strategy_threshold default setting is 4GB. This is a
relatively low setting that prioritizes performance stability. It will
be reviewed at the end of the Postgres 16 beta period.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com
Enforce wal_retrieve_retry_interval on a per-subscription basis,
rather than globally, and arrange to skip that delay in case of
an intentional worker exit. This probably makes little difference
in the field, where apply workers wouldn't be restarted often;
but it has a significant impact on the runtime of our logical
replication regression tests (even though those tests use
artificially-small wal_retrieve_retry_interval settings already).
Nathan Bossart, with mostly-cosmetic editorialization by me
Discussion: https://postgr.es/m/20221122004119.GA132961@nathanxps13
This provides a way to reserve connection slots for non-superusers.
The slots reserved via the new GUC are available only to users who
have the new predefined role pg_use_reserved_connections.
superuser_reserved_connections remains as a final reserve in case
reserved_connections has been exhausted.
Patch by Nathan Bossart. Reviewed by Tushar Ahuja and by me.
Discussion: http://postgr.es/m/20230119194601.GA4105788@nathanxps13
Commit ea92368cd1 made max_wal_senders
a separate pool of backends from max_connections, but the documentation
and error message for superuser_reserved_connections weren't updated
at the time, and as a result are somewhat misleading. Update.
This is arguably a back-patchable bug fix, but because it seems quite
minor, no back-patch.
Patch by Nathan Bossart. Reviewed by Tushar Ahuja and by me.
Discussion: http://postgr.es/m/20230119194601.GA4105788@nathanxps13
Avoid use of "_" in SGML IDs. Awhile back that was actually
disallowed by the toolchain, as a consequence of which our convention
has been to use "-" instead. Fix a couple of stragglers that are
particularly inconsistent with that convention and with related IDs.
This is just neatnik-ism, so no need for back-patch.
Discussion: https://postgr.es/m/769446.1673478332@sss.pgh.pa.us
Can be set to the empty string, or to either or both of "set" or
"inherit". If set to a non-empty value, a non-superuser who creates
a role (necessarily by relying up the CREATEROLE privilege) will
grant that role back to themselves with the specified options.
This isn't a security feature, because the grant that this feature
triggers can also be performed explicitly. Instead, it's a user experience
feature. A superuser would necessarily inherit the privileges of any
created role and be able to access all such roles via SET ROLE;
with this patch, you can configure createrole_self_grant = 'set, inherit'
to provide a similar experience for a user who has CREATEROLE but not
SUPERUSER.
Discussion: https://postgr.es/m/CA+TgmobN59ct+Emmz6ig1Nua2Q-_o=r6DSD98KfU53kctq_kQw@mail.gmail.com
Currently, for large transactions, the publisher sends the data in
multiple streams (changes divided into chunks depending upon
logical_decoding_work_mem), and then on the subscriber-side, the apply
worker writes the changes into temporary files and once it receives the
commit, it reads from those files and applies the entire transaction. To
improve the performance of such transactions, we can instead allow them to
be applied via parallel workers.
In this approach, we assign a new parallel apply worker (if available) as
soon as the xact's first stream is received and the leader apply worker
will send changes to this new worker via shared memory. The parallel apply
worker will directly apply the change instead of writing it to temporary
files. However, if the leader apply worker times out while attempting to
send a message to the parallel apply worker, it will switch to
"partial serialize" mode - in this mode, the leader serializes all
remaining changes to a file and notifies the parallel apply workers to
read and apply them at the end of the transaction. We use a non-blocking
way to send the messages from the leader apply worker to the parallel
apply to avoid deadlocks. We keep this parallel apply assigned till the
transaction commit is received and also wait for the worker to finish at
commit. This preserves commit ordering and avoid writing to and reading
from files in most cases. We still need to spill if there is no worker
available.
This patch also extends the SUBSCRIPTION 'streaming' parameter so that the
user can control whether to apply the streaming transaction in a parallel
apply worker or spill the change to disk. The user can set the streaming
parameter to 'on/off', or 'parallel'. The parameter value 'parallel' means
the streaming will be applied via a parallel apply worker, if available.
The parameter value 'on' means the streaming transaction will be spilled
to disk. The default value is 'off' (same as current behaviour).
In addition, the patch extends the logical replication STREAM_ABORT
message so that abort_lsn and abort_time can also be sent which can be
used to update the replication origin in parallel apply worker when the
streaming transaction is aborted. Because this message extension is needed
to support parallel streaming, parallel streaming is not supported for
publications on servers < PG16.
Author: Hou Zhijie, Wang wei, Amit Kapila with design inputs from Sawada Masahiko
Reviewed-by: Sawada Masahiko, Peter Smith, Dilip Kumar, Shi yu, Kuroda Hayato, Shveta Mallik
Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
Teach VACUUM to decide on whether or not to trigger freezing at the
level of whole heap pages. Individual XIDs and MXIDs fields from tuple
headers now trigger freezing of whole pages, rather than independently
triggering freezing of each individual tuple header field.
Managing the cost of freezing over time now significantly influences
when and how VACUUM freezes. The overall amount of WAL written is the
single most important freezing related cost, in general. Freezing each
page's tuples together in batch allows VACUUM to take full advantage of
the freeze plan WAL deduplication optimization added by commit 9e540599.
Also teach VACUUM to trigger page-level freezing whenever it detects
that heap pruning generated an FPI. We'll have already written a large
amount of WAL just to do that much, so it's very likely a good idea to
get freezing out of the way for the page early. This only happens in
cases where it will directly lead to marking the page all-frozen in the
visibility map.
In most cases "freezing a page" removes all XIDs < OldestXmin, and all
MXIDs < OldestMxact. It doesn't quite work that way in certain rare
cases involving MultiXacts, though. It is convenient to define "freeze
the page" in a way that gives FreezeMultiXactId the leeway to put off
the work of processing an individual tuple's xmax whenever it happens to
be a MultiXactId that would require an expensive second pass to process
aggressively (allocating a new multi is especially worth avoiding here).
FreezeMultiXactId is eager when processing is cheap (as it usually is),
and lazy in the event of an individual multi that happens to require
expensive second pass processing. This avoids regressions related to
processing of multis that page-level freezing might otherwise cause.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com
This enables streaming or serializing changes immediately in logical
decoding. This parameter is intended to be used to test logical decoding
and replication of large transactions for which otherwise we need to
generate the changes till logical_decoding_work_mem is reached.
This helps in reducing the timing of existing tests related to logical
replication of in-progress transactions and will help in writing tests for
for the upcoming feature for parallelly applying large in-progress
transactions.
Author: Shi yu
Reviewed-by: Sawada Masahiko, Shveta Mallik, Amit Kapila, Dilip Kumar, Kuroda Hayato, Kyotaro Horiguchi
Discussion: https://postgr.es/m/OSZPR01MB63104E7449DBE41932DB19F1FD1B9@OSZPR01MB6310.jpnprd01.prod.outlook.com
1349d279 added query planner support to allow more efficient execution of
aggregate functions which have an ORDER BY or a DISTINCT clause. Prior to
that commit, the planner would only request that the lower planner produce
a plan with the order required for the GROUP BY clause and it would be
left up to nodeAgg.c to perform the final sort of records within each
group so that the aggregate transition functions were called in the
correct order. Now that the planner requests the lower planner produce a
plan with the GROUP BY and the ORDER BY / DISTINCT aggregates in mind,
there is the possibility that the planner chooses a plan which could be
less efficient than what would have been produced before 1349d279.
While developing 1349d279, I had in mind that Incremental Sort would help
us in cases where an index exists only on the GROUP BY column(s).
Incremental Sort would just replace the implicit tuplesorts which are
being performed in nodeAgg.c. However, because the planner has the
flexibility to instead choose a plan which just performs a full sort on
both the GROUP BY and ORDER BY / DISTINCT aggregate columns, there is
potential for the planner to make a bad choice. The costing for
Incremental Sort is not perfect as it assumes an even distribution of rows
to sort within each sort group.
Here we add an escape hatch in the form of the enable_presorted_aggregate
GUC. This will allow users to get the pre-PG16 behavior in cases where
they have no other means to convince the query planner to produce a plan
which only sorts on the GROUP BY column(s).
Discussion: https://postgr.es/m/CAApHDvr1Sm+g9hbv4REOVuvQKeDWXcKUAhmbK5K+dfun0s9CvA@mail.gmail.com
Add some cross-links between chapter "20. Server Parameters" and
"31. Logical Replication" regarding the available configuration
parameters, for easier navigation; and some more explanatory text too.
I (Álvaro) chose to duplicate max_replication_slots in Chapter 20,
because it has completely different meanings at each side of the
replication link.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: samay sharma <smilingsamay@gmail.com>
Discussion: https://postgr.es/m/CAHut+PsESqpy7w3Y6cX98c255ZuCjvipkhKjy6hZBjOv4E6iJA@mail.gmail.com