It's good hygiene if e.g. an extension launches a subprogram when
being loaded. We went through some effort to close them in the child
process in EXEC_BACKEND mode, but it's better to not hand them down to
the child process in the first place. We still need to close them
after fork when !EXEC_BACKEND, but it's a little simpler.
In the passing, LOG a message if closing the client connection or
listen socket fails. Shouldn't happen, but if it does, would be nice
to know.
Reviewed-by: Tristan Partin, Andres Freund, Thomas Munro
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
This commit introduces descriptively-named macros for the
identifiers used in wire protocol messages. These new macros are
placed in a new header file so that they can be easily used by
third-party code.
Author: Dave Cramer
Reviewed-by: Alvaro Herrera, Tatsuo Ishii, Peter Smith, Robert Haas, Tom Lane, Peter Eisentraut, Michael Paquier
Discussion: https://postgr.es/m/CADK3HHKbBmK-PKf1bPNFoMC%2BoBt%2BpD9PH8h5nvmBQskEHm-Ehw%40mail.gmail.com
Both apply and tablesync workers were using ApplyWorkerMain() as entry
point. As the name implies, ApplyWorkerMain() should be considered as
the main function for apply workers. Tablesync worker's path was hidden
and does not have enough in common to share the same main function with
apply worker.
Also, most of the code shared by both worker types is already combined
in LogicalRepApplyLoop(). There is no need to combine the rest in
ApplyWorkerMain() anymore.
This patch introduces TablesyncWorkerMain() as a new entry point for
tablesync workers. This aims to increase code readability and would help
with future improvements like the reuse of tablesync workers in the
initial synchronization.
Author: Melih Mutlu based on suggestions by Melanie Plageman
Reviewed-by: Peter Smith, Kuroda Hayato, Amit Kapila
Discussion: http://postgr.es/m/CAGPVpCTq=rUDd4JUdaRc1XUWf4BrH2gdSNf3rtOMUGj9rPpfzQ@mail.gmail.com
Until now, when DROP DATABASE got interrupted in the wrong moment, the removal
of the pg_database row would also roll back, even though some irreversible
steps have already been taken. E.g. DropDatabaseBuffers() might have thrown
out dirty buffers, or files could have been unlinked. But we continued to
allow connections to such a corrupted database.
To fix this, mark databases invalid with an in-place update, just before
starting to perform irreversible steps. As we can't add a new column in the
back branches, we use pg_database.datconnlimit = -2 for this purpose.
An invalid database cannot be connected to anymore, but can still be
dropped.
Unfortunately we can't easily add output to psql's \l to indicate that some
database is invalid, it doesn't fit in any of the existing columns.
Add tests verifying that a interrupted DROP DATABASE is handled correctly in
the backend and in various tools.
Reported-by: Evgeny Morozov <postgresql3@realityexists.net>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm7p6@awork3.anarazel.de
Discussion: https://postgr.es/m/20230314174521.74jl6ffqsee5mtug@awork3.anarazel.de
Backpatch: 11-, bug present in all supported versions
This commit increases the size of the bgw_library_name member of
the BackgroundWorker struct from BGW_MAXLEN (96) bytes to MAXPGPATH
(default of 1024) bytes so that it can store longer file names
(e.g., absolute paths).
Author: Yurii Rashkovskii
Reviewed-by: Daniel Gustafsson, Aleksander Alekseev
Discussion: https://postgr.es/m/CA%2BRLCQyjFV5Y8tG5QgUb6gjteL4S3p%2B1gcyqWTqigyM93WZ9Pg%40mail.gmail.com
Unlike commands such as CREATE DATABASE, ProcessStartupPacket()
does not perform multibyte-aware truncation of overlength names.
This means that connection attempts might fail even if the user
provides the same overlength names that were used in CREATE
DATABASE, CREATE ROLE, etc. Ideally, we'd do the same multibyte-
aware truncation in both code paths, but it doesn't seem worth the
added complexity of trying to discover the encoding of the names.
Instead, let's simply skip truncating the names in the startup
packet and let the user/database lookup fail later on. With this
change, users must provide the exact names stored in the catalogs,
even if the names were truncated.
This reverts commit d18c1d1f51.
Author: Bertrand Drouvot
Reviewed-by: Kyotaro Horiguchi, Tom Lane
Discussion: https://postgr.es/m/07436793-1426-29b2-f924-db7422a05fb7%40gmail.com
Run pgindent, pgperltidy, and reformat-dat-files.
This set of diffs is a bit larger than typical. We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop). We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up. Going
forward, that should make for fewer random-seeming changes to existing
code.
Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
Commit a73952b795 (new in 16) required default values in guc_table.c
and C variable initializers to match. This one only matched when
XLOG_BLCKSZ == 8kB. Fix by using the same expression in both places
with a new DEFAULT_XXX macro, as done for other GUCs.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGLNmLV=VrT==5MqnbARgx2ifRSFtdd8ofdfrdSLL3yv5A@mail.gmail.com
Commit 1021bd6a89 excluded autovacuum workers from cost-limit balance
calculations when per-relation options were set. The code checks for
limit and cost_delay being greater than zero, but since cost_delay can
be set to -1 the test needs to check for greater than or zero.
Backpatch to all supported branches since 1021bd6a89 was backpatched
all the way at the time.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAD21AoBS7o6Ljt_vfqPQPf67AhzKu3fR0iqk8B=vVYczMugKMQ@mail.gmail.com
Backpatch-through: v11 (all supported branches)
Commit 7d71d3dd0 introduced finer grained updates of autovacuum option
changes by increasing the frequency of reading the configuration file.
The debug logging of cost parameter was however changed such that some
initial values weren't logged. Fix by changing logging to use the old
frequency of logging regardless of them changing.
Also avoid taking a log for rendering the log message unless the set
loglevel is such that the log entry will be emitted.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBS7o6Ljt_vfqPQPf67AhzKu3fR0iqk8B=vVYczMugKMQ@mail.gmail.com
Add new options to the VACUUM and ANALYZE commands called
BUFFER_USAGE_LIMIT to allow users more control over how large to make the
buffer access strategy that is used to limit the usage of buffers in
shared buffers. Larger rings can allow VACUUM to run more quickly but
have the drawback of VACUUM possibly evicting more buffers from shared
buffers that might be useful for other queries running on the database.
Here we also add a new GUC named vacuum_buffer_usage_limit which controls
how large to make the access strategy when it's not specified in the
VACUUM/ANALYZE command. This defaults to 256KB, which is the same size as
the access strategy was prior to this change. This setting also
controls how large to make the buffer access strategy for autovacuum.
Per idea by Andres Freund.
Author: Melanie Plageman
Reviewed-by: David Rowley
Reviewed-by: Andres Freund
Reviewed-by: Justin Pryzby
Reviewed-by: Bharath Rupireddy
Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb@awork3.anarazel.de
Allow autovacuum to reload the config file more often so that cost-based
delay parameters can take effect while VACUUMing a relation. Previously,
autovacuum workers only reloaded the config file once per relation
vacuumed, so config changes could not take effect until beginning to
vacuum the next table.
Now, check if a reload is pending roughly once per block, when checking
if we need to delay.
In order for autovacuum workers to safely update their own cost delay
and cost limit parameters without impacting performance, we had to
rethink when and how these values were accessed.
Previously, an autovacuum worker's wi_cost_limit was set only at the
beginning of vacuuming a table, after reloading the config file.
Therefore, at the time that autovac_balance_cost() was called, workers
vacuuming tables with no cost-related storage parameters could still
have different values for their wi_cost_limit_base and wi_cost_delay.
Now that the cost parameters can be updated while vacuuming a table,
workers will (within some margin of error) have no reason to have
different values for cost limit and cost delay (in the absence of
cost-related storage parameters). This removes the rationale for keeping
cost limit and cost delay in shared memory. Balancing the cost limit
requires only the number of active autovacuum workers vacuuming a table
with no cost-based storage parameters.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com
Vacuum code run both by autovacuum workers and a backend doing
VACUUM/ANALYZE previously inspected VacuumCostLimit and VacuumCostDelay,
which are the global variables backing the GUCs vacuum_cost_limit and
vacuum_cost_delay.
Autovacuum workers needed to override these variables with their
own values, derived from autovacuum_vacuum_cost_limit and
autovacuum_vacuum_cost_delay and worker cost limit balancing logic.
This led to confusing code which, in some cases, both derived and
set a new value of VacuumCostLimit from VacuumCostLimit.
In preparation for refreshing these GUC values more often, introduce
new, independent global variables and add a function to update them
using the GUCs and existing logic.
Per suggestion by Kyotaro Horiguchi
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com
vacuum() is used for both the VACUUM command and for autovacuum. There
were many prechecks being done inside vacuum() that were just not relevant
to autovacuum. Let's move the bulk of these into ExecVacuum() so that
they're only executed when running the VACUUM command. This removes a
small amount of overhead when autovacuum vacuums a table.
While we are at it, allocate VACUUM's BufferAccessStrategy in ExecVacuum()
and pass it into vacuum() instead of expecting vacuum() to make it if it's
not already made by the calling function. To make this work, we need to
create the vacuum memory context slightly earlier, so we now need to pass
that down to vacuum() so that it's available for use in other memory
allocations.
Author: Melanie Plageman
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/20230405211534.4skgskbilnxqrmxg@awork3.anarazel.de
A future patch will add support for extending relations by multiple blocks at
once. To be concurrency safe, the buffers for those blocks need to be marked
as BM_IO_IN_PROGRESS. Until now we only had infrastructure for recovering from
an IO error for a single buffer. This commit extends that infrastructure to
multiple buffers by using the resource owner infrastructure.
This commit increases the size of the ResourceOwnerData struct, which appears
to have a just about measurable overhead in very extreme workloads. Medium
term we are planning to substantially shrink the size of
ResourceOwnerData. Short term the increase is small enough to not worry about
it for now.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de
Discussion: https://postgr.es/m/20221029200025.w7bvlgvamjfo6z44@awork3.anarazel.de
Further to commit 6a9229da, checking for NULL is now redundant. An "out
of memory" error would have been thrown already by palloc() and treated
as FATAL, so we can delete a few more lines.
Back-patch to all releases, like those other commits.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/4040668.1679013388%40sss.pgh.pa.us
A comment was left behind claiming that we needed to use malloc() rather
than palloc() because the corresponding free would run in another
thread, but that's not true anymore. Remove that comment. And, with
the reason being gone, we might as well actually use palloc().
Back-patch to supported releases, like d41a178b.
Discussion: https://postgr.es/m/CA%2BhUKG%2BpdM9v3Jv4tc2BFx2jh_daY3uzUyAGBhtDkotEQDNPYw%40mail.gmail.com
Our waitpid() emulation didn't prevent a PID from being recycled by the
OS before the call to waitpid(). The postmaster could finish up
tracking more than one child process with the same PID, and confuse
them.
Fix, by moving the guts of pgwin32_deadchild_callback() into waitpid(),
so that resources are released synchronously. The process and PID
continue to exist until we close the process handle, which only happens
once we're ready to adjust our book-keeping of running children.
This seems to explain a couple of failures on CI. It had never been
reported before, despite the code being as old as the Windows port.
Perhaps Windows started recycling PIDs more rapidly, or perhaps timing
changes due to commit 7389aad6 made it more likely to break.
Thanks to Alexander Lakhin for analysis and Andres Freund for tracking
down the root cause.
Back-patch to all supported branches.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20230208012852.bvkn2am4h4iqjogq%40awork3.anarazel.de
Disabling this option is useful to run VACUUM (with or without FULL) on
only the toast table of a relation, bypassing the main relation. This
option is enabled by default.
Running directly VACUUM on a toast table was already possible without
this feature, by using the non-deterministic name of a toast relation
(as of pg_toast.pg_toast_N, where N would be the OID of the parent
relation) in the VACUUM command, and it required a scan of pg_class to
know the name of the toast table. So this feature is basically a
shortcut to be able to run VACUUM or VACUUM FULL on a toast relation,
using only the name of the parent relation.
A new switch called --no-process-main is added to vacuumdb, to work as
an equivalent of PROCESS_MAIN.
Regression tests are added to cover VACUUM and VACUUM FULL, looking at
pg_stat_all_tables.vacuum_count to see how many vacuums have run on
each table, main or toast.
Author: Nathan Bossart
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/20221230000028.GA435655@nathanxps13
A new callback named startup_cb, called shortly after a module is
loaded, is added. This makes possible the initialization of any
additional state data required by a module. This initial state data can
be saved in a ArchiveModuleState, that is now passed down to all the
callbacks that can be defined in a module. With this design, it is
possible to have a per-module state, aimed at opening the door to the
support of more than one archive module.
The initialization of the callbacks is changed so as
_PG_archive_module_init() does not anymore give in input a
ArchiveModuleCallbacks that a module has to fill in with callback
definitions. Instead, a module now needs to return a const
ArchiveModuleCallbacks.
All the structure and callback definitions of archive modules are moved
into their own header, named archive_module.h, from pgarch.h.
Command-based archiving follows the same line, with a new set of files
named shell_archive.{c,h}.
There are a few more items that are under discussion to improve the
design of archive modules, like the fact that basic_archive calls
sigsetjmp() by itself to define its own error handling flow. These will
be adjusted later, the changes done here cover already a good portion
of what has been discussed.
Any modules created for v15 will need to be adjusted to this new
design.
Author: Nathan Bossart
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/20230130194810.6fztfgbn32e7qarj@awork3.anarazel.de
In standby mode, we don't actually report progress of recovery,
but up until now, startup_progress_timeout_handler() nevertheless
got called every log_startup_progress_interval seconds. That's
an unnecessary expense, so avoid it.
Report by Thomas Munro. Patch by Bharath Rupireddy, reviewed by
Simon Riggs, Thomas Munro, and me. Back-patch to v15, where
the problem was introduced.
Discussion: https://www.postgresql.org/message-id/CA%2BhUKGKCHSffAj8zZJKJvNX7ygnQFxVD6wm1d-2j3fVw%2BMafPQ%40mail.gmail.com
In the 90s we needed to deal with computers that still had the
pre-standard signal masking APIs. That hasn't been relevant for a very
long time on Unix systems, and c94ae9d8 got rid of a remaining
dependency in our Windows porting code. PG_SETMASK didn't expose
save/restore functionality, so we'd already started using sigprocmask()
directly in places, creating the visual distraction of having two ways
to spell it. It's not part of the API that extensions are expected to
be using (but if they are, the change will be trivial). It seems like a
good time to drop the old macro and just call the standard POSIX
function.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BKfQgrhHP2DLTohX1WwubaCBHmTzGnAEDPZ-Gug-Xskg%40mail.gmail.com
As coded, the timeout given to WaitLatch() was always equal to
PGARCH_AUTOWAKE_INTERVAL, as time() was called two times repeatedly.
This simplification could have been done in d75288f.
While on it, this adjusts a comment in pgarch.c to describe the archiver
in a more neutral way.
Author: Sravan Kumar, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CA+=NbjjqYE9-Lnw7H7DAiS5jebmoMikwZQb_sBP7kgBCn9q6Hg@mail.gmail.com
We'd like to use TimestampDifferenceMilliseconds with the stop_time
possibly being TIMESTAMP_INFINITY, but up to now it's disclaimed
responsibility for overflow cases. Define it to clamp its output to
the range [0, INT_MAX], handling overflow correctly. (INT_MAX rather
than LONG_MAX seems appropriate, because the function is already
described as being intended for calculating wait times for WaitLatch
et al, and that infrastructure only handles waits up to INT_MAX.
Also, this choice gets rid of cross-platform behavioral differences.)
Having done that, we can replace some ad-hoc code in walreceiver.c
with a simple call to TimestampDifferenceMilliseconds.
While at it, fix some buglets in existing callers of
TimestampDifferenceMilliseconds: basebackup_copy.c had not read the
memo about TimestampDifferenceMilliseconds never returning a negative
value, and postmaster.c had not read the memo about Min() and Max()
being macros with multiple-evaluation hazards. Neither of these
quite seem worth back-patching.
Patch by me; thanks to Nathan Bossart for review.
Discussion: https://postgr.es/m/3126727.1674759248@sss.pgh.pa.us
Eager freezing strategy avoids large build-ups of all-visible pages. It
makes VACUUM trigger page-level freezing whenever doing so will enable
the page to become all-frozen in the visibility map. This is useful for
tables that experience continual growth, particularly strict append-only
tables such as pgbench's history table. Eager freezing significantly
improves performance stability by spreading out the cost of freezing
over time, rather than doing most freezing during aggressive VACUUMs.
It complements the insert autovacuum mechanism added by commit b07642db.
VACUUM determines its freezing strategy based on the value of the new
vacuum_freeze_strategy_threshold GUC (or reloption) with logged tables.
Tables that exceed the size threshold use the eager freezing strategy.
Unlogged tables and temp tables always use eager freezing strategy,
since the added cost is negligible there. Non-permanent relations won't
incur any extra overhead in WAL written (for the obvious reason), nor in
pages dirtied (since any extra freezing will only take place on pages
whose PD_ALL_VISIBLE bit needed to be set either way).
VACUUM uses lazy freezing strategy for logged tables that fall under the
GUC size threshold. Page-level freezing triggers based on the criteria
established in commit 1de58df4, which added basic page-level freezing.
Eager freezing is strictly more aggressive than lazy freezing. Settings
like vacuum_freeze_min_age still get applied in just the same way in
every VACUUM, independent of the strategy in use. The only mechanical
difference between eager and lazy freezing strategies is that only the
former applies its own additional criteria to trigger freezing pages.
Note that even lazy freezing strategy will trigger freezing whenever a
page happens to have required that an FPI be written during pruning,
provided that the page will thereby become all-frozen in the visibility
map afterwards (due to the FPI optimization from commit 1de58df4).
The vacuum_freeze_strategy_threshold default setting is 4GB. This is a
relatively low setting that prioritizes performance stability. It will
be reviewed at the end of the Postgres 16 beta period.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com
Modify the new event loop code from commit 7389aad6 so that it checks
for work requested by signal handlers even if it doesn't see a latch
event yet.
This gives priority to shutdown and reload requests where the latch will
be reported later in the event array, or in a later call to
WaitEventSetWait(), due to scheduling details. In particular, this
guarantees that a SIGHUP-then-connect sequence (as seen in
authentication tests) causes the postmaster to process the reload before
accepting the connection. If the WaitEventSetWait() call saw the socket
as ready, and the reload signal was generated before the connection,
then the latest time the signal handler should be able to run is after
poll/epoll_wait/kevent returns but before we check the
pending_pm_reload_request flag.
While here, also shift the handling of child exit below reload requests,
per Tom Lane's observation that that might start new processes, so we
should make sure we pick up new settings first.
This probably explains the one-off failure of build farm animal
malleefowl.
Reported-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/OS0PR01MB57163D3BF2AB42ECAA94E5C394C29%40OS0PR01MB5716.jpnprd01.prod.outlook.com
When the length was too short, the server read outside the allocation.
That yielded the same log noise as sending the correct length with
(backendPID,cancelAuthCode) matching nothing. Change to a message about
the unexpected length. Given the attacker's lack of control over the
memory layout and the general lack of diversity in memory layouts at the
code in question, we doubt a would-be attacker could cause a segfault.
Hence, while the report arrived via security@postgresql.org, this is not
a vulnerability. Back-patch to v11 (all supported versions).
Andrey Borodin, reviewed by Tom Lane. Reported by Andrey Borodin.
This will ease a follow-up move that will generate automatically this
code. The C file is renamed, for consistency with the node-related
files whose code are generated by gen_node_support.pl:
- queryjumble.c -> queryjumblefuncs.c
- utils/queryjumble.h -> nodes/queryjumble.h
Per a suggestion from Peter Eisentraut.
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/Y5BHOUhX3zTH/ig6@paquier.xyz
This provides a way to reserve connection slots for non-superusers.
The slots reserved via the new GUC are available only to users who
have the new predefined role pg_use_reserved_connections.
superuser_reserved_connections remains as a final reserve in case
reserved_connections has been exhausted.
Patch by Nathan Bossart. Reviewed by Tushar Ahuja and by me.
Discussion: http://postgr.es/m/20230119194601.GA4105788@nathanxps13
This is in preparation for adding a new reserved_connections GUC,
but aligning the GUC name with the variable name is also a good
idea on general principle.
Patch by Nathan Bossart. Reviewed by Tushar Ahuja and by me.
Discussion: http://postgr.es/m/20230119194601.GA4105788@nathanxps13
Switch to a design similar to regular backends, instead of the previous
arrangement where signal handlers did non-trivial state management and
called fork(). The main changes are:
* The postmaster now has its own local latch to wait on. (For now, we
don't want other backends setting its latch directly, but that could
probably be made to work with more research on robustness.)
* The existing signal handlers are cut in two: a handle_pm_XXX() part
that just sets pending_pm_XXX flags and the latch, and a
process_pm_XXX() part that runs later when the latch is seen.
* Signal handlers are now installed with the regular pqsignal()
function rather than the special pqsignal_pm() function; historical
portability concerns about the effect of SA_RESTART on select() are no
longer relevant, and we don't need to block signals anymore.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
There are a number of places where a shell command is constructed with
percent-placeholders (like %x). It's cumbersome to have to open-code
this several times. This factors out this logic into a separate
function. This also allows us to ensure consistency for and document
some subtle behaviors, such as what to do with unrecognized
placeholders.
The unified handling is now that incorrect and unknown placeholders
are an error, where previously in most cases they were skipped or
ignored. This affects the following settings:
- archive_cleanup_command
- archive_command
- recovery_end_command
- restore_command
- ssl_passphrase_command
The following settings are part of this refactoring but already had
stricter error handling and should be unchanged in their behavior:
- basebackup_to_shell.command
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5238bbed-0b01-83a6-d4b2-7eb0562a054e%40enterprisedb.com
Currently, for large transactions, the publisher sends the data in
multiple streams (changes divided into chunks depending upon
logical_decoding_work_mem), and then on the subscriber-side, the apply
worker writes the changes into temporary files and once it receives the
commit, it reads from those files and applies the entire transaction. To
improve the performance of such transactions, we can instead allow them to
be applied via parallel workers.
In this approach, we assign a new parallel apply worker (if available) as
soon as the xact's first stream is received and the leader apply worker
will send changes to this new worker via shared memory. The parallel apply
worker will directly apply the change instead of writing it to temporary
files. However, if the leader apply worker times out while attempting to
send a message to the parallel apply worker, it will switch to
"partial serialize" mode - in this mode, the leader serializes all
remaining changes to a file and notifies the parallel apply workers to
read and apply them at the end of the transaction. We use a non-blocking
way to send the messages from the leader apply worker to the parallel
apply to avoid deadlocks. We keep this parallel apply assigned till the
transaction commit is received and also wait for the worker to finish at
commit. This preserves commit ordering and avoid writing to and reading
from files in most cases. We still need to spill if there is no worker
available.
This patch also extends the SUBSCRIPTION 'streaming' parameter so that the
user can control whether to apply the streaming transaction in a parallel
apply worker or spill the change to disk. The user can set the streaming
parameter to 'on/off', or 'parallel'. The parameter value 'parallel' means
the streaming will be applied via a parallel apply worker, if available.
The parameter value 'on' means the streaming transaction will be spilled
to disk. The default value is 'off' (same as current behaviour).
In addition, the patch extends the logical replication STREAM_ABORT
message so that abort_lsn and abort_time can also be sent which can be
used to update the replication origin in parallel apply worker when the
streaming transaction is aborted. Because this message extension is needed
to support parallel streaming, parallel streaming is not supported for
publications on servers < PG16.
Author: Hou Zhijie, Wang wei, Amit Kapila with design inputs from Sawada Masahiko
Reviewed-by: Sawada Masahiko, Peter Smith, Dilip Kumar, Shi yu, Kuroda Hayato, Shveta Mallik
Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
VACUUM normally ends by running vac_update_datfrozenxid(), which
requires a scan of pg_class. Therefore, if one attempts to vacuum a
database one table at a time --- as vacuumdb has done since v12 ---
we will spend O(N^2) time in vac_update_datfrozenxid(). That causes
serious performance problems in databases with tens of thousands of
tables, and indeed the effect is measurable with only a few hundred.
To add insult to injury, only one process can run
vac_update_datfrozenxid at the same time per DB, so this behavior
largely defeats vacuumdb's -j option.
Hence, invent options SKIP_DATABASE_STATS and ONLY_DATABASE_STATS
to allow applications to postpone vac_update_datfrozenxid() until the
end of a series of VACUUM requests, and teach vacuumdb to use them.
Per bug #17717 from Gunnar L. Sadly, this answer doesn't seem
like something we'd consider back-patching, so the performance
problem will remain in v12-v15.
Tom Lane and Nathan Bossart
Discussion: https://postgr.es/m/17717-6c50eb1c7d23a886@postgresql.org
The same code pattern is repeated 17 times for int64 counters (0 for
missing entry) and 5 times for timestamps (NULL for missing entry) on
table entries. This code is switched to use a macro for the basic code
instead, shaving a few hundred lines of originally-duplicated code. The
function names remain the same, but some fields of PgStat_StatTabEntry
have to be renamed to cope with the new style.
Author: Bertrand Drouvot
Reviewed-by: Nathan Bossart
Discussion: https:/postgr.es/m/20221204173207.GA2669116@nathanxps13
Logical WAL senders display now as follows, gaining a database name:
postgres: walsender USER DATABASE HOST(PORT) STATE
Physical WAL senders show up the same, as of:
postgres: walsender USER HOST(PORT) STATE
This information was missing, hence it was not possible to know from ps
if a WAL sender was a logical or a physical one, and on which database
it is connected when it is logical.
Author: Tatsuhiro Nakamori
Reviewed-by: Fujii Masao, Bharath Rupireddy
Discussion: https://postgr.es/m/36a3b137e82e0ea9fe7e4234f03b64a1@oss.nttdata.com
The postmaster normally sends SIGQUIT to force-terminate its
child processes after a child crash or immediate-stop request.
If that doesn't result in child exit within a few seconds,
we follow it up with SIGKILL. This patch provides GUC flags
that allow either of these signals to be replaced with SIGABRT.
On typically-configured Unix systems, that will result in a
core dump being produced for each such child. This can be
useful for debugging problems, although it's not something you'd
want to have on in production due to the risk of disk space
bloat from lots of core files.
The old postmaster -T switch, which sent SIGSTOP in place of
SIGQUIT, is changed to be the same as send_abort_for_crash.
As far as I can tell from the code comments, the intent of
that switch was just to block things for long enough to force
core dumps manually, which seems like an unnecessary extra step.
(Maybe at the time, there was no way to get most kernels to
produce core files with per-PID names, requiring manual core
file renaming after each one. But now it's surely the hard way.)
I also took the opportunity to remove the old postmaster -n
(skip shmem reinit) switch, which hasn't actually done anything
in decades, though the documentation still claimed it did.
Discussion: https://postgr.es/m/2251016.1668797294@sss.pgh.pa.us
Setting archive_library and archive_command at the same time is now an
error. Before, archive_library would take precedence over
archive_command.
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/20220914222736.GA3042279%40nathanxps13
This is similar to 7d25958, and this commit takes care of all the
remaining inconsistencies between the initial value used in the C
variable associated to a GUC and its default value stored in the GUC
tables (as of pg_settings.boot_val).
Some of the initial values of the GUCs updated rely on a compile-time
default. These are refactored so as the GUC table and its C declaration
use the same values. This makes everything consistent with other
places, backend_flush_after, bgwriter_flush_after, port,
checkpoint_flush_after doing so already, for example.
Extracted from a larger patch by Peter Smith. The spots updated in the
modules are from me.
Author: Peter Smith, Michael Paquier
Reviewed-by: Nathan Bossart, Tom Lane, Justin Pryzby
Discussion: https://postgr.es/m/CAHut+PtHE0XSfjjRQ6D4v7+dqzCw=d+1a64ujra4EX8aoc_Z+w@mail.gmail.com