A child table can specify a compression or storage method different
from its parents. This was previously an error. (But this was
inconsistently enforced because for example the settings could be
changed later using ALTER TABLE.) This now also allows an explicit
override if multiple parents have different compression or storage
settings, which was previously an error that could not be overridden.
The compression and storage properties remains unchanged in a child
inheriting from parent(s) after its creation, i.e., when using ALTER
TABLE ... INHERIT. (This is not changed.)
Before this change, the error detail would mention the first pair of
conflicting parent compression or storage methods. But with this
change it waits till the child specification is considered by which
time we may have encountered many such conflicting pairs. Hence the
error detail after this change does not include the conflicting
compression/storage methods. Those can be obtained from parent
definitions if necessary. The code to maintain list of all
conflicting methods or even the first conflicting pair does not seem
worth the convenience it offers. This change is inline with what we
do with conflicting default values.
Before this commit, the specified storage method could be stored in
ColumnDef::storage (CREATE TABLE ... LIKE) or ColumnDef::storage_name
(CREATE TABLE ...). This caused the MergeChildAttribute() and
MergeInheritedAttribute() to ignore a storage method specified in the
child definition since it looked only at ColumnDef::storage. This
commit removes ColumnDef::storage and instead uses
ColumnDef::storage_name to save any storage method specification. This
is similar to how compression method specification is handled.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7@eisentraut.org
Per buildfarm animal mylodon, the plan for this test was sometimes
swapping the join order for tenk1 and tenk2. Given that add_path() has
no code that would cause this fluctuation when given paths with consistent
costs, this indicates that the costs must be fluctuating in some runs.
The only proven reason I've seen where that could happen was slight
variations in pg_class.relpages for some tables. This was demonstrated to
be true by f03a9ca43 and related discussion. Manually adjusting tenk2's
pg_class.relpages by subtracting just 1 page does cause the plan to change
for this test.
Here we've not gone to the same lengths to prove that's what's going on
in this case. Proving that does not seem worth the time. Let's just
shrink one side of the join so the additional cost of the swapped join
order is sufficiently different that if the relpages estimate is off a few
pages that the planner still shouldn't swap the join order.
Reported-by: Thomas Munro
Author: Andy Fan, David Rowley
Discussion: https://postgr.es/m/CA+hUKGLqC-NobKYfjxNM3Gexv9OJ-Fhvy9bugUcXsZjTqH7W=Q@mail.gmail.com
Don't deal with transaction timeout in PostgresMain(). Instead, release
transaction timeout activated by StartTransaction() in
CommitTransaction()/AbortTransaction()/PrepareTransaction(). Deal with both
enabling and disabling transaction timeout in assign_transaction_timeout().
Also, remove potentially flaky timeouts-long isolation test, which has no
guarantees to pass on slow/busy machines.
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20240215230856.pc6k57tqxt7fhldm%40awork3.anarazel.de
This commit adds timeout that is expected to be used as a prevention
of long-running queries. Any session within the transaction will be
terminated after spanning longer than this timeout.
However, this timeout is not applied to prepared transactions.
Only transactions with user connections are affected.
Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
Author: Andrey Borodin <amborodin@acm.org>
Author: Japin Li <japinli@hotmail.com>
Author: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Nikolay Samokhvalov <samokhvalov@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: bt23nguyent <bt23nguyent@oss.nttdata.com>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
For ANY-SUBLINK, we adopted a two-stage pull-up approach to handle
different types of scenarios. In the first stage, the sublink is pulled up
as a subquery. Because of this, when writing this code, we did not have
the ability to perform lateral joins, and therefore, we were unable to
pull up Var with varlevelsup=1. Now that we have the ability to use
lateral joins, we can eliminate this limitation.
Author: Andy Fan <zhihui.fan1213@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Setting the environment variable PG_TEST_INITDB_EXTRA_OPTS passes
extra options to initdb run by pg_regress or
PostgreSQL::Test::Cluster's init.
This can be useful for a wide variety of uses, like running all tests
with checksums enabled, or with JIT enabled, or with different GUC
settings, or with different locale settings. (Not all tests are going
to pass with arbitrary options, but it is useful to run this against
specific test suites.)
Reviewed-by: Ian Lawrence Barwick <barwick@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/d4d2ad9f-1c1d-47a1-bb4d-c10a747d4f15%40eisentraut.org
pathkeys_useful_for_ordering() contained some needless checks to return
0 when either root->query_pathkeys or pathkeys lists were empty. This is
already handled by pathkeys_count_contained_in(), so let's have it do the
work instead of having redundant checks.
Similarly, in pathkeys_useful_for_grouping(), checking pathkeys is an
empty list just before looping over it isn't required. Technically,
neither is the list empty check for group_pathkeys, but I felt a bit
more work would have to be done to get the equivalent behavior if we'd
left it up to the foreach loop to call list_member_ptr().
This was noticed by Andy while he was reviewing a patch to improve the
UNION planner. Since that patch adds another function similar to
pathkeys_useful_for_ordering() and since I wasn't planning to copy these
redundant checks over to the new function, let's adjust the existing
code so that both functions will be consistent.
Author: Andy Fan
Discussion: https://postgr.es/m/87o7cti48f.fsf@163.com
This is extracted from a larger patch to improve the UNION planner.
While working on that, I found myself having to check what the 'rows'
parameter is for. It's not obvious that passing a negative number is the
way to have the rows estimate calculated and to find that out you need
to read code in create_append_path() and in cost_append().
Discussion: https://postgr.es/m/CAApHDvpb_63XQodmxKUF8vb9M7CxyUyT4sWvEgqeQU-GB7QFoQ@mail.gmail.com
Thanks to commit 3b00fdba9f, this check in the SIGTERM handler for
the startup process is now obsolete and can be removed. Instead of
leaving around the dead function write_stderr_signal_safe(), I've
opted to just remove it for now.
This partially reverts commit 97550c0711.
Reviewed-by: Andres Freund, Noah Misch
Discussion: https://postgr.es/m/20231121212008.GA3742740%40nathanxps13
Presently, we rely on each individual signal handler to save the
initial value of errno and then restore it before returning if
needed. This is easily forgotten and, if missed, often goes
undetected for a long time.
In commit 3b00fdba9f, we introduced a wrapper signal handler
function that checks whether MyProcPid matches getpid(). This
commit moves the aforementioned errno restoration code from the
individual signal handlers to the new wrapper handler so that we no
longer need to worry about missing it.
Reviewed-by: Andres Freund, Noah Misch
Discussion: https://postgr.es/m/20231121212008.GA3742740%40nathanxps13
In commit 97550c0711, we added a similar check to the SIGTERM
handler for the startup process. This commit adds this check to
backend signal handlers installed with pqsignal(). This is done by
using a wrapper function that performs the check before calling the
actual handler.
The hope is that this will offer more general protection against
child processes of Postgres backends inadvertently modifying shared
memory due to inherited signal handlers. Another potential
follow-up improvement is to use this wrapper handler function to
restore errno instead of relying on each individual handler
function to do so.
This commit makes the changes in commit 97550c0711 obsolete but
leaves reverting it for a follow-up commit.
Reviewed-by: Andres Freund, Noah Misch
Discussion: https://postgr.es/m/20231121212008.GA3742740%40nathanxps13
For most purposes, multiranges act like dependent objects of the
associated range type: you can't create them separately or drop them
separately. This is like the way that autogenerated array types
behave. However, a couple of points were overlooked: array types
automatically track the ownership of their base type, and array types
do not have their own permissions but use those of the base type,
while multiranges didn't emulate those behaviors. This is fairly
broken, mainly because pg_dump doesn't think it needs to worry about
multiranges as separate objects, and thus it fails to dump/restore
ownership or permissions of multiranges.
There's no apparent value in letting a multirange diverge from
its parent's ownership or permissions, so let's make them act like
arrays in these respects. However, we continue to let multiranges
be renamed or moved to a different schema independently of their
parent, since that doesn't break anything.
Discussion: https://postgr.es/m/1580383.1705343264@sss.pgh.pa.us
The failure is that the remote slot is not synchronized after the same
slot on standby gets invalidated. The reason was that remote_slot's
restart_lsn was lagged behind the standby's oldest WAL segment. The test
didn't ensure that remote_slot's LSN was advanced to the latest position
before we tried to sync the slots via new the function
pg_sync_replication_slots().
Discussion: https://postgr.es/m/CAA4eK1JLBi3HzenB6do3_hd78kN0UDD1mz-vumWE52XHHEq5Bw@mail.gmail.com
This commit introduces a new SQL function pg_sync_replication_slots()
which is used to synchronize the logical replication slots from the
primary server to the physical standby so that logical replication can be
resumed after a failover or planned switchover.
A new 'synced' flag is introduced in pg_replication_slots view, indicating
whether the slot has been synchronized from the primary server. On a
standby, synced slots cannot be dropped or consumed, and any attempt to
perform logical decoding on them will result in an error.
The logical replication slots on the primary can be synchronized to the
hot standby by using the 'failover' parameter of
pg-create-logical-replication-slot(), or by using the 'failover' option of
CREATE SUBSCRIPTION during slot creation, and then calling
pg_sync_replication_slots() on standby. For the synchronization to work,
it is mandatory to have a physical replication slot between the primary
and the standby aka 'primary_slot_name' should be configured on the
standby, and 'hot_standby_feedback' must be enabled on the standby. It is
also necessary to specify a valid 'dbname' in the 'primary_conninfo'.
If a logical slot is invalidated on the primary, then that slot on the
standby is also invalidated.
If a logical slot on the primary is valid but is invalidated on the
standby, then that slot is dropped but will be recreated on the standby in
the next pg_sync_replication_slots() call provided the slot still exists
on the primary server. It is okay to recreate such slots as long as these
are not consumable on standby (which is the case currently). This
situation may occur due to the following reasons:
- The 'max_slot_wal_keep_size' on the standby is insufficient to retain
WAL records from the restart_lsn of the slot.
- 'primary_slot_name' is temporarily reset to null and the physical slot
is removed.
The slot synchronization status on the standby can be monitored using the
'synced' column of pg_replication_slots view.
A functionality to automatically synchronize slots by a background worker
and allow logical walsenders to wait for the physical will be done in
subsequent commits.
Author: Hou Zhijie, Shveta Malik, Ajin Cherian based on an earlier version by Peter Eisentraut
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
This reverts commit 95fb5b4902, for reasons similar to what led to
1aa8324b81. In this case, the callback was called once per row, which
is less worse than the previous callback introduced for COPY TO called
once per argument for each row, still the patch set discussed to plug in
custom routines to the COPY paths would be able to know which subroutine
to use depending on its CopyFromState, so this led to a suboptimal
approach at the end.
For now, this part is reverted to consider better which approach to use.
Discussion: https://postgr.es/m/20240206014125.qofww7ew3dx3v3uk@awork3.anarazel.de
The comment was inexact because query_id_enabled will not be switched to
"true" even if compute_query_id is "on", unless a module requests for
it.
While on it, this adds a comment to mention that IsQueryIdEnabled()
should be used to check if query ID computation is enabled or not.
Author: Yugo Nagata
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/20240209153823.e29a68cadb14225f1362a2cf@sraoss.co.jp
Previously, an interval microseconds field close to INT64_MAX or
INT64_MIN could overflow, producing a result with not even the
correct sign, while being rounded to match a precision specification.
This seems worth fixing, but not worth back-patching, in part
because the ereturn() notation doesn't exist very far back.
Report and patch by Joseph Koshakow (some cosmetic mods by me)
Discussion: https://postgr.es/m/CAAvxfHfpuLgqJYzkUcher466Z1LpmE+5Sm+zc8L6zKCOQ+6TDQ@mail.gmail.com
For a long time, our outfuncs.c code has supposed that the string
contents of a BitString node could just be printed literally with
no concern for quoting/escaping. Now, that's okay if the string
literal contains only valid binary or hex digits ... but our lexer
doesn't check that, preferring to let bitin() be the sole authority
on what's valid. So we could have raw parse trees that contain
incorrect BitString literals, and that can result in failures when
WRITE_READ_PARSE_PLAN_TREES debugging is enabled.
Fix by using outToken() to print the string field, and debackslash()
to read it. This results in a change in the emitted representation
only in cases that would have failed before, and don't represent valid
SQL in the first place. Between that and the fact that we don't store
raw parse trees in the catalogs, I judge this safe to apply without a
catversion bump.
Per bug #18340 from Alexander Lakhin. Back-patch to v16; before that,
we lacked readfuncs support for BitString nodes, so that the problem
was only cosmetic.
Discussion: https://postgr.es/m/18340-4aa1ae6ed4121912@postgresql.org
The macOS Finder application creates .DS_Store files in directories
when opened, which creates problems for serverside utilities which
expect all files to be PostgreSQL specific files. Skip these files
when encountered in pg_checksums, pg_rewind and pg_basebackup.
This was extracted from a larger patchset for skipping hidden files
and system files, where the concencus was to just skip these. Since
this is equally likely to happen in every version, backpatch to all
supported versions.
Reported-by: Mark Guertin <markguertin@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Tobias Bussmann <t.bussmann@gmx.net>
Discussion: https://postgr.es/m/E258CE50-AB0E-455D-8AAD-BB4FE8F882FB@gmail.com
Backpatch-through: v12
If available, read directly from WAL buffers, avoiding the need to go
through the filesystem. Only for physical replication for now, but can
be expanded to other callers.
In preparation for replicating unflushed WAL data.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACXKKK%3DwbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54%2BNa%3DQ%40mail.gmail.com
Reviewed-by: Andres Freund, Alvaro Herrera, Nathan Bossart, Dilip Kumar, Nitin Jadhav, Melih Mutlu, Kyotaro Horiguchi
Doing this instead of regular updates serves two purposes. First, that avoids
possible waiting on the row-level lock. Second, that avoids dealing with
TOAST.
It's known that changes made by heap_inplace_update() may be lost due to
concurrent normal updates. However, we are OK with that. The subsequent
connections will still have a chance to set "dathasloginevt" to false.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/e2a0248e-5f32-af0c-9832-a90d303c2c61%40gmail.com
Commit 5579388d removed code that supplied a fallback implementation of
getaddrinfo(), which was dead code on modern systems. One tiny piece of
the removed code was still doing something useful on Windows, though:
that OS's own gai_strerror()/gai_strerrorA() function returns a pointer
to a static buffer that it overwrites each time, so it's not
thread-safe. In rare circumstances, a multi-threaded client program
could get an incorrect or corrupted error message.
Restore the replacement gai_strerror() function, though now that it's
only for Windows we can put it into a win32-specific file and cut it
down to the errors that Windows documents. The error messages here are
taken from FreeBSD, because Windows' own messages seemed too verbose.
Back-patch to 16.
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKz%2BF9d2PTiXwfYV7qJw%2BWg2jzACgSDgPizUw7UG%3Di58A%40mail.gmail.com
This makes use of StringInfo to assemble command lines, instead of
using fixed-size buffers and the (remote) possibility of "command too
long" errors. Also makes the code a bit simpler.
This covers the test driver programs pg_regress and
pg_isolation_regress.
Similar to the changes done for pg_rewind in a33e17f210.
Discussion: https://www.postgresql.org/message-id/2be4fee5-738f-4749-b9f8-b452032c7ade%40eisentraut.org
Timezones are not immutable and so neither is any function that relies on
them. In commit 66ea94e8, we introduced a few methods which do casting
from one time to another and thus may involve the current timezone. To
preserve the immutability of jsonpath functions currently marked
immutable, disallow these methods from being called from non-TZ aware
functions.
Jeevan Chalke, per a report from Jian He.
Since its introduction, pg_get_expr() has intended to silently
return NULL if called with an invalid relation OID, as can happen
when scanning the catalogs concurrently with relation drops.
However, there is a race condition: we check validity of the OID
at the start, but it could get dropped just afterward, leading to
failures. This is the cause of some intermittent instability we're
seeing in a proposed new test case, and presumably it's a hazard in
the field as well.
We can fix this by AccessShareLock-ing the target relation for the
duration of pg_get_expr(). Since we don't require any permissions
on the target relation, this is semantically a bit undesirable. But
it turns out that the set_relation_column_names() subroutine already
takes a transient AccessShareLock on that relation, and has done since
commit 2ffa740be in 2012. Given the lack of complaints about that, it
seems like there should be no harm in holding the lock a bit longer.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/31ddcc01-a71b-4e8c-9948-01d1c47293ca@eisentraut.org
We previously supposed that it was okay for different threads to
call bindtextdomain() concurrently (cf. commit 1f655fdc3).
It now emerges that there's at least one gettext implementation
in which that triggers an abort() crash, so let's stop doing that.
Add mutexes guarding libpq's and ecpglib's calls, which are the
only ones that need worry about multithreaded callers.
Note: in libpq, we could perhaps have piggybacked on
default_threadlock() to avoid defining a new mutex variable.
I judge that not terribly safe though, since libpq_gettext could
be called from code that is holding the default mutex. If that
were the first such call in the process, it'd fail. An extra
mutex is cheap insurance against unforeseen interactions.
Per bug #18312 from Christian Maurer. Back-patch to all
supported versions.
Discussion: https://postgr.es/m/18312-bbbabc8113592b78@postgresql.org
Discussion: https://postgr.es/m/264860.1707163416@sss.pgh.pa.us
Fix pthread-win32.h and pthread-win32.c to provide a more complete
emulation of POSIX pthread mutexes: define PTHREAD_MUTEX_INITIALIZER
and make sure that pthread_mutex_lock() can operate on a mutex
object that's been initialized that way. Then we don't need the
duplicative platform-specific logic in default_threadlock() and
pgtls_init(), which we'd otherwise need yet a third copy of for
an upcoming bug fix.
Also, since default_threadlock() supposes that pthread_mutex_lock()
cannot fail, try to ensure that that's actually true, by getting
rid of the malloc call that was formerly involved in initializing
an emulated mutex. We can define an extra state for the spinlock
field instead.
Also, replace the similar code in ecpglib/misc.c with this version.
While ecpglib's version at least had a POSIX-compliant API, it
also had the potential of failing during mutex init (but here,
because of CreateMutex failure rather than malloc failure). Since
all of misc.c's callers ignore failures, it seems like a wise idea
to avoid failures here too.
A further improvement in this area could be to unify libpq's and
ecpglib's implementations into a src/port/pthread-win32.c file.
But that doesn't seem like a bug fix, so I'll desist for now.
In preparation for the aforementioned bug fix, back-patch to all
supported branches.
Discussion: https://postgr.es/m/264860.1707163416@sss.pgh.pa.us
Commit 5b2f4afffe refactored find_other_exec() and in the process
created pipe_read_line() into a static routine for reading a single
line of output, aimed at reading version numbers. Commit a7e8ece41
later exposed it externally in order to read a postgresql.conf GUC
using "postgres -C ..". Further, f06b1c598 also made use of it for
reading a version string much like find_other_exec(). The internal
variable remained "pgver", even when used for other purposes.
Since the function requires passing a buffer and its size, and at
most size - 1 bytes will be read via fgets(), there is a truncation
risk when using this for reading GUCs (like how pg_rewind does,
though the risk in this case is marginal).
To keep this as generic functionality for reading a line from a pipe,
this refactors pipe_read_line() into returning an allocated buffer
containing all of the line to remove the risk of silent truncation.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/DEDF73CE-D528-49A3-9089-B3592FD671A9@yesql.se
group_keys_reorder_by_pathkeys() function searched for matching pathkeys
within root->group_pathkeys. That could lead to picking an aggregate pathkey
and using its pathkey->pk_eclass->ec_sortref as an argument of
get_sortgroupref_clause_noerr(). Given that ec_sortref of an aggregate pathkey
references aggregate targetlist not query targetlist, this leads to incorrect
query optimization.
Fix this by looking for matching pathkeys only within the first
num_groupby_pathkeys pathkeys.
Reported-by: David G. Johnston
Discussion: https://postgr.es/m/CAKFQuwY3Ek%3DcLThgd8FdaSc5JRDVt0FaV00gMcWra%2BTAR4gGUw%40mail.gmail.com
Author: Andrei Lepikhov, Alexander Korotkov
Fix for 344d62fb9a9: That commit introduced unlogged sequences and
made it so that identity/serial sequences automatically get the
persistence level of their owning table. But this works only for
CREATE TABLE and not for ALTER TABLE / ADD COLUMN. The latter would
always create the sequence as logged (default), independent of the
persistence setting of the table. This is fixed here.
Note: It is allowed to change the persistence of identity sequences
directly using ALTER SEQUENCE. So mistakes in existing databases can
be fixed manually.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/c4b6e2ed-bcdf-4ea7-965f-e49761094827%40eisentraut.org
This commit fixes an oversight introduced in c61a2f5841, where COPY TO
would attempt to do encoding conversions even if the encodings of the
client and the server matched for multi-byte encodings. All conversions
go through pg_any_to_server() that makes the conversion a no-op when the
encodings of the client and the server match, even for multi-byte
encodings.
The logic was fine, but setting CopyToStateData->need_transcoding would
cause strlen() to be called for nothing for each attribute of all the
rows copied, and that was showing high in some profiles (more attributes
make that easier to reach). This change improves the runtime of some
worst-case COPY TO queries by 15%~ (number present at least here).
This is a performance improvement, so no backpatch is done out of
caution as this is not a regression.
Reported-by: Andres Freund
Analyzed-by: Andres Freund
Author: Michael Paquier
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/20240206020504.edijzczkgd25ek6z@awork3.anarazel.de
The TransactionIdInRecentPast() should return false for all the transactions
older than TransamVariables->oldestClogXid. However, the function contains
a bug in comparison FullTransactionId to TransactionID allowing full
transactions between nextXid - 2^32 and oldestClogXid - 2^31.
This commit fixes TransactionIdInRecentPast() by turning the oldestClogXid into
FullTransactionId first, then performing the comparison.
Backpatch to all supported versions.
Reported-by: Egor Chindyaskin
Bug: 18212
Discussion: https://postgr.es/m/18212-547307f8adf57262%40postgresql.org
Author: Karina Litskevich
Reviewed-by: Kyotaro Horiguchi
Backpatch-through: 12
Various int variables were compared to macros that are of type size_t,
which caused -Wsign-compare warnings in cpluspluscheck. Change those
to size_t, which also better describes their purpose.
Per report from Peter Eisentraut
Discussion: https://postgr.es/m/486847dc-6de5-464a-938e-bac98ec2438b%40eisentraut.org
Commit a4fd3aa719 moved setup_cancel_handler out of psql and
exporeted it as a global function. While pg_dump isn't using
the header it's exported in, having a conflicting name still
risks causing confusion when grepping the code for callsites,
so rename the static function in pg_dump to avoid this.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/20240126094245.cf6718cc659273765f3ab69a@sraoss.co.jp
A comment in grouping_planner() claimed that the PlannerInfo
upper_targets array was not used in core code. However, the code that
generated the paths for the UPPERREL_PARTIAL_DISTINCT rel made that
comment untrue.
Here we adjust the create_distinct_paths() function signature to pass
down the PathTarget the same as is done for create_grouping_paths(),
thus making the aforementioned comment true again.
In passing adjust the order of the upper_targets[] assignments. These
seem to be following the reverse enum order apart from
UPPERREL_PARTIAL_DISTINCT.
Also, update the header comment for generate_gather_paths() to mention
the function is also used to create gather paths for partial distinct
paths.
Author: Richard Guo, David Rowley
Discussion: https://postgr.es/m/CAMbWs48u9VoVOouJsys1qOaC9WVGVmBa+wT1dx8KvxF5GPzezA@mail.gmail.com
Commit 861f86beea used REGBUF_NO_CHANGE at one of the places in the hash
index to register the clean buffers but forgot to avoid setting LSN in
that case.
Reported-by: Michael Paquier
Author: Kuroda Hayato
Reviewed-by: Amit Kapila, Michael Paquier
Discussion: https://postgr.es/m/ZbyVVG_7eW3YD5-A@paquier.xyz
Following are a few clean-ups related to failover option support in slots:
1. Improve the documentation in create_subscription.sgml.
2. Remove the spurious blank line in subscriptioncmds.c.
3. Remove the NOTICE for alter_replication_slot in subscriptioncmds.c as
we would sometimes print it even when nothing has changed. One can find
the change by enabling log_replication_commands on the publisher.
4. Optimize ReplicationSlotAlter() function to prevent disk flushing when
the slot's data remains unchanged.
Author: Hou Zhijie
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
Discussion: https://postgr.es/m/OS0PR01MB57164904651FB588A518E98894472@OS0PR01MB5716.jpnprd01.prod.outlook.com
This reverts commit 2889fd23be, following a discussion with Andres
Freund as this callback, being called once per attribute when sending a
relation's row, can involve a lot of indirect function calls (more
attributes to deal with means more impact). The effects of a dispatch
at this level would become more visible when improving the per-row code
execution of COPY TO, impacting future potential performance
improvements.
Discussion: https://postgr.es/m/20240206014125.qofww7ew3dx3v3uk@awork3.anarazel.de
The new concurrency model proposed for slru.c to improve performance
does not include any single lock that would coordinate processes
doing concurrent reads/writes on SlruShared->latest_page_number.
We can instead use atomic reads and writes for that variable.
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com
The standalone functions fasthash{32,64} use length for two purposes:
how many bytes to hash, and how to perturb the internal seed.
Developers using the incremental interface may not know the length
ahead of time (e.g. for C strings). In this case, it's advised to
pass length to the finalizer, but initialization still needed some
length up front, in the form of a placeholder macro.
Separate the concerns by having the standalone functions perturb the
internal seed themselves from their own length parameter, allowing
to remove "len" from fasthash_init(), as well as the placeholder macro.
Discussion: https://postgr.es/m/CANWCAZbTUk2LOyhsFo33gjLyLAHZ7ucXCi5K9u%3D%2BPtnTShDKtw%40mail.gmail.com
When assertions are disabled, the built SQL statement is invalid and
you get a "syntax error". So this isn't a serious problem, but let's
avoid the assertion failure.
Backpatch to all supported versions.
Reviewed-by: Noah Misch
The internal commands in REFRESH MATERIALIZED VIEW CONCURRENTLY are
correctly executed in SECURITY_RESTRICTED_OPERATION mode, except for
creating the temporary "diff" table, because you cannot create
temporary tables in SRO mode. But creating the temporary "diff" table
is a pretty complex CTAS command that selects from another temporary
table created earlier in the command. If you can cajole that CTAS
command to execute code defined by the table owner, the table owner
can run code with the privileges of the user running the REFRESH
command.
The proof-of-concept reported to the security team relied on CREATE
RULE to convert the internally-built temp table to a view. That's not
possible since commit b23cd185fd, and I was not able to find a
different way to turn the SELECT on the temp table into code
execution, so as far as I know this is only exploitable in v15 and
below. That's a fiddly assumption though, so apply this patch to
master and all stable versions.
Thanks to Pedro Gallegos for the report.
Security: CVE-2023-5869
Reviewed-by: Noah Misch
This patch provides support for regular (non-replication) connections in
libpqrcv_connect(). This can be used to execute SQL statements on the
primary server without starting a walsender.
A new API libpqrcv_get_dbname_from_conninfo() is also added to extract the
database name from the given connection-info.
Note that this patch doesn't change any existing functionality but later
patches implementing the slot synchronization will use this functionality
to connect to the primary server to fetch required slot information.
Author: Shveta Malik, Hou Zhijie, Ajin Cherian
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
These routines are used by the text and CSV formats to send an output
representation of a string, applying quotes if required. This is
similar to 95fb5b4902, reducing the number of "if" branches that need
to be checked on a per-row basis when sending representation of fields
in text or CSV mode.
While on it, this simplifies the signature of CopyAttributeOutCSV() as
it is possible to know that an attribute is alone on a line thanks to
CopyToState. Headers should not use quotes, even if forced at query
level.
Extracted from a larger patch by the same author.
Author: Sutou Kouhei
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
CopyReadAttributes{CSV,Text}() are used to parse lines for text and CSV
format. This reduces the number of "if" branches that need to be
checked when parsing fields in CSV and text mode when dealing with a
COPY FROM, something that can become more noticeable with more
attributes and more lines to process.
Extracted from a larger patch by the same author.
Author: Sutou Kouhei
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
MINIMUM_VERSION_FOR_WAL_SUMMARIES is used to check if the directory
pg_wal/summaries/ should be created in a base backup, but its condition
was reversed: pg_wal/summaries/ would be created when taking base
backups from backends of v16 and older versions, but it should be
created in base backups taken from backends of v17 and newer versions.
Author: Artur Zakirov
Reviewed-by: Yugo Nagata, Nazir Bilal Yavuz
Discussion: https://postgr.es/m/CAKNkYnzkkQ0gb_ZmLTY0r2-qV1q6imXgcCWxdA6UoA6yJkujGg@mail.gmail.com
A ResourceOwnerEnlarge() call was missing. That led to an error:
ERROR: ResourceOwnerRemember called but array was full
and an assertion failure, if you tried to extend a temp relation again
after a failure. Alexander's test case used running out of disk space
to trigger the original failure.
This bug was introduced in the large ResourceOwner rewrite commit
b8bff07daa. Before that, the UnpinLocalBuffer() call guaranteed that
the subsequent PinLocalBuffer() will succeed, but after the rewrite,
releasing an old resource doesn't guarantee that there is space for a
new one.
Add a comment explaining why the UnpinBuffer + PinBuffer calls in
BufferAlloc(), with no ResourceOwnerEnlarge() in between, are safe.
Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/dc574fea-c83e-a600-08cd-10881762e4fa@gmail.com
Here we adjust the partial path generation for parallel DISTINCT queries
to add Sort nodes on top of any unsorted partial distinct paths.
This increases the likelihood of the planner pushing a Sort below a Gather
Merge which enables the final phase of the parallel distinct to be
implemented using a Unique node in more cases.
Sorting the partial distinct paths is particularly useful when the
DISTINCT query has an ORDER BY and LIMIT clause as this can allow cheaper
plans by having the workers Hash Aggregate then Sort before feeding the
results into the Gather Merge. The non-parallel portion of the plan then
becomes very cheap as it leaves only Unique and Limit to do in the leader
process.
Author: Richard Guo
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/CAMbWs48u9VoVOouJsys1qOaC9WVGVmBa+wT1dx8KvxF5GPzezA@mail.gmail.com
An OS crash could leave PG_VERSION empty or missing. The same symptom
appeared in a backup by block device snapshot, taken after the next
checkpoint and before the OS flushes the PG_VERSION blocks. Device
snapshots are not a documented backup method, however. Back-patch to
v15, where commit 9c08aea6a3 introduced
STRATEGY=WAL_LOG and made it the default.
Discussion: https://postgr.es/m/20240130195003.0a.nmisch@google.com
Restoring a base backup taken in the middle of CreateDirAndVersionFile()
or write_relmap_file() would lose the function's effects. The symptom
was absence of the database directory, PG_VERSION file, or
pg_filenode.map. If missing the directory, recovery would fail. Either
missing file would not fail recovery but would render the new database
unusable. Fix CreateDirAndVersionFile() with the transam/README "action
first and then write a WAL entry" strategy. That has a side benefit of
moving filesystem mutations out of a critical section, reducing the ways
to PANIC. Fix the write_relmap_file() call with a lock acquisition, so
it interacts with checkpoints like non-CREATE DATABASE calls do.
Back-patch to v15, where commit 9c08aea6a3
introduced STRATEGY=WAL_LOG and made it the default.
Discussion: https://postgr.es/m/20240130195003.0a.nmisch@google.com
DST law changes in Ittoqqortoormiit, Greenland (America/Scoresbysund),
Kazakhstan (Asia/Almaty and Asia/Qostanay) and Palestine; as well as
updates for the Antarctic stations Casey and Vostok.
Historical corrections for Vietnam, Toronto, and Miquelon.
Further fallout from commit 6ee26c6a4b. To keep code in sync and avoid
issues on older releases with different package names, simply use the
unqualified name like many other places in our code.
The latest buildfarm failures show that after the insert, we don't
actually wait long enough for WAL summarization to catch up, apparently
because the on disk state gets updated before the in-memory state, and
so by checking the on disk state to see whether we're caught up and then
the in-memory state to see where exactly how far we've progressed, we
can, if unlucky, derive an older value of summarized_lsn, messing up
the rest of the test.
Attempt to fix this by using pg_available_wal_summaries() everywhere in
the test and pg_get_wal_summarizer_state() nowhere.
Per buildfarm.
This impacts the statistics retrieved in transactions for the following
views when updating the value of stats_fetch_consistency, leading to
behaviors contrary to what is documented since 605994651b as an update
of this parameter should discard all statistics snapshot data:
- pg_stat_archiver
- pg_stat_bgwriter
- pg_stat_checkpointer
- pg_stat_io
- pg_stat_slru
- pg_stat_wal
For example, updating stats_fetch_consistency from "snapshot" to "cache"
in a transaction did not re-fetch any fresh data, using data cached from
the time when "snapshot" was in use.
Author: Shinya Kato
Discussion: https://postgr.es/m/d77fc5190d4dbe1738d77231488e768b@oss.nttdata.com
Backpatch-through: 15
When building libpq there is a check to ensure that we're not
linking against code that calls exit(). This check is using a
heuristic grep with exclusions for known false positives. The
Threadsanitizer library instrumentation for function exits is
named such that it triggers the check, so add an exclusion.
This fix is only applied to the Makefile since the meson build
files don't yet have this check. Adding the check to meson is
outside the scope of this patch though.
Reported-by: Roman Lozko <lozko.roma@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAEhC_BmNGKgj2wKArH2EAU11BsaHYgLnrRFJGRm5Vs8WJzyiQA@mail.gmail.com
When building a MergeAppendPath which has child paths that are not
sorted correctly for the MergeAppend's sort order, we apply the cost of
sorting those paths to the MergeAppendPath costs.
Here we fix a bug where the number of tuples specified that needed to be
sorted was effectively pg_class.reltuples rather than the number of
expected row in the subpath. This effectively penalizes MergeAppend
plans any time any filter is present on the MergeAppend subpath as the
sort cost added is to sort all tuples in the table rather than just the
ones expected the path to return.
This did not affect UNION ALL type queries as the RelOptInfo tuples is
set from the subquery's path rows. It does affect MergeAppends uses for
inheritance and partitioned tables.
This is a long-standing bug introduced when MergeAppend was first added
in 11cad29c9. No backpatch as this could result in plan changes.
Author: Alexander Kuzmenkov
Reviewed-by: Ashutosh Bapat, Aleksander Alekseev, David Rowley
Discussion: https://postgr.es/m/CALzhyqyhoXQDR-Usd_0HeWk%3DuqNLzoVeT8KhRoo%3DpV_KzgO3QQ%40mail.gmail.com
Make the new tuple larger than the old one so that it, hopefully, won't
manage to squeeze into leftover freespace on the same page. The test
is trying to verify that the UPDATE touches 2 pages, but if a HOT
update happens, then it doesn't.
Per buildfarm.
Analysis of buildfarm results showed that the code that was intended
to wait for the inserts performed by this test to complete did not
actually do so. Try to make that logic more robust.
Improve error checking elsewhere in the script, too, so that we
don't miss things like poll_query_until failing.
Along the way, fix a bit of pgindent damage introduced by commit
5ddf997347, which aimed to help us
debug the failures that this commit is trying to fix. It's making
the buildfarm sad.
Discussion: http://postgr.es/m/CA+TgmobWFb8NqyfC31YnKAbZiXf9tLuwmyuvx=iYMXMniPQ4nw@mail.gmail.com
After calling smgropen(), it was not clear how long you could continue
to use the result, because various code paths including cache
invalidation could call smgrclose(), which freed the memory.
Guarantee that the object won't be destroyed until the end of the
current transaction, or in recovery, the commit/abort record that
destroys the underlying storage.
smgrclose() is now just an alias for smgrrelease(). It closes files
and forgets all state except the rlocator, but keeps the SMgrRelation
object valid.
A new smgrdestroy() function is used by rare places that know there
should be no other references to the SMgrRelation.
The short version:
* smgrclose() is now just an alias for smgrrelease(). It releases
resources, but doesn't destroy until EOX
* smgrdestroy() now frees memory, and should rarely be used.
Existing code should be unaffected, but it is now possible for code that
has an SMgrRelation object to use it repeatedly during a transaction as
long as the storage hasn't been physically dropped. Such code would
normally hold a lock on the relation.
This also replaces the "ownership" mechanism of SMgrRelations with a
pin counter. An SMgrRelation can now be "pinned", which prevents it
from being destroyed at end of transaction. There can be multiple pins
on the same SMgrRelation. In practice, the pin mechanism is only used
by the relcache, so there cannot be more than one pin on the same
SMgrRelation. Except with swap_relation_files XXX
Author: Thomas Munro, Heikki Linnakangas
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJ8NTvqLHz6dqbQnt2c8XCki4r2QvXjBQcXpVwxTY_pvA@mail.gmail.com
Before the advent of PROCSIGNAL_BARRIER_SMGRRELEASE, we didn't have a
comprehensive way to deal with Windows file handles that get in the way
of unlinking directories. We had smgrcloseall() calls in a few places
to try to mitigate.
It's still a good idea for bgwriter and checkpointer to do that once per
checkpoint so they don't accumulate unbounded SMgrRelation objects, but
there is no longer any reason to close them at other random places such
as the error path, and the explanation as given in the comments is now
obsolete.
Author: Thomas Munro
Reviewed-by: Heikki Linnakangas, Robert Haas
Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJ8NTvqLHz6dqbQnt2c8XCki4r2QvXjBQcXpVwxTY_pvA@mail.gmail.com
Similar to what was done in 5543677ec for non-parallel DISTINCT, apply
the same optimization when the distinct_pathkeys are empty for the
partial paths too.
This can be faster than the non-parallel version when the first row
matching the WHERE clause of the query takes a while to find. Parallel
workers could speed that process up considerably.
Author: Richard Guo
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/CAMbWs49JC0qvfUbzs-TVzgMpSSBiMJ_6sN=BaA9iohBgYkr=LA@mail.gmail.com
This commit addresses a set of issues when changing token type mappings
in a text search configuration when using duplicated token names:
- ADD MAPPING would fail on insertion because of a constraint failure
after inserting the same mapping.
- ALTER MAPPING with an "overridden" configuration failed with "tuple
already updated by self" when the token mappings are removed.
- DROP MAPPING failed with "tuple already updated by self", like
previously, but in a different code path.
The code is refactored so the token names (with their numbers) are
handled as a List with unique members rather than an array with numbers,
ensuring that no duplicates mess up with the catalog inserts, updates
and deletes. The list is generated by getTokenTypes(), with the same
error handling as previously while duplicated tokens are discarded from
the list used to work on the catalogs.
Regression tests are expanded to cover much more ground for the cases
fixed by this commit, as there was no coverage for the code touched in
this commit. A bit more is done regarding the fact that a token name
not supported by a configuration's parser should result in an error even
if IF EXISTS is used in a DROP MAPPING clause. This is implied in the
code but there was no coverage for that, and it was very easy to miss.
These issues exist since at least their introduction in core with
140d4ebcb4, so backpatch all the way down.
Reported-by: Alexander Lakhin
Author: Tender Wang, Michael Paquier
Discussion: https://postgr.es/m/18310-1eb233c5908189c8@postgresql.org
Backpatch-through: 12
File::Find converts backslashes to slashes in the newer Perl versions.
See: 414f14df98
So, do the same conversion for Windows before comparing paths. To
support all Perl versions, always convert them on Windows regardless of
the Perl's version.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Backpatch to all live branches
Here we consolidate the generation of partial sort and partial incremental
sort paths in a similar way to what was done in 4a29eabd1. Since the cost
penalty for incremental sort was removed by that commit, there's no
point in creating a sort path on the cheapest partial path if an
incremental sort could be done instead.
This has the added benefit of reducing the amount of code required to
build these paths.
Author: Richard Guo
Reviewed-by: Etsuro Fujita, Shubham Khanna, David Rowley
Discussion: https://postgr.es/m/CAMbWs49PaKxBZU9cN7k3DKB7id+YfGfOfS9H_Fo5tkqPMt=fDg@mail.gmail.com
predicate.c has been using SerialSLRULock (the control lock for its SLRU
structure) to coordinate access to SerialControlData, another of its
numerous shared memory structures; this is unnecessary and confuses
further SLRU scalability work. Create a separate LWLock to cover
SerialControlData.
Extracted from a larger patch from the same author, and some additional
changes by Álvaro.
Author: Dilip Kumar <dilip.kumar@enterprisedb.com>
Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com
This commit introduces a new subscription option named 'failover', which
provides users with the ability to set the failover property of the
replication slot on the publisher when creating or altering a
subscription.
This uses the replication commands introduced by commit 7329240437 to
enable the failover option for a logical replication slot.
If the failover option is set to true, the associated replication slots
(i.e. the main slot and the table sync slots) in the upstream database are
enabled to be synchronized to the standbys. Note that the capability to
sync the replication slots will be added in subsequent commits.
Thanks to Masahiko Sawada for the design inputs.
Author: Shveta Malik, Hou Zhijie, Ajin Cherian
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com