Commit Graph

22843 Commits

Author SHA1 Message Date
Amit Kapila 208c5d65bb Add ALTER SUBSCRIPTION ... SKIP.
This feature allows skipping the transaction on subscriber nodes.

If an incoming change violates any constraint, logical replication stops
until it's resolved. Currently, users must either resolve the conflict
manually by updating the subscriber-side database or use the function
pg_replication_origin_advance() to skip the conflicting transaction. This
commit introduces a simpler way to skip conflicting transactions.

The user can specify an LSN with ALTER SUBSCRIPTION ... SKIP (lsn = XXX),
which allows the apply worker to skip the transaction that finished at
the specified LSN. The apply worker skips all data modification changes
within that transaction.
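
As a rough illustration of the skip rule described above, here is a toy model in Python. It is not PostgreSQL code: RemoteTransaction, apply_stream and the skip_lsn parameter are hypothetical names invented for this sketch, and it does not model how or when the real apply worker resets the setting.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RemoteTransaction:
    finish_lsn: int      # LSN at which the transaction finished on the publisher
    changes: List[str]   # data modification changes carried by the transaction


def apply_stream(transactions, skip_lsn: Optional[int] = None):
    """Return the changes a toy apply worker would apply.

    skip_lsn is a hypothetical stand-in for the value given in SKIP (lsn = ...).
    """
    applied = []
    for txn in transactions:
        if skip_lsn is not None and txn.finish_lsn == skip_lsn:
            # Skip every data modification change in the matching transaction.
            continue
        applied.extend(txn.changes)
    return applied
```

Only the transaction whose finish LSN equals the requested value is dropped; everything else is applied unchanged.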

Author: Masahiko Sawada
Reviewed-by: Takamichi Osumi, Hou Zhijie, Peter Eisentraut, Amit Kapila, Shi Yu, Vignesh C, Greg Nancarrow, Haiying Tang, Euler Taveira
Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
2022-03-22 07:11:19 +05:30
Andres Freund 315ae75e9b pgstat: reorder pgstat.[ch] contents.
Now that 13619598f1 has split pgstat up into multiple files it isn't quite as
hard to come up with a sensible order for pgstat.[ch]. Inconsistent naming
makes it still not quite right looking, but that's work for another commit.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
2022-03-21 16:21:00 -07:00
Tom Lane 2591ee8ec4 Fix assorted missing logic for GroupingFunc nodes.
The planner needs to treat GroupingFunc like Aggref for many purposes,
in particular with respect to processing of the argument expressions,
which are not to be evaluated at runtime.  A few places hadn't gotten
that memo, notably including subselect.c's processing of outer-level
aggregates.  This resulted in assertion failures or wrong plans for
cases in which a GROUPING() construct references an outer aggregation
level.

Also fix missing special cases for GroupingFunc in cost_qual_eval
(resulting in wrong cost estimates for GROUPING(), although it's
not clear that that would affect plan shapes in practice) and in
ruleutils.c (resulting in excess parentheses in pretty-print mode).

Per bug #17088 from Yaoguang Chen.  Back-patch to all supported
branches.

Richard Guo, Tom Lane

Discussion: https://postgr.es/m/17088-e33882b387de7f5c@postgresql.org
2022-03-21 17:44:29 -04:00
Andres Freund 13619598f1 pgstat: split different types of stats into separate files.
pgstat.c is very long, and it's hard to find an order that makes sense and is
likely to be maintained over time. Splitting the different pieces into
separate files makes that a lot easier.

With a few exceptions, this commit just moves code around. Those exceptions
are:
- adding file headers for new files
- removing 'static' from functions
- adapting pgstat_assert_is_up() to work across TUs
- minor comment adjustments

git diff --color-moved=dimmed-zebra is very helpful for separating code movement
from code changes.

The next commit in this series will reorder pgstat.[ch] contents to be a bit
more coherent.

Earlier revisions of this patch had "global" statistics (archiver, bgwriter,
checkpointer, replication slots, SLRU, WAL) in one file, because each seemed
small enough. However later commits will increase their size and their
aggregate size is not insubstantial. It also just seems easier to split each
type of statistic into its own file.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
2022-03-21 12:02:25 -07:00
Tom Lane cb02fcb4c9 Fix bogus dependency handling for GENERATED expressions.
For GENERATED columns, we record all dependencies of the generation
expression as AUTO dependencies of the column itself.  This means
that the generated column is silently dropped if any dependency
is removed, even if CASCADE wasn't specified.  This is at least
a POLA violation, but I think it's actually based on a misreading
of the standard.  The standard does say that you can't drop a
dependent GENERATED column in RESTRICT mode; but that's buried down
in a subparagraph, on a different page from some pseudocode that
makes it look like an AUTO drop is being suggested.

Change this to be more like the way that we handle regular default
expressions, ie record the dependencies as NORMAL dependencies of
the pg_attrdef entry.  Also, make the pg_attrdef entry's dependency
on the column itself be INTERNAL not AUTO.  That has two effects:

* the column will go away, not just lose its default, if any
dependency of the expression is dropped with CASCADE.  So we
don't need any special mechanism to make that happen.

* it provides an additional cross-check preventing someone from
dropping the default expression without dropping the column.

catversion bump because of change in the contents of pg_depend
(which also requires a change in one information_schema view).

Per bug #17439 from Kevin Humphreys.  Although this is a longstanding
bug, it seems impractical to back-patch because of the need for
catalog contents changes.

Discussion: https://postgr.es/m/17439-7df4421197e928f0@postgresql.org
2022-03-21 14:58:49 -04:00
Tom Lane 17f3bc0928 Move pg_attrdef manipulation code into new file catalog/pg_attrdef.c.
This is a pure refactoring commit: there isn't (I hope) any functional
change.

StoreAttrDefault and RemoveAttrDefault[ById] are moved from heap.c,
reducing the size of that overly-large file by about 300 lines.
I took the opportunity to trim unused #includes from heap.c, too.

Two new functions for translating between a pg_attrdef OID and the
relid/attnum of the owning column are created by extracting ad-hoc
code from objectaddress.c.  This already removes one copy of said
code, and a follow-on bug fix will create more callers.

The only other function directly manipulating pg_attrdef is
AttrDefaultFetch.  I judged it was better to leave that in relcache.c,
since it shares special concerns about recursion and error handling
with the rest of that module.

Discussion: https://postgr.es/m/651168.1647451676@sss.pgh.pa.us
2022-03-21 14:38:23 -04:00
Tom Lane 7b6ec86532 Fix risk of deadlock failure while dropping a partitioned index.
DROP INDEX needs to lock the index's table before the index itself,
else it will deadlock against ordinary queries that acquire the
relation locks in that order.  This is correctly mechanized for
plain indexes by RangeVarCallbackForDropRelation; but in the case of
a partitioned index, we neglected to lock the child tables in advance
of locking the child indexes.  We can fix that by traversing the
inheritance tree and acquiring the needed locks in RemoveRelations,
after we have acquired our locks on the parent partitioned table and
index.
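
The ordering rule in this paragraph (a table is locked before its index, at every level of the partition tree) can be sketched as a toy function. This is only an illustration under assumptions: drop_index_lock_order is a hypothetical name, relations are plain strings, and the exact order in which the real code visits children is not taken from the commit.

```python
def drop_index_lock_order(parent_table, parent_index, children):
    """Return a lock acquisition order for dropping a partitioned index.

    children is a list of (child_table, child_index) pairs. The sketch
    locks the parent table and index first, then every child table, and
    only afterwards the child indexes, so no index is locked before the
    table it belongs to.
    """
    order = [parent_table, parent_index]
    order += [table for table, _ in children]   # child tables in advance
    order += [index for _, index in children]   # child indexes afterwards
    return order
```

An ordinary query that locks a child table and then its index acquires locks in a compatible order, which is what removes the deadlock risk.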

While at it, do some refactoring to eliminate confusion between
the actual and expected relkind in RangeVarCallbackForDropRelation.
We can save a couple of syscache lookups too, by having that function
pass back info that RemoveRelations will need.

Back-patch to v11 where partitioned indexes were added.

Jimmy Yih, Gaurab Dey, Tom Lane

Discussion: https://postgr.es/m/BYAPR05MB645402330042E17D91A70C12BD5F9@BYAPR05MB6454.namprd05.prod.outlook.com
2022-03-21 12:22:13 -04:00
Tom Lane 1f8bc44868 Remove workarounds for avoiding [U]INT64_FORMAT in translatable strings.
Further code simplification along the same lines as d914eb347
and earlier patches.

Aleksander Alekseev, Japin Li

Discussion: https://postgr.es/m/CAJ7c6TMSKi3Xs8h5MP38XOnQQpBLazJvVxVfPn++roitDJcR7g@mail.gmail.com
2022-03-21 11:11:55 -04:00
Magnus Hagander c540d37157 Fix typo in file identification
Clearly a simple copy/paste mistake when the file was created.
2022-03-21 12:35:48 +02:00
Andres Freund d4ba8b51c7 pgstat: separate "xact level" handling out of relation specific functions.
This is in preparation for a later commit moving relation stats handling into
its own file.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
2022-03-20 19:12:09 -07:00
Andres Freund bff258a273 pgstat: rename pgstat_initstats() to pgstat_relation_init().
The old name was overly generic. An upcoming commit moves relation stats
handling into its own file, making pgstat_initstats() look even more out of
place.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
2022-03-20 19:12:09 -07:00
Andres Freund 8363102009 pgstat: introduce pgstat_relation_should_count().
A later commit will make the check more complicated than the
current (rel)->pgstat_info != NULL. It also just seems nicer to have a central
copy of the logic, even while still simple.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
2022-03-20 19:12:09 -07:00
Alvaro Herrera 2d655a08d5 Blind fix for uninitialized memory bug in ba9a7e3921
Valgrind animal skink shows a crash in this new code.  I couldn't
reproduce the problem locally, but going by blind code inspection,
initializing insert_destrel should be sufficient to fix the problem.
2022-03-20 22:10:24 +01:00
Alvaro Herrera ba9a7e3921 Enforce foreign key correctly during cross-partition updates
When an update on a partitioned table referenced in foreign key
constraints causes a row to move from one partition to another,
the fact that the move is implemented as a delete followed by an insert
on the target partition causes the foreign key triggers to have
surprising behavior.  For example, a given foreign key's delete trigger
which implements the ON DELETE CASCADE clause of that key will delete
any referencing rows when triggered for that internal DELETE, although
it should not, because the referenced row is simply being moved from one
partition of the referenced root partitioned table into another, not
being deleted from it.

This commit teaches trigger.c to skip queuing such delete trigger events
on the leaf partitions in favor of an UPDATE event fired on the root
target relation.  Doing so is sensible because both the old and the new
tuple "logically" belong to the root relation.

The after trigger event queuing interface now allows passing the source
and the target partitions of a particular cross-partition update when
registering the update event for the root partitioned table.  Along with
the two ctids of the old and the new tuple, the after trigger event now
also stores the OIDs of those partitions. The tuples fetched from the
source and the target partitions are converted into the root table
format, if necessary, before they are passed to the trigger function.

The implementation currently has a limitation that only the foreign keys
pointing into the query's target relation are considered, not those of
its sub-partitioned partitions.  That seems like a reasonable
limitation, because it sounds rare to have distinct foreign keys
pointing to sub-partitioned partitions instead of to the root table.

This misbehavior stems from commit f56f8f8da6 (which added support for
foreign keys to reference partitioned tables) not paying sufficient
attention to commit 2f17844104 (which had introduced cross-partition
updates a year earlier).  Even though the former commit goes back to
Postgres 12, we're not backpatching this fix at this time for fear of
destabilizing things too much, and because there are a few ABI breaks in
it that we'd have to work around in older branches.  It also depends on
commit f4566345cf, which had its own share of backpatchability issues
as well.

Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Eduard Català <eduard.catala@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com
Discussion: https://postgr.es/m/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com
2022-03-20 18:43:40 +01:00
Peter Eisentraut 3a671e1f7c Fix global ICU collations for ICU < 54
createdb() didn't check the validity of collation attributes, which has
to be done explicitly on ICU < 54.  It also forgot to close the ICU collator
opened during the check, which leaked some memory.

To fix both, add a new check_icu_locale() that does all the appropriate
verification and closes the ICU collator.

initdb also had some partial check for ICU < 54.  To have consistent error
reporting across major ICU versions, and get rid of the need to include ucol.h,
remove the partial check there.  The backend will report an error if needed
during the post-bootstrap initialization phase.

Author: Julien Rouhaud <julien.rouhaud@free.fr>
Discussion: https://www.postgresql.org/message-id/20220319041459.qqqiqh335sga5ezj@jrouhaud
2022-03-20 10:21:45 +01:00
Andres Freund 78f9506b38 pgstat: split out WAL handling from pgstat_{initialize,report_stat}.
A later commit will move the handling of the different kinds of stats into
separate files.  By splitting out WAL handling in this commit that later move
will just move code around without other changes.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
2022-03-19 11:42:22 -07:00
Andres Freund 89c546c294 pgstat: split relation, database handling out of pgstat_report_stat().
pgstat_report_stat() handles several types of stats, yet relation stats have
so far been handled directly in pgstat_report_stat().

A later commit will move the handling of the different kinds of stats into
separate files.  By splitting out relation handling in this commit that later
move will just move code around without other changes.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
2022-03-19 11:42:22 -07:00
Andres Freund a3a75b982b pgstat: run pgindent on pgstat.c/h.
Upcoming commits will touch a lot of the pgstats code. Reindenting separately
makes it easier to keep the code in a well-formatted shape each step.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
2022-03-19 11:42:22 -07:00
Alvaro Herrera a1fc50672c Fix an outdated and grammatically wrong comment
Authored by Amit Langote and myself independently
Discussion: https://postgr.es/m/CA+HiwqGCjcH0gG-=tM7hhP7TEDmzrHMHJbPGSHtHgFmx9mnFkg@mail.gmail.com
2022-03-19 19:34:04 +01:00
Andres Freund 97bddda61b Silence -Wmaybe-uninitialized compiler warning in dbcommands.c.
Introduced in f2553d4306. See also 3f6b3be39c, which did so for nearby
variables.

Discussion: https://postgr.es/m/20220319014707.kgtomqdzm6m2ulro@alap3.anarazel.de
2022-03-18 18:48:03 -07:00
Tom Lane 068739fb4f Fix incorrect xmlschema output for types timetz and timestamptz.
The output of table_to_xmlschema() and allied functions includes
a regex describing valid values for these types ... but the regex
was itself invalid, as it failed to escape a literal "+" sign.
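
The class of bug is easy to reproduce in any regex dialect where "+" is a quantifier. The sketch below uses Python's re module, whose syntax differs from XML Schema regular expressions, and the two patterns are hypothetical time zone offset patterns written for this example, not the regex PostgreSQL emits.

```python
import re


def compiles(pattern: str) -> bool:
    """Report whether the regex engine accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Hypothetical patterns for a "+hh:mm" / "-hh:mm" offset.
BROKEN = r"(+|-)\d{2}:\d{2}"    # bare "+" is a quantifier with nothing to repeat
FIXED = r"(\+|-)\d{2}:\d{2}"    # escaped "+" matches a literal plus sign
```

With the escape in place the pattern both compiles and matches offsets such as "+05:30".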

Report and fix by Renan Soares Lopes.  Back-patch to all
supported branches.

Discussion: https://postgr.es/m/7f6fabaa-3f8f-49ab-89ca-59fbfe633105@me.com
2022-03-18 16:01:42 -04:00
Thomas Munro 3f1ce97346 Add circular WAL decoding buffer, take II.
Teach xlogreader.c to decode the WAL into a circular buffer.  This will
support optimizations based on looking ahead, to follow in a later
commit.

 * XLogReadRecord() works as before, decoding records one by one, and
   allowing them to be examined via the traditional XLogRecGetXXX()
   macros and certain traditional members like xlogreader->ReadRecPtr.

 * An alternative new interface XLogReadAhead()/XLogNextRecord() is
   added that returns pointers to DecodedXLogRecord objects so that it's
   now possible to look ahead in the WAL stream while replaying.

 * In order to be able to use the new interface effectively while
   streaming data, support is added for the page_read() callback to
   respond to a new nonblocking mode with XLREAD_WOULDBLOCK instead of
   waiting for more data to arrive.

No direct user of the new interface is included in this commit, though
XLogReadRecord() uses it internally.  Existing code doesn't need to
change, except in a few places where it was accessing reader internals
directly and now needs to go through accessor macros.
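
The look-ahead idea can be sketched with a bounded queue. This is a toy model under assumptions, not the xlogreader.c design: LookaheadReader and its methods are hypothetical, a deque stands in for the circular buffer, and WOULDBLOCK is a local sentinel loosely modeled on the XLREAD_WOULDBLOCK result mentioned above.

```python
from collections import deque

WOULDBLOCK = object()  # sentinel: no more input has arrived yet


class LookaheadReader:
    def __init__(self, source, capacity):
        self.source = source      # list of raw records that have "arrived"
        self.pos = 0              # index of the next record to decode
        self.queue = deque()      # decoded records not yet consumed
        self.capacity = capacity  # how far ahead we are allowed to decode

    def read_ahead(self):
        """Decode one more record without waiting for new input."""
        if len(self.queue) >= self.capacity:
            return None           # decode buffer is full
        if self.pos >= len(self.source):
            return WOULDBLOCK     # nonblocking mode: report instead of waiting
        record = ("decoded", self.source[self.pos])
        self.pos += 1
        self.queue.append(record)
        return record

    def next_record(self):
        """Consume the oldest decoded record, decoding one if necessary."""
        if not self.queue:
            result = self.read_ahead()
            if result is None or result is WOULDBLOCK:
                return None
        return self.queue.popleft()
```

A caller can keep invoking read_ahead() to inspect upcoming records while consuming them in order through next_record().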

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
2022-03-18 18:45:47 +13:00
Tom Lane d7b5c071dd Don't bother to attach column name lists to RowExprs of named types.
If a RowExpr is marked as returning a named composite type, we aren't
going to consult its colnames list; we'll use the attribute names
shown for the type in pg_attribute.  Hence, skip storing that list,
to save a few nanoseconds when copying the expression tree around.

Discussion: https://postgr.es/m/2950001.1638729947@sss.pgh.pa.us
2022-03-17 18:25:44 -04:00
Tom Lane ec62cb0aac Revert applying column aliases to the output of whole-row Vars.
In commit bf7ca1587, I had the bright idea that we could make the
result of a whole-row Var (that is, foo.*) track any column aliases
that had been applied to the FROM entry the Var refers to.  However,
that's not terribly logically consistent, because now the output of
the Var is no longer of the named composite type that the Var claims
to emit.  bf7ca1587 tried to handle that by changing the output
tuple values to be labeled with a blessed RECORD type, but that's
really pretty disastrous: we can wind up storing such tuples onto
disk, whereupon they're not readable by other sessions.

The only practical fix I can see is to give up on what bf7ca1587
tried to do, and say that the column names of tuples produced by
a whole-row Var are always those of the underlying named composite
type, query aliases or no.  While this introduces some inconsistencies,
it removes others, so it's not that awful in the abstract.  What *is*
kind of awful is to make such a behavioral change in a back-patched
bug fix.  But corrupt data is worse, so back-patched it will be.

(A workaround available to anyone who's unhappy about this is to
introduce an extra level of sub-SELECT, so that the whole-row Var is
referring to the sub-SELECT's output and not to a named table type.
Then the Var is of type RECORD to begin with and there's no issue.)

Per report from Miles Delahunty.  The faulty commit dates to 9.5,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/2950001.1638729947@sss.pgh.pa.us
2022-03-17 18:18:05 -04:00
Robert Haas 39f0c4bd67 Refactor code for reading and writing relation map files.
Restructure things so that the functions which update the global
variables shared_map and local_map are separate from the functions
which just read and write relation map files without touching any
global variables.

In the new structure of things, write_relmap_file() writes a relmap
file but no longer performs global variable updates. A symmetric
function read_relmap_file() that just reads a file without changing
any global variables is added, and load_relmap_file(), which does
change the global variables, uses it as a subroutine.
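
The separation described here can be sketched in a few lines. The function names mirror those in the commit message, but everything else is an assumption made for illustration: the JSON file format, the dict standing in for shared_map, and the demo() helper are not how PostgreSQL stores or loads relation maps.

```python
import json
import os
import tempfile

shared_map = {}  # stand-in for the global in-memory map


def write_relmap_file(path, mapping):
    """Write the file only; no global state is touched."""
    with open(path, "w") as f:
        json.dump(mapping, f)


def read_relmap_file(path):
    """Read the file only; no global state is touched."""
    with open(path) as f:
        return json.load(f)


def load_relmap_file(path):
    """Update the global map, using read_relmap_file() as a subroutine."""
    global shared_map
    shared_map = read_relmap_file(path)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "relmap.json")
        write_relmap_file(path, {"rel_a": 1})
        before = dict(shared_map)          # still empty: writing has no side effect
        on_disk = read_relmap_file(path)   # reading has no side effect either
        load_relmap_file(path)             # only this call changes shared_map
        return before, on_disk, dict(shared_map)
```

A caller that must not rely on the in-memory map can use the read and write functions without ever populating it.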

Because write_relmap_file() no longer updates shared_map and
local_map, that logic is moved to perform_relmap_update(). However,
no similar logic is added to relmap_redo() even though it also calls
write_relmap_file(). That's because recovery must not rely on the
contents of the relation map, and therefore there is no need to
initialize it. In fact, doing so seems like a mistake, because we
might then manage to rely on the in-memory map where we shouldn't.

Patch by me, based on earlier work by Dilip Kumar. Reviewed by
Ashutosh Sharma.

Discussion: http://postgr.es/m/CA+TgmobQLgrt4AXsc0ru7aFFkzv=9fS-Q_yO69=k9WY67RCctg@mail.gmail.com
2022-03-17 13:21:07 -04:00
Tomas Vondra 5a07966225 Fix row filters with multiple publications
When publishing changes through a partition root, we should use the row
filter for the top-most ancestor. The relation may be added to multiple
publications, using different ancestors, and 52e4f0cd47 handled this
incorrectly. With c91f71b9dc we find the correct top-most ancestor, but
the code tried to fetch the row filter from all publications, including
those using a different ancestor etc. No row filter can be found for
such publications, which was treated as replicating all rows.

Similarly to c91f71b9dc, this seems to be a rare issue in practice. It
requires multiple publications including the same partitioned relation,
through different ancestors.

Fixed by only passing publications containing the top-most ancestor to
pgoutput_row_filter_init(), so that treating a missing row filter as
replicating all rows is correct.
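
A minimal sketch of the fix, assuming a simplified data model: each publication is a dict whose "filters" entry maps a published relation to its row filter, or to None when the publication has no filter for it. collect_row_filters is a hypothetical helper, not pgoutput_row_filter_init().

```python
def collect_row_filters(publications, top_ancestor):
    """Collect row filters only from publications that contain top_ancestor.

    Returns None when a relevant publication has no filter for the
    ancestor, meaning all rows are replicated.
    """
    # Publications using a different ancestor are dropped up front, so a
    # missing filter below really does mean "replicate all rows".
    relevant = [p for p in publications if top_ancestor in p["filters"]]
    filters = [p["filters"][top_ancestor] for p in relevant]
    if any(f is None for f in filters):
        return None
    return filters
```

Without the up-front selection, a publication that publishes through a different ancestor would look like one with a missing filter and would wrongly turn filtering off.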

Report and fix by me, test case by Hou zj. Reviews and improvements by
Amit Kapila.

Author: Tomas Vondra, Hou zj, Amit Kapila
Reviewed-by: Amit Kapila, Hou zj
Discussion: https://postgr.es/m/d26d24dd-2fab-3c48-0162-2b7f84a9c893%40enterprisedb.com
2022-03-17 17:03:48 +01:00
Alvaro Herrera 25e777cf8e Split ExecUpdate and ExecDelete into reusable pieces
Create subroutines ExecUpdatePrologue / ExecUpdateAct /
ExecUpdateEpilogue, and similar for ExecDelete.

Introduce a new struct to be used internally in nodeModifyTable.c,
dubbed ModifyTableContext, which contains all context information needed
to perform these operations, as well as ExecInsert and others.

This allows using a different schedule and a different way of evaluating
the results of these operations, which can be exploited by a later
commit introducing support for MERGE.  It also makes ExecUpdate and
ExecDelete proper shorter and (hopefully) simpler.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/202202271724.4z7xv3cf46kv@alvherre.pgsql
2022-03-17 11:47:04 +01:00
Peter Eisentraut f2553d4306 Add option to use ICU as global locale provider
This adds the option to use ICU as the default locale provider for
either the whole cluster or a database.  New options for initdb,
createdb, and CREATE DATABASE are used to select this.

Since some (legacy) code still uses the libc locale facilities
directly, we still need to set the libc global locale settings even if
ICU is otherwise selected.  So pg_database now has three
locale-related fields: the existing datcollate and datctype, which are
always set, and a new daticulocale, which is only set if ICU is
selected.  A similar change is made in pg_collation for consistency,
but in that case, only the libc-related fields or the ICU-related
field is set, never both.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a%402ndquadrant.com
2022-03-17 11:13:16 +01:00
Michael Paquier f6f0db4d62 Fix pg_tablespace_location() with in-place tablespaces
Using this system function with an in-place tablespace (created when
allow_in_place_tablespaces is enabled by specifying an empty string as
location) caused a failure when using readlink(), as the tablespace is,
in this case, not a symbolic link in pg_tblspc/ but a directory.

Rather than failing, pg_tablespace_location() now returns a path relative
to the data directory for in-place tablespaces, which distinguishes
tablespaces created with allow_in_place_tablespaces enabled from those
created without it.
Getting a path rather than an empty string that would match the CREATE
TABLESPACE command in this case is more useful for tests that would like
to rely on this function.
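
The behavior can be approximated with a small filesystem sketch. It is an assumption-laden model rather than the server code: tablespace_location() and demo() are hypothetical helpers, the OIDs 1 and 2 are made up, and the demo needs a platform where os.symlink() is permitted.

```python
import os
import tempfile


def tablespace_location(data_dir, spc_oid):
    """Return the symlink target, or a relative path for an in-place tablespace."""
    relative = os.path.join("pg_tblspc", str(spc_oid))
    full = os.path.join(data_dir, relative)
    if os.path.islink(full):
        return os.readlink(full)   # ordinary tablespace: follow the symlink
    if os.path.isdir(full):
        return relative            # in-place tablespace: a plain directory
    raise FileNotFoundError(full)


def demo():
    with tempfile.TemporaryDirectory() as data_dir:
        os.mkdir(os.path.join(data_dir, "pg_tblspc"))
        target = os.path.join(data_dir, "elsewhere")
        os.mkdir(target)
        os.symlink(target, os.path.join(data_dir, "pg_tblspc", "1"))
        os.mkdir(os.path.join(data_dir, "pg_tblspc", "2"))
        return (tablespace_location(data_dir, 1) == target,
                tablespace_location(data_dir, 2))
```

Calling readlink() on the plain directory would fail, which is the failure mode the commit describes.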

While on it, a regression test is added for this case.  This is simple
to add in the main regression test suite thanks to regexp_replace() to
mask the part of the tablespace location dependent on its OID.

Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi, Thomas Munro
Discussion: https://postgr.es/m/YiG1RleON1WBcLnX@paquier.xyz
2022-03-17 11:25:02 +09:00
Tomas Vondra c91f71b9dc Fix publish_as_relid with multiple publications
Commit 83fd4532a7 allowed publishing of changes via ancestors, for
publications defined with publish_via_partition_root. But the way
the ancestor was determined in get_rel_sync_entry() was incorrect,
simply updating the same variable. So with multiple publications,
replicating different ancestors, the outcome depended on the order
of publications in the list - the value from the last loop was used,
even if it wasn't the top-most ancestor.

This is probably a rare situation, as in most cases publications do
not overlap, so each partition has exactly one candidate ancestor
to replicate as and there's no ambiguity.

Fixed by tracking the "ancestor level" for each publication, and
picking the top-most ancestor. Adds a test case, verifying the
correct ancestor is used for publishing the changes and that this
does not depend on order of publications in the list.
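
The difference between the old and new selection logic can be sketched as follows. Both functions are hypothetical illustrations, and the (relation, level) pairs, with a higher level meaning closer to the partition root, are an assumed representation of the "ancestor level" mentioned above.

```python
def last_wins(candidates):
    """Old behavior in spirit: the value from the last loop iteration is used."""
    chosen = None
    for relation, _level in candidates:
        chosen = relation
    return chosen


def pick_top_most(candidates):
    """New behavior in spirit: remember the level and keep the top-most ancestor."""
    best_relation, best_level = None, -1
    for relation, level in candidates:
        if level > best_level:
            best_relation, best_level = relation, level
    return best_relation
```

Only pick_top_most() gives the same answer regardless of the order in which publications are listed.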

Older releases have another bug in this loop - once all actions are
replicated, the loop is terminated, on the assumption that inspecting
additional publications is unnecessary. But that misses the fact that
those additional publications may replicate different ancestors.

Fixed by removal of this break condition. We might still terminate the
loop in some cases (e.g. when replicating all actions and the ancestor
is the partition root).

Backpatch to 13, where publish_via_partition_root was introduced.

Initial report and fix by me, test added by Hou zj. Reviews and
improvements by Amit Kapila.

Author: Tomas Vondra, Hou zj, Amit Kapila
Reviewed-by: Amit Kapila, Hou zj
Discussion: https://postgr.es/m/d26d24dd-2fab-3c48-0162-2b7f84a9c893%40enterprisedb.com
2022-03-16 18:05:58 +01:00
Robert Haas d0083c1d2a Suppress compiler warnings.
Michael Paquier

Discussion: http://postgr.es/m/YjGvq4zPDT6j15go@paquier.xyz
2022-03-16 09:26:48 -04:00
Thomas Munro 46d9bfb0a6 Fix race between DROP TABLESPACE and checkpointing.
Commands like ALTER TABLE SET TABLESPACE may leave files for the next
checkpoint to clean up.  If such files are not removed by the time DROP
TABLESPACE is called, we request a checkpoint so that they are deleted.
However, there is presently a window before checkpoint start where new
unlink requests won't be scheduled until the following checkpoint.  This
means that the checkpoint forced by DROP TABLESPACE might not remove the
files we expect it to remove, and the following ERROR will be emitted:

	ERROR:  tablespace "mytblspc" is not empty

To fix, add a call to AbsorbSyncRequests() just before advancing the
unlink cycle counter.  This ensures that any unlink requests forwarded
prior to checkpoint start (i.e., when ckpt_started is incremented) will
be processed by the current checkpoint.  Since AbsorbSyncRequests()
performs memory allocations, it cannot be called within a critical
section, so we also need to move SyncPreCheckpoint() to before
CreateCheckPoint()'s critical section.

This is an old bug, so back-patch to all supported versions.

Author: Nathan Bossart <nathandbossart@gmail.com>
Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220215235845.GA2665318%40nathanxps13
2022-03-16 17:20:24 +13:00
Thomas Munro 3390ef1b7b Fix waiting in RegisterSyncRequest().
If we run out of space in the checkpointer sync request queue (which is
hopefully rare on real systems, but common with very small buffer pool),
we wait for it to drain.  While waiting, we should report that as a wait
event so that users know what is going on, and also handle postmaster
death, since otherwise the loop might never terminate if the
checkpointer has exited.

Back-patch to 12.  Although the problem exists in earlier releases too,
the code is structured differently before 12 so I haven't gone any
further for now, in the absence of field complaints.

Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de
2022-03-16 15:35:16 +13:00
Thomas Munro 5e6368b42e Wake up for latches in CheckpointWriteDelay().
The checkpointer shouldn't ignore its latch.  Other backends may be
waiting for it to drain the request queue.  Hopefully real systems don't
have a full queue often, but the condition is reached easily when
shared_buffers is small.

This involves defining a new wait event, which will appear in the
pg_stat_activity view often due to spread checkpoints.

Back-patch only to 14.  Even though the problem exists in earlier
branches too, it's hard to hit there.  In 14 we stopped using signal
handlers for latches on Linux, *BSD and macOS, which were previously
hiding this problem by interrupting the sleep (though not reliably, as
the signal could arrive before the sleep begins; precisely the problem
latches address).

Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de
2022-03-16 13:57:59 +13:00
Thomas Munro a56e7b6601 Silence LLVM 14 API deprecation warnings.
We are going to need to handle the upcoming opaque pointer API
changes[1], possibly in time for LLVM 15, but in the meantime let's
silence the warnings produced by LLVM 14.

[1] https://llvm.org/docs/OpaquePointers.html

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2Bp%3DfaBQR2PSAqWoWa%2B_tJdKPT0wjZPQe7XcDEttUCgdQ%40mail.gmail.com
2022-03-16 10:30:55 +13:00
Robert Haas 8ef1fa3ee0 Remove accidentally-committed file.
2022-03-15 13:41:36 -04:00
Robert Haas e4ba69f3f4 Allow extensions to add new backup targets.
Commit 3500ccc39b allowed for base backup
targets, meaning that we could do something with the backup other than
send it to the client, but all of those targets had to be baked in to
the core code. This commit makes it possible for extensions to define
additional backup targets.

Patch by me, reviewed by Abhijit Menon-Sen.

Discussion: http://postgr.es/m/CA+TgmoaqvdT-u3nt+_kkZ7bgDAyqDB0i-+XOMmr5JN2Rd37hxw@mail.gmail.com
2022-03-15 13:22:04 -04:00
Robert Haas 75eae09087 Change HAVE_LIBLZ4 and HAVE_LIBZSTD tests to USE_LZ4 and USE_ZSTD.
These tests were added recently, but older code tests USE_LZ4 rather
than HAVE_LIBLZ4, so let's follow the established precedent. It
also seems more consistent with the intent of the configure tests,
since I think that the USE_* symbols are intended to correspond to
what the user requested, and the HAVE_* symbols to what configure
found while probing.

Discussion: http://postgr.es/m/CA+Tgmoap+hTD2-QNPJLH4tffeFE8MX5+xkbFKMU3FKBy=ZSNKA@mail.gmail.com
2022-03-15 13:06:25 -04:00
Amit Kapila 695f459f17 Fix compiler warning introduced in commit 705e20f855.
Reported-by: Nathan Bossart
Author: Nathan Bossart
Reviewed-by: Osumi Takamichi
Discussion: https://postgr.es/m/20220314230424.GA1085716@nathanxps13
2022-03-15 08:11:17 +05:30
Michael Paquier 6bdf1a1400 Fix collection of typos in the code and the documentation
Some words were duplicated while other places were grammatically
incorrect, including one variable name in the code.

Author: Otto Kekalainen, Justin Pryzby
Discussion: https://postgr.es/m/7DDBEFC5-09B6-4325-B942-B563D1A24BDC@amazon.com
2022-03-15 11:29:35 +09:00
Thomas Munro c6f2f01611 Fix pg_basebackup with in-place tablespaces.
Previously, pg_basebackup from a cluster that contained an 'in-place'
tablespace, as introduced by commit 7170f215, would produce a harmless
warning on Unix and fail completely on Windows.

Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20220304.165449.1200020258723305904.horikyota.ntt%40gmail.com
2022-03-15 14:01:23 +13:00
Robert Haas 9dde82899c Support "of", "tzh", and "tzm" format codes.
The upper case versions "OF", "TZH", and "TZM" are already supported,
and all other format codes that are supported in upper case are also
supported in lower case, so we should support these as well for
consistency.

Nitin Jadhav, with a tiny cosmetic change by me. Reviewed by Suraj
Kharage and David Zhang.

Discussion: http://postgr.es/m/CAMm1aWZ-oZyKd75+8D=VJ0sAoSwtdXWLP-MAWD4D8R1Dgandzw@mail.gmail.com
2022-03-14 16:50:54 -04:00
Amit Kapila 705e20f855 Optionally disable subscriptions on error.
Logical replication apply workers for a subscription can easily get stuck
in an infinite loop of attempting to apply a change, triggering an error
(such as a constraint violation), exiting with the error written to the
subscription server log, and restarting.

To partially remedy the situation, this patch adds a new subscription
option named 'disable_on_error'. To be consistent with old behavior, this
option defaults to false. When true, both the tablesync worker and apply
worker catch any errors thrown and disable the subscription in order to
break the loop. The error is still also written in the logs.

Once the subscription is disabled, users can either manually resolve the
conflict/error or skip the conflicting transaction by using
pg_replication_origin_advance() function. After resolving the conflict,
users need to enable the subscription to allow apply process to proceed.

Author: Osumi Takamichi and Mark Dilger
Reviewed-by: Greg Nancarrow, Vignesh C, Amit Kapila, Wang wei, Tang Haiying, Peter Smith, Masahiko Sawada, Shi Yu
Discussion: https://postgr.es/m/DB35438F-9356-4841-89A0-412709EBD3AB%40enterprisedb.com
2022-03-14 09:32:40 +05:30
Peter Geoghegan 6e20f4600a VACUUM VERBOSE: tweak scanned_pages logic.
Commit 872770fd6c taught VACUUM VERBOSE and autovacuum logging to
display the total number of pages scanned by VACUUM.  This information
was also displayed as a percentage of rel_pages in parentheses, which
makes it easy to spot trends over time and across tables.

The instrumentation displayed "0 scanned (0.00% of total)" for totally
empty tables.  Tweak the instrumentation: have it show "0 scanned
(100.00% of total)" for empty tables instead.  This approach is clearer
and more consistent.
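
The empty-table rule described above can be sketched as a small helper. This is only an illustration under assumptions: the function name scanned_percent and its unsigned parameters are hypothetical and are not taken from vacuumlazy.c.

```c
/*
 * Hypothetical helper (not the actual vacuumlazy.c code): percentage of
 * a relation's pages that were scanned.
 */
double
scanned_percent(unsigned scanned_pages, unsigned rel_pages)
{
    /* An empty table counts as fully scanned; this also avoids 0/0. */
    if (rel_pages == 0)
        return 100.0;

    return 100.0 * scanned_pages / rel_pages;
}
```

With this rule, an empty table reports 100.00% rather than 0.00%, while a non-empty table where nothing was scanned still reports 0.00%.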
2022-03-13 13:07:49 -07:00
Peter Geoghegan e370f100f0 vacuumlazy.c: Standardize rel_pages terminology.
VACUUM's rel_pages field indicates the size of the target heap rel just
after the table_relation_vacuum() operation began.  There are specific
expectations around how rel_pages can be related to other nearby state.
In particular, the range of rel_pages must contain every tuple in the
relation whose tuple headers might contain an XID < OldestXmin.

Consistently refer to the field as rel_pages to make this clearer and
more discoverable.

This is follow-up work to commit 73f6ec3d from earlier today.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220311031351.sbge5m2bpvy2ttxg@alap3.anarazel.de
2022-03-12 13:20:45 -08:00
Peter Geoghegan 73f6ec3d3c vacuumlazy.c: document vistest and OldestXmin.
Explain the relationship between vacuumlazy.c's vistest and OldestXmin
cutoffs.  These closely related cutoffs are different in subtle but
important ways.  Also document a closely related rule: we must establish
rel_pages _after_ OldestXmin to ensure that no XID < OldestXmin can be
missed by lazy_scan_heap().

It's easier to explain these issues by initializing everything together,
so consolidate initialization of vacrel state.  Now almost every vacrel
field is initialized by heap_vacuum_rel().  The only remaining exception
is the dead_items array, which is still managed by lazy_scan_heap() due
to interactions with how we initialize parallel VACUUM.

Also move the process that updates pg_class entries for each index into
heap_vacuum_rel(), and adjust related assertions.  All pg_class updates
now take place after lazy_scan_heap() returns, which seems clearer.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20211211045710.ljtuu4gfloh754rs@alap3.anarazel.de
Discussion: https://postgr.es/m/CAH2-WznYsUxVT156rCQ+q=YD4S4=1M37hWvvHLz-H1pwSM8-Ew@mail.gmail.com
2022-03-12 12:52:38 -08:00
Peter Geoghegan 5b68f75e12 Normalize heap_prepare_freeze_tuple argument name.
We called the argument totally_frozen in its function prototype as well
as in code comments, even though totally_frozen_p was used in the
function definition.  Standardize on totally_frozen.
2022-03-11 19:30:21 -08:00
Alvaro Herrera 3a46a45f6f Add API of sorts for transition table handling in trigger.c
Preparatory patch for further additions in this area, particularly to
allow MERGE to have separate transition tables for each action.

Author: Pavan Deolasee <pavan.deolasee@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CABOikdNj+8HEJ5D8tu56mrPkjHVRrBb2_cdKWwpiYNcjXgDw8g@mail.gmail.com
Discussion: https://postgr.es/m/20201231134736.GA25392@alvherre.pgsql
2022-03-11 20:40:03 -03:00
Tom Lane 641f3dffcd Restore the previous semantics of get_constraint_index().
Commit 8b069ef5d changed this function to look at pg_constraint.conindid
rather than searching pg_depend.  That was a good performance improvement,
but it failed to preserve the exact semantics.  The old code would only
return an index that was "owned by" (internally dependent on) the
specified constraint, whereas the new code will also return indexes that
are just referenced by foreign key constraints.  This confuses ALTER
TABLE, which was implicitly expecting the previous semantics, into
failing with errors like
    ERROR:  relation 146621 has multiple clustered indexes
or
    ERROR:  "pk_attbl" is not an index for table "atref"

We can fix this without reverting the performance improvement by adding
a contype check in get_constraint_index().  Another way could be to
make ALTER TABLE check it, but I'm worried that extension code could
also have subtle dependencies on the old semantics.

Tom Lane and Japin Li, per bug #17409 from Holly Roberts.
Back-patch to v14 where the error crept in.

Discussion: https://postgr.es/m/17409-52871dda8b5741cb@postgresql.org
2022-03-11 13:47:29 -05:00
Peter Eisentraut e94bb1473e DefineCollation() code cleanup
Reorganize the code in DefineCollation() so that the parts using the
FROM clause and the parts not doing so are more cleanly separated.  No
functionality change intended.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/29ae752f-80e9-8d31-601c-62cf01cc93d8@enterprisedb.com
2022-03-11 08:32:52 +01:00
Michael Paquier e9537321a7 Add support for zstd with compression of full-page writes in WAL
wal_compression gains a new value, "zstd", to allow the compression of
full-page images using the compression method of the same name.

Compression is done using the default level recommended by the library,
as of ZSTD_CLEVEL_DEFAULT = 3.  Some benchmarking has shown that it
could make sense to use a level lower for the FPI compression, like 1 or
2, as the compression rate did not change much with a bit less CPU
consumed, but any tests done would only cover few scenarios so it is
hard to come to a clear conclusion.  Anyway, there is no reason to not
use the default level instead, which is the level recommended by the
library so it should be fine for most cases.

zstd easily outclasses pglz, and is better than LZ4 where one wants to
have more compression at the cost of extra CPU but both are good enough
in their own scenarios, so the choice between one or the other of these
comes to a study of the workload patterns and the schema involved,
mainly.

This commit relies heavily on 4035cd5, that reshaped the code creating
and restoring full-page writes to be aware of the compression type,
making this integration straight-forward.

This patch borrows some early work from Andrey Borodin, though the patch
got a complete rewrite.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20220222231948.GJ9008@telsasoft.com
2022-03-11 12:18:53 +09:00
Michael Paquier 0071fc7127 Fix header inclusion order in xloginsert.c with lz4.h
Per project policy, all system and library headers need to be declared
in the backend code after "postgres.h" and before the internal headers,
but 4035cd5 broke this policy when adding support for LZ4 in
wal_compression.

Noticed while reviewing the patch to add support for zstd in this area.
This only impacts HEAD, so there is no need for a back-patch.
2022-03-11 10:59:47 +09:00
Andres Freund 352d297dc7 dshash: Add sequential scan support.
Add ability to scan all entries sequentially to dshash. The interface is
similar but a bit different both from that of dynahash and simple dshash
search functions. The most significant difference is that dshash's interface
always needs a call to dshash_seq_term when scan ends. Another is
locking. Dshash holds partition lock when returning an entry,
dshash_seq_next() also holds lock when returning an entry but callers
shouldn't release it, since the lock is essential to continue a scan. The
seqscan interface allows entry deletion while a scan is in progress using
dshash_delete_current().

Reviewed-By: Andres Freund <andres@anarazel.de>
Author: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
2022-03-10 12:57:05 -08:00
Peter Eisentraut df4c3cbd8f Add parse_analyze_withcb()
This extracts code from pg_analyze_and_rewrite_withcb() into a
separate function that mirrors the existing
parse_analyze_fixedparams() and parse_analyze_varparams().

Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://www.postgresql.org/message-id/flat/c67ce276-52b4-0239-dc0e-39875bf81840@enterprisedb.com
2022-03-09 11:08:16 +01:00
Robert Haas 1d4be6be65 Fix LZ4 tests for remaining buffer space.
We should flush the buffer when the remaining space is less than
the maximum amount that we might need, not when it is less than or
equal to the maximum amount we might need.
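
The boundary condition above can be illustrated with a minimal sketch. The helper name must_flush and its parameters are hypothetical and do not come from the base backup code; the only point is the choice of "<" over "<=".

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not PostgreSQL code): decide whether an output
 * buffer must be flushed before writing up to max_needed more bytes.
 * The caller is assumed to guarantee used <= capacity.
 */
bool
must_flush(size_t capacity, size_t used, size_t max_needed)
{
    size_t remaining = capacity - used;

    /* Exactly max_needed free bytes is still enough, so use "<". */
    return remaining < max_needed;
}
```

With "<=", a buffer with exactly enough room would be flushed one write too early, which is harmless but unnecessary.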

Jeevan Ladhe, per an observation from me.

Discussion: http://postgr.es/m/CANm22CgVMa85O1akgs+DOPE8NSrT1zbz5_vYfS83_r+6nCivLQ@mail.gmail.com
2022-03-08 10:05:55 -05:00
Robert Haas 7cf085f077 Add support for zstd base backup compression.
Both client-side compression and server-side compression are now
supported for zstd. In addition, a backup compressed by the server
using zstd can now be decompressed by the client in order to
accommodate the use of -Fp.

Jeevan Ladhe, with some edits by me.

Discussion: http://postgr.es/m/CA+Tgmobyzfbz=gyze2_LL1ZumZunmaEKbHQxjrFkOR7APZGu-g@mail.gmail.com
2022-03-08 09:52:43 -05:00
Michael Paquier c28839c832 Improve comment in execReplication.c
Author: Peter Smith
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/CAHut+PuRVf3ghNTg8EV5XOQu6unGSZma0ahsRoz-haaOFZe-1A@mail.gmail.com
2022-03-08 14:29:03 +09:00
Amit Kapila d3e8368c4b Add the additional information to the logical replication worker errcontext.
This commit adds both the finish LSN (commit_lsn in case transaction got
committed, prepare_lsn in case of a prepared transaction, etc.) and
replication origin name to the existing error context message.

This will help users in specifying the origin name and transaction finish
LSN to pg_replication_origin_advance() SQL function to skip a particular
transaction.

Author: Masahiko Sawada
Reviewed-by: Takamichi Osumi, Euler Taveira, and Amit Kapila
Discussion: https://postgr.es/m/CAD21AoBarBf2oTF71ig2g_o=3Z_Dt6_sOpMQma1kFgbnA5OZ_w@mail.gmail.com
2022-03-08 08:08:32 +05:30
Tomas Vondra d5ed9da41d Call ReorderBufferProcessXid from sequence_decode
Commit 0da92dc530 added sequence_decode() implementing logical decoding
of sequences, but it failed to call ReorderBufferProcessXid() as it
should. So add the missing call.

Reported-by: Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KGn6cQqJEsubOOENwQOANsExiV2sKL52r4U10J8NJEMQ%40mail.gmail.com
2022-03-07 20:53:16 +01:00
Peter Eisentraut 25751f54b8 Add pg_analyze_and_rewrite_varparams()
This new function extracts common code from PrepareQuery() and
exec_parse_message().  It is then exactly analogous to the existing
pg_analyze_and_rewrite_fixedparams() and
pg_analyze_and_rewrite_withcb().

To unify these two code paths, this makes PrepareQuery() now subject
to log_parser_stats.  Also, both paths now invoke
TRACE_POSTGRESQL_QUERY_REWRITE_START().  PrepareQuery() no longer
checks whether a utility statement was specified.  The grammar doesn't
allow that anyway, and exec_parse_message() supports it, so
restricting it doesn't seem necessary.

This also adds QueryEnvironment support to the *varparams functions,
for consistency with its cousins, even though it is not used right
now.

Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://www.postgresql.org/message-id/flat/c67ce276-52b4-0239-dc0e-39875bf81840@enterprisedb.com
2022-03-07 08:13:30 +01:00
Amit Kapila 5e0e99a80b Make the errcontext message in logical replication worker translation friendly.
Previously, the message for logical replication worker errcontext was
incrementally built, which was not translation friendly.  Instead, we now
use complete sentences with if-else branches.

We also remove the commit timestamp from the context message since it's
not important information and made the message long.

Author: Masahiko Sawada
Reviewed-by: Takamichi Osumi, and Amit Kapila
Discussion: https://postgr.es/m/CAD21AoBarBf2oTF71ig2g_o=3Z_Dt6_sOpMQma1kFgbnA5OZ_w@mail.gmail.com
2022-03-07 08:33:58 +05:30
Michael Paquier 9e98583898 Create routine able to set single-call SRFs for Materialize mode
Set-returning functions that use the Materialize mode, creating a
tuplestore to include all the tuples returned in a set rather than doing
so in multiple calls, use roughly the same set of steps to prepare
ReturnSetInfo for this job:
- Check if ReturnSetInfo supports returning a tuplestore and if the
materialize mode is enabled.
- Create a tuplestore for all the tuples part of the returned set in the
per-query memory context, stored in ReturnSetInfo->setResult.
- Build a tuple descriptor mostly from get_call_result_type(), then
stored in ReturnSetInfo->setDesc.  Note that there are some cases where
the SRF's tuple descriptor has to be the one specified by the function
caller.

This refactoring is done so as there are (well, should be) no behavior
changes in any of the in-core functions refactored, and the centralized
function that checks and sets up the function's ReturnSetInfo can be
controlled with a set of bits32 options.  Two of them prove to be
necessary now:
- SRF_SINGLE_USE_EXPECTED to use expectedDesc as tuple descriptor, as
expected by the function's caller.
- SRF_SINGLE_BLESS to validate the tuple descriptor for the SRF.

The same initialization pattern is simplified in 28 places per my
count as of src/backend/, shaving up to ~900 lines of code.  These
mostly come from the removal of the per-query initializations and the
sanity checks now grouped in a single location.  There are more
locations that could be simplified in contrib/, that are left for a
follow-up cleanup.

fcc2817, 07daca5 and d61a361 have prepared the areas of the code related
to this change, to ease this refactoring.

Author: Melanie Plageman, Michael Paquier
Reviewed-by: Álvaro Herrera, Justin Pryzby
Discussion: https://postgr.es/m/CAAKRu_azyd1Z3W_r7Ou4sorTjRCs+PxeHw1CWJeXKofkE6TuZg@mail.gmail.com
2022-03-07 10:26:29 +09:00
Peter Eisentraut 791b1b71da Parse/analyze function renaming
There are three parallel ways to call parse/analyze: with fixed
parameters, with variable parameters, and by supplying your own parser
callback.  Some of the involved functions were confusingly named and
made this API structure more confusing.  This patch renames some
functions to make this clearer:

parse_analyze() -> parse_analyze_fixedparams()
pg_analyze_and_rewrite() -> pg_analyze_and_rewrite_fixedparams()

(Otherwise one might think this variant doesn't accept parameters, but
in fact all three ways accept parameters.)

pg_analyze_and_rewrite_params() -> pg_analyze_and_rewrite_withcb()

(Before, and also when considering pg_analyze_and_rewrite(), one might
think this is the only way to pass parameters.  Moreover, the parser
callback doesn't necessarily need to parse only parameters, it's just
one of the things it could do.)

parse_fixed_parameters() -> setup_parse_fixed_parameters()
parse_variable_parameters() -> setup_parse_variable_parameters()

(These functions don't actually do any parsing, they just set up
callbacks to use during parsing later.)

This patch also adds some const decorations to the fixed-parameters
API, so the distinction from the variable-parameters API is more
clear.

Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://www.postgresql.org/message-id/flat/c67ce276-52b4-0239-dc0e-39875bf81840@enterprisedb.com
2022-03-04 14:50:22 +01:00
Tom Lane f7ea240aa7 Tighten overflow checks in tidin().
This code seems to have been written on the assumption that
"unsigned long" is 32 bits; or at any rate it ignored the
possibility of conversion overflow.  Rewrite, borrowing some
logic from oidin().
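
As a rough illustration of the kind of check involved, here is a sketch of parsing a 32-bit value via strtoul() without assuming that "unsigned long" is 32 bits. The function parse_uint32 is hypothetical; it is not the code in tidin() or oidin().

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the tidin()/oidin() code): parse a decimal
 * string into a uint32_t, rejecting anything that does not fit.
 */
bool
parse_uint32(const char *str, uint32_t *result)
{
    char       *end;
    unsigned long val;

    /* Require a leading digit: strtoul() would accept whitespace and signs. */
    if (!isdigit((unsigned char) str[0]))
        return false;

    errno = 0;
    val = strtoul(str, &end, 10);

    if (*end != '\0')
        return false;           /* trailing junk */
    if (errno == ERANGE)
        return false;           /* overflowed unsigned long itself */
    if (val > UINT32_MAX)
        return false;           /* fits in a 64-bit long, but not in 32 bits */

    *result = (uint32_t) val;
    return true;
}
```

On platforms where "unsigned long" is 64 bits, the errno check alone would accept 4294967296, which is why the explicit range comparison is needed.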

Discussion: https://postgr.es/m/3441768.1646343914@sss.pgh.pa.us
2022-03-03 20:04:35 -05:00
Tom Lane 46ab07ffda Clean up assorted failures under clang's -fsanitize=undefined checks.
Most of these are cases where we could call memcpy() or other libc
functions with a NULL pointer and a zero count, which is forbidden
by POSIX even though every production version of libc allows it.
We've fixed such things before in a piecemeal way, but apparently
never made an effort to try to get them all.  I don't claim that
this patch does so either, but it gets every failure I observe in
check-world, using clang 12.0.1 on current RHEL8.
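
A minimal sketch of the usual guard for this class of problem follows. The wrapper copy_bytes is hypothetical and is not what the patch adds; the actual fixes are applied at the individual call sites.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper (not PostgreSQL code): copy n bytes, but never
 * hand memcpy() a possibly-NULL pointer when there is nothing to copy.
 */
void *
copy_bytes(void *dst, const void *src, size_t n)
{
    if (n > 0)
        memcpy(dst, src, n);

    return dst;
}
```

The zero-length case becomes a no-op, so the sanitizer never sees a NULL argument reach memcpy().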

numeric.c has a different issue that the sanitizer doesn't like:
"ln(-1.0)" will compute log10(0) and then try to assign the
resulting -Inf to an integer variable.  We don't actually use the
result in such a case, so there's no live bug.

Back-patch to all supported branches, with the idea that we might
start running a buildfarm member that tests this case.  This includes
back-patching c1132aae3 (Check the size in COPY_POINTER_FIELD),
which previously silenced some of these issues in copyfuncs.c.

Discussion: https://postgr.es/m/CALNJ-vT9r0DSsAOw9OXVJFxLENoVS_68kJ5x0p44atoYH+H4dg@mail.gmail.com
2022-03-03 18:13:24 -05:00
Michael Paquier 62ce0c758d Fix catalog data of pg_stop_backup(), labelled v2
This function has been incorrectly marked as a set-returning function
with prorows (estimated number of rows) set to 1 since its creation in
7117685, that introduced non-exclusive backups.  There is no need for
that as the function is designed to return only one tuple.

This commit fixes the catalog definition of pg_stop_backup_v2() so that it
is no longer marked as proretset, with prorows set to 0.  This
simplifies its internals by removing one tuplestore (used for one single
record anyway) and by removing all the checks related to a set-returning
function.

Issue found during my quest to simplify some of the logic used in
in-core system functions.

Bump catalog version.

Reviewed-by: Aleksander Alekseev, Kyotaro Horiguchi
Discussion: https://postgr.es/m/Yh8guT78f1Ercfzw@paquier.xyz
2022-03-03 10:51:57 +09:00
Amit Kapila 7a85073290 Reconsider pg_stat_subscription_workers view.
It was decided (refer to the Discussion link below) that the stats
collector is not an appropriate place to store the error information of
subscription workers.

This patch changes the pg_stat_subscription_workers view (introduced by
commit 8d74fc96db) so that it stores only statistics counters:
apply_error_count and sync_error_count, and has one entry for
each subscription. The removed error information such as error-XID and
the error message would be stored in another way in the future which is
more reliable and persistent.

After removing these error details, there is no longer any relation
information, so the subscription statistics are now cluster-wide
statistics.

The patch also changes the view name to pg_stat_subscription_stats since
the word "worker" is an implementation detail that we use one worker for
one tablesync and one apply.

Author: Masahiko Sawada, based on suggestions by Andres Freund
Reviewed-by: Peter Smith, Haiying Tang, Takamichi Osumi, Amit Kapila
Discussion: https://postgr.es/m/20220125063131.4cmvsxbz2tdg6g65@alap3.anarazel.de
2022-03-01 06:17:52 +05:30
Tom Lane 54bd1e43ca Handle integer overflow in interval justification functions.
justify_interval, justify_hours, and justify_days didn't check for
overflow when promoting hours to days or days to months; but that's
possible when the upper field's value is already large.  Detect and
report any such overflow.
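
The overflow in question can be sketched as follows. This is a simplified illustration with assumed names and units: promote_hours_to_days is hypothetical, and it works on whole hours, whereas the real interval code stores the time field in microseconds.

```c
#include <stdbool.h>
#include <stdint.h>

#define HOURS_PER_DAY 24

/*
 * Hypothetical helper (not the justify_hours() code): move whole days
 * out of *hours into *days.  Returns false, leaving both values
 * untouched, if the day count would overflow int32_t.
 */
bool
promote_hours_to_days(int64_t *hours, int32_t *days)
{
    int64_t whole_days = *hours / HOURS_PER_DAY;    /* truncates toward zero */
    int64_t new_days = (int64_t) *days + whole_days;

    if (new_days > INT32_MAX || new_days < INT32_MIN)
        return false;

    *days = (int32_t) new_days;
    *hours -= whole_days * HOURS_PER_DAY;
    return true;
}
```

Doing the addition in 64 bits and range-checking before storing is one way to detect the case where the upper field is already large.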

Also, we can avoid unnecessary overflow in some cases in justify_interval
by pre-justifying the days field.  (Thanks to Nathan Bossart for this
idea.)

Joe Koshakow

Discussion: https://postgr.es/m/CAAvxfHeNqsJ2xYFbPUf_8nNQUiJqkag04NW6aBQQ0dbZsxfWHA@mail.gmail.com
2022-02-28 15:36:54 -05:00
Tom Lane a59c79564b Allow root-owned SSL private keys in libpq, not only the backend.
This change makes libpq apply the same private-key-file ownership
and permissions checks that we have used in the backend since commit
9a83564c5.  Namely, that the private key can be owned by either the
current user or root (with different file permissions allowed in the
two cases).  This allows system-wide management of key files, which
is just as sensible on the client side as the server, particularly
when the client is itself some application daemon.

Sync the comments about this between libpq and the backend, too.

David Steele

Discussion: https://postgr.es/m/f4b7bc55-97ac-9e69-7398-335e212f7743@pgmasters.net
2022-02-28 14:12:52 -05:00
Tom Lane 12d768e704 Don't use static storage for SaveTransactionCharacteristics().
This is pretty queasy-making on general principles, and the more so
once you notice that CommitTransactionCommand() is actually stomping
on the values saved by _SPI_commit().  It's okay as long as the
active values didn't change during HoldPinnedPortals(); but that's
a larger assumption than I think we want to make, especially since
the fix is so simple.

Discussion: https://postgr.es/m/1533956.1645731245@sss.pgh.pa.us
2022-02-28 12:54:12 -05:00
Tom Lane 2e517818f4 Fix SPI's handling of errors during transaction commit.
SPI_commit previously left it up to the caller to recover from any error
occurring during commit.  Since that's complicated and requires use of
low-level xact.c facilities, it's not too surprising that no caller got
it right.  Let's move the responsibility for cleanup into spi.c.  Doing
that requires redefining SPI_commit as starting a new transaction, so
that it becomes equivalent to SPI_commit_and_chain except that you get
default transaction characteristics instead of preserving the prior
transaction's characteristics.  We can make this pretty transparent
API-wise by redefining SPI_start_transaction() as a no-op.  Callers
that expect to do something in between might be surprised, but
available evidence is that no callers do so.

Having made that API redefinition, we can fix this mess by having
SPI_commit[_and_chain] trap errors and start a new, clean transaction
before re-throwing the error.  Likewise for SPI_rollback[_and_chain].
Some cleanup is also needed in AtEOXact_SPI, which was nowhere near
smart enough to deal with SPI contexts nested inside a committing
context.

While plperl and pltcl need no changes beyond removing their now-useless
SPI_start_transaction() calls, plpython needs some more work because it
hadn't gotten the memo about catching commit/rollback errors in the
first place.  Such an error resulted in longjmp'ing out of the Python
interpreter, which leaks Python stack entries at present and is reported
to crash Python 3.11 altogether.  Add the missing logic to catch such
errors and convert them into Python exceptions.

We are probably going to have to back-patch this once Python 3.11 ships,
but it's a sufficiently basic change that I'm a bit nervous about doing
so immediately.  Let's let it bake awhile in HEAD first.

Peter Eisentraut and Tom Lane

Discussion: https://postgr.es/m/3375ffd8-d71c-2565-e348-a597d6e739e3@enterprisedb.com
Discussion: https://postgr.es/m/17416-ed8fe5d7213d6c25@postgresql.org
2022-02-28 12:45:36 -05:00
Dean Rasheed d1b307eef2 Optimise numeric division for one and two base-NBASE digit divisors.
Formerly div_var() had "fast path" short division code that was
significantly faster when the divisor was just one base-NBASE digit,
but otherwise used long division.

This commit adds a new function div_var_int() that divides by an
arbitrary 32-bit integer, using the fast short division algorithm, and
updates both div_var() and div_var_fast() to use it for one and two
digit divisors. In the case of div_var(), this is slightly faster in
the one-digit case, because it avoids some digit array copying, and is
much faster in the two-digit case where it replaces long division. For
div_var_fast(), it is much faster in both cases because the main
div_var_fast() algorithm is optimised for larger inputs.
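
For readers unfamiliar with short division, the idea can be sketched as below. This is not div_var_int() itself: the function short_divide, the most-significant-digit-first layout, and the use of 10000 for NBASE are assumptions made for illustration.

```c
#include <stdint.h>

#define NBASE 10000             /* assumed value of the digit base */

/*
 * Hypothetical sketch (not div_var_int()): divide an array of base-NBASE
 * digits, most significant first, by a positive int divisor in place.
 * Returns the remainder.
 */
int
short_divide(int16_t *digits, int ndigits, int divisor)
{
    int64_t carry = 0;

    for (int i = 0; i < ndigits; i++)
    {
        /* carry < divisor, so cur / divisor is always a valid digit. */
        int64_t cur = carry * NBASE + digits[i];

        digits[i] = (int16_t) (cur / divisor);
        carry = cur % divisor;
    }

    return (int) carry;
}
```

Each output digit needs one division and one remainder, with no trial quotient estimation, which is what makes short division cheaper than long division for small divisors.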

Additionally, optimise exp() and ln() by using div_var_int(), allowing
a NumericVar to be replaced by an int in a couple of places, most
notably in the Taylor series code. This produces a significant speedup
of exp(), ln() and the numeric_big regression test.

Dean Rasheed, reviewed by Tom Lane.

Discussion: https://postgr.es/m/CAEZATCVwsBi-ND-t82Cuuh1=8ee6jdOpzsmGN+CUZB6yjLg9jw@mail.gmail.com
2022-02-27 11:12:30 +00:00
Dean Rasheed d996d648f3 Simplify the inner loop of numeric division in div_var().
In the standard numeric division algorithm, the inner loop multiplies
the divisor by the next quotient digit and subtracts that from the
working dividend. As suggested by the original code comment, the
separate "carry" and "borrow" variables (from the multiplication and
subtraction steps respectively) can be folded together into a single
variable. Doing so significantly improves performance, as well as
simplifying the code.

Dean Rasheed, reviewed by Tom Lane.

Discussion: https://postgr.es/m/CAEZATCVwsBi-ND-t82Cuuh1=8ee6jdOpzsmGN+CUZB6yjLg9jw@mail.gmail.com
2022-02-27 10:41:12 +00:00
Dean Rasheed e3d41d08a1 Apply auto-vectorization to the inner loop of div_var_fast().
This loop is basically the same as the inner loop of mul_var(), which
was auto-vectorized in commit 8870917623, but the compiler will only
consider auto-vectorizing the div_var_fast() loop if the assignment
target div[qi + i] is replaced by div_qi[i], where div_qi = &div[qi].

Additionally, since the compiler doesn't know that qdigit is
guaranteed to fit in a 16-bit NumericDigit, cast it to NumericDigit
before multiplying to make the resulting auto-vectorized code more
efficient (avoiding unnecessary multiplication of the high 16 bits).
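
The loop shape described above might look roughly like this. It is a sketch under assumptions: the function subtract_scaled and its signature are hypothetical, and whether a given compiler actually vectorizes the loop is not guaranteed.

```c
#include <stdint.h>

typedef int16_t NumericDigit;   /* assumed 16-bit digit type */

/*
 * Hypothetical sketch (not the div_var_fast() code): subtract
 * qdigit * var2digits[i] from div[qi + i] for i in [0, istop).
 */
void
subtract_scaled(int *div, int qi, int qdigit,
                const NumericDigit *var2digits, int istop)
{
    /* Hoist the base address so the loop indexes with i alone. */
    int *div_qi = &div[qi];

    for (int i = 0; i < istop; i++)
        div_qi[i] -= (NumericDigit) qdigit * var2digits[i];
}
```

The result is the same as writing div[qi + i] directly; only the form presented to the optimizer changes.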

While at it, per suggestion from Tom Lane, change var1digit in
mul_var() to be a NumericDigit rather than an int for the same
reason. This actually makes no difference with modern gcc, but it
might help other compilers generate more efficient assembly.

Dean Rasheed, reviewed by Tom Lane.

Discussion: https://postgr.es/m/CAEZATCVwsBi-ND-t82Cuuh1=8ee6jdOpzsmGN+CUZB6yjLg9jw@mail.gmail.com
2022-02-27 10:15:46 +00:00
Andres Freund d33aeefd9b Fix warning on mingw due to pid_t width, introduced in fe0972ee5e.
2022-02-26 16:07:07 -08:00
Amit Kapila a89850a57e Fix typo in logicalfuncs.c.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACX1mVtw8LWEnZgnpPdk2bPFR1xX2ZN+8GfXCffyip_9=Q@mail.gmail.com
2022-02-26 10:38:37 +05:30
Andres Freund fe0972ee5e Add further debug info to help debug 019_replslot_limit.pl failures.
See also afdeff1052. Failures after that commit provided a few more hints,
but not yet enough to understand what's going on.

In 019_replslot_limit.pl shut down nodes with fast instead of immediate mode
if we observe the failure mode. That should tell us whether the failures we're
observing are just a timing issue under high load. PGCTLTIMEOUT should prevent
buildfarm animals from hanging endlessly.

Also adds a bit more logging to replication slot drop and ShutdownPostgres().

Discussion: https://postgr.es/m/20220225192941.hqnvefgdzaro6gzg@alap3.anarazel.de
2022-02-25 17:04:39 -08:00
Peter Geoghegan 73c61a50a1 vacuumlazy.c: Remove obsolete num_tuples field.
Commit 49c9d9fc unified VACUUM VERBOSE and autovacuum logging.  It
neglected to remove an old vacrel field that was only used by the old
VACUUM VERBOSE, so remove it now.

The previous num_tuples approach doesn't seem to have any real advantage
over the approach VACUUM VERBOSE takes now (also the approach used by
the autovacuum logging code), which is to show new_rel_tuples.
new_rel_tuples is the possibly-estimated total number of tuples left in
the table, whereas num_tuples meant the number of tuples encountered
during the VACUUM operation, after pruning, without regard for tuples
from pages skipped via the visibility map.

In passing, reorder a related vacrel field for consistency.
2022-02-24 19:01:54 -08:00
Peter Geoghegan cf879d3069 Remove unnecessary heap_tuple_needs_freeze argument.
The buffer argument hasn't been used since the function was first added
by commit bbb6e559c4.  The sibling heap_prepare_freeze_tuple function
doesn't have such an argument either.  Remove it.
2022-02-24 18:31:07 -08:00
Heikki Linnakangas 6c46e8a5df Fix data loss on crash after sorted GiST index build.
If a checkpoint happens during sorted GiST index build, and the system
crashes after the checkpoint and after the index build has finished,
the data written to the index before the checkpoint started could be
lost. The checkpoint won't fsync it, and it won't be replayed at crash
recovery either. Fix by calling smgrimmedsync() after the index build,
just like in B-tree index build.

Backpatch to v14 where the sorted GiST index build was introduced.

Reported-by: Melanie Plageman
Discussion: https://www.postgresql.org/message-id/CAAKRu_ZJJynimxKj5xYBSziL62-iEtPE+fx-B=JzR=jUtP92mw@mail.gmail.com
2022-02-24 16:15:12 +02:00
Michael Paquier e77216fcb0 Simplify more checks related to set-returning functions
This makes more consistent the SRF-related checks in the area of
PL/pgSQL, PL/Perl, PL/Tcl, pageinspect and some of the JSON worker
functions, making it easier to grep for the same error patterns through
the code, reducing a bit the translation work.

It is worth noting that each_worker_jsonb()/each_worker() in jsonfuncs.c
and pageinspect's brin_page_items() were doing a check on expectedDesc
that is not required as they fetch their tuple descriptor directly from
get_call_result_type().  This looks like a set of copy-paste errors that
have spread over the years.

This commit is a continuation of the changes begun in 07daca5, for any
remaining code paths on sight.  Like fcc2817, this makes the code more
consistent, easing the integration of a larger patch that will refactor
the way tuplestores are created and checked in a good portion of the
set-returning functions present in core.

I have worked my way through the changes of this patch by myself, and
Ranier has proposed the same changes in a different thread in parallel,
though there were some inconsistencies related to expectedDesc in what
was proposed by him.

Author: Michael Paquier, Ranier Vilela
Discussion: https://postgr.es/m/CAAKRu_azyd1Z3W_r7Ou4sorTjRCs+PxeHw1CWJeXKofkE6TuZg@mail.gmail.com
Discussion: https://postgr.es/m/CAEudQApm=AFuJjEHLBjBcJbxcw4pBMwg2sHwXyCXYcbBOj3hpg@mail.gmail.com
2022-02-24 16:54:59 +09:00
Michael Paquier fcc28178c6 Clean up and simplify code in a couple of set-returning functions
The following set-returning functions have their logic simplified, to be
more consistent with other in-core areas:
- pg_prepared_statement()'s tuple descriptor is now created with
get_call_result_type() instead of being created from scratch, saving
some duplication with pg_proc.dat.
- show_all_file_settings(), similarly, now uses get_call_result_type()
to build its tuple descriptor instead of creating it from scratch.
- pg_options_to_table() made use of a static routine called only once.
This commit removes this internal routine to make the function easier to
follow.
- pg_config() was using a unique logic style, doing checks on the tuple
descriptor passed down in expectedDesc, but it has no need to do so.
This switches the function to use a tuplestore with a tuple descriptor
retrieved from get_call_result_type(), instead.

This simplifies an upcoming patch aimed at refactoring the way
tuplestores are created and checked in set-returning functions.  The
change also makes sense as its own independent cleanup, since it shaves
some code.

Author: Melanie Plageman, Michael Paquier
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/CAAKRu_azyd1Z3W_r7Ou4sorTjRCs+PxeHw1CWJeXKofkE6TuZg@mail.gmail.com
2022-02-24 16:11:34 +09:00
Tom Lane bd74c4037c Re-allow underscore as first character of custom GUC names.
Commit 3db826bd5 intended that valid_custom_variable_name's
rules for valid identifiers match those of scan.l.  However,
I (tgl) had some kind of brain fade and put "_" in the wrong
list.

Fix by Japin Li, per bug #17415 from Daniel Polski.

Discussion: https://postgr.es/m/17415-ebdb683d7e09a51c@postgresql.org
2022-02-23 11:10:46 -05:00
Daniel Gustafsson 2313a3ee22 Fix statenames in mergejoin comments
The names used in the comments for a few states were not consistent
with the documented state names.

Author: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/CALNJ-vQVthfQXVqmrHR8BKHtC4fMGbhM1xbvJNJAPexTq_dH=w@mail.gmail.com
2022-02-23 10:54:03 +01:00
Andres Freund afdeff1052 Add temporary debug info to help debug 019_replslot_limit.pl failures.
I have not been able to reproduce the occasional failures of
019_replslot_limit.pl we are seeing in the buildfarm and not for lack of
trying. The additional logging and increased log level will hopefully help.

Will be reverted once the cause is identified.

Discussion: https://postgr.es/m/20220218231415.c4plkp4i3reqcwip@alap3.anarazel.de
2022-02-22 18:02:34 -08:00
Amit Kapila 52e4f0cd47 Allow specifying row filters for logical replication of tables.
This feature adds row filtering for publication tables. When a publication
is defined or modified, an optional WHERE clause can be specified. Rows
that don't satisfy this WHERE clause will be filtered out. This allows a
set of tables to be partially replicated. The row filter is per table. A
new row filter can be added simply by specifying a WHERE clause after the
table name. The WHERE clause must be enclosed by parentheses.

The row filter WHERE clause for a table added to a publication that
publishes UPDATE and/or DELETE operations must contain only columns that
are covered by REPLICA IDENTITY. The row filter WHERE clause for a table
added to a publication that publishes INSERT can use any column. If the
row filter evaluates to NULL, it is regarded as "false". The WHERE clause
only allows simple expressions that don't have user-defined functions,
user-defined operators, user-defined types, user-defined collations,
non-immutable built-in functions, or references to system columns. These
restrictions could be addressed in the future.

If you choose to do the initial table synchronization, only data that
satisfies the row filters is copied to the subscriber. If the subscription
has several publications in which a table has been published with
different WHERE clauses, rows that satisfy ANY of the expressions will be
copied. If a subscriber is a pre-15 version, the initial table
synchronization won't use row filters even if they are defined in the
publisher.

The row filters are applied before publishing the changes. If the
subscription has several publications in which the same table has been
published with different filters (for the same publish operation), those
expressions get OR'ed together so that rows satisfying any of the
expressions will be replicated.

This means all the other filters become redundant if (a) one of the
publications has no filter at all, (b) one of the publications was
created using FOR ALL TABLES, (c) one of the publications was created
using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema.

If your publication contains a partitioned table, the publication
parameter publish_via_partition_root determines if it uses the partition's
row filter (if the parameter is false, the default) or the root
partitioned table's row filter.

Psql commands \dRp+ and \d <table-name> will display any row filters.

Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian
Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang
Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com
2022-02-22 08:11:50 +05:30
Michael Paquier ebf6c5249b Add compute_query_id = regress
"regress" is a new mode added to compute_query_id aimed at facilitating
regression testing when a module computing query IDs is loaded into the
backend, like pg_stat_statements.  It works the same way as "auto",
meaning that query IDs are computed if a module enables it, except that
query IDs are hidden in EXPLAIN outputs to ensure regression output
stability.

Like any GUCs of the kind (force_parallel_mode, etc.), this new
configuration can be added to an instance's postgresql.conf, or just
passed down with PGOPTIONS at command level.  compute_query_id uses an
enum for its set of option values, meaning that this addition ensures
ABI compatibility.

Using this new configuration mode allows installcheck-world to pass when
running the tests on an instance with pg_stat_statements enabled,
stabilizing the test output while checking the paths doing query ID
computations.

Reported-by: Anton Melnikov
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/1634283396.372373993@f75.i.mail.ru
Discussion: https://postgr.es/m/YgHlxgc/OimuPYhH@paquier.xyz
Backpatch-through: 14
2022-02-22 10:22:15 +09:00
Tom Lane 88103567cb Disallow setting bogus GUCs within an extension's reserved namespace.
Commit 75d22069e tried to throw a warning for setting a custom GUC whose
prefix belongs to a previously-loaded extension, if there is no such GUC
defined by the extension.  But that caused unstable behavior with
parallel workers, because workers don't necessarily load extensions and
GUCs in the same order their leader did.  To make that work safely, we
have to completely disallow the case.  We now actually remove any such
GUCs at the time of initial extension load, and then throw an error not
just a warning if you try to add one later.  While this might create a
compatibility issue for a few people, the improvement in error-detection
capability seems worth it; it's hard to believe that there's any good
use-case for choosing such GUC names.

This also un-reverts 5609cc01c (Rename EmitWarningsOnPlaceholders() to
MarkGUCPrefixReserved()), since that function's old name is now even
more of a misnomer.

Florin Irion and Tom Lane

Discussion: https://postgr.es/m/1902182.1640711215@sss.pgh.pa.us
2022-02-21 14:10:43 -05:00
Andres Freund 2776922201 Assert in init_toast_snapshot() that some snapshot registered or active.
Commit <FIXME> fixed the bug that RemoveTempRelationsCallback() did not
push/register a snapshot. That only went unnoticed because often a valid
catalog snapshot exists and is returned by GetOldestSnapshot(). But due to
invalidation processing that is not reliable.

Thus assert in init_toast_snapshot() that there is a registered or active
snapshot, using the new HaveRegisteredOrActiveSnapshot().

Author: Andres Freund
Discussion: https://postgr.es/m/20220219180002.6tubjq7iw7m52bgd@alap3.anarazel.de
2022-02-21 08:58:29 -08:00
Andres Freund 7c38ef2a5d Fix temporary object cleanup failing due to toast access without snapshot.
When cleaning up temporary objects during process exit the cleanup could fail
with:
  FATAL: cannot fetch toast data without an active snapshot

The bug is caused by RemoveTempRelationsCallback() not setting up a
snapshot. If an object with toasted catalog data needs to be cleaned up,
init_toast_snapshot() could fail with the above error.

Most of the time, however, the problem is masked due to cached catalog
snapshots being returned by GetOldestSnapshot(). But dropping an object can
cause catalog invalidations to be emitted. If no further catalog accesses are
necessary between the invalidation processing and the next toast datum
deletion, the bug becomes visible.

It's easy to miss this bug because it typically happens after clients
disconnect and the FATAL error just ends up in the log.

Luckily temporary table cleanup at the next use of the same temporary schema
or during DISCARD ALL does not have the same problem.

Fix the bug by pushing a snapshot in RemoveTempRelationsCallback(). Also add
isolation tests for temporary object cleanup, including objects with toasted
catalog data.

A future HEAD only commit will add an assertion trying to make this more
visible.

Reported-By: Miles Delahunty
Author: Andres Freund
Discussion: https://postgr.es/m/CAOFAq3BU5Mf2TTvu8D9n_ZOoFAeQswuzk7yziAb7xuw_qyw5gw@mail.gmail.com
Backpatch: 10-
2022-02-21 08:57:34 -08:00
Peter Eisentraut 5c868c92ca Fix possible null pointer reference
Per Coverity.  Introduced in 37851a8b83.
2022-02-21 09:42:46 +01:00
Andres Freund fbabdf8f9a Fix meaning-changing typo introduced in fa0e03c15a.
2022-02-20 13:51:36 -08:00
Heikki Linnakangas 69639e2b5c Fix uninitialized variable.
I'm very surprised the compiler didn't warn about it. But Coverity and
Valgrind did.
2022-02-20 18:33:50 +02:00
John Naylor 4b35408f1e Use bitwise rotate functions in more places
There were a number of places in the code that used bespoke bit-twiddling
expressions to do bitwise rotation. While we've had pg_rotate_right32()
for a while now, we hadn't gotten around to standardizing on that. Do so
now. Since many potential call sites look more natural with the "left"
equivalent, add that function too.
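
As an illustration of the kind of helper this describes, a 32-bit rotate pair could be sketched as below. The names rotate_left32() and rotate_right32() are hypothetical stand-ins, not the functions in the tree, and the bodies are only one plausible way to write them.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the rotate helpers named above; the names and
 * bodies are illustrative assumptions, not the in-tree definitions.
 */
static inline uint32_t
rotate_left32(uint32_t word, int n)
{
    assert(n >= 0 && n < 32);
    /* Mask the right-shift count so that n == 0 never shifts by 32. */
    return (word << n) | (word >> ((32 - n) & 31));
}

static inline uint32_t
rotate_right32(uint32_t word, int n)
{
    assert(n >= 0 && n < 32);
    /* Mirror image of the left rotate, with the same masking. */
    return (word >> n) | (word << ((32 - n) & 31));
}
```

A call site that previously open-coded something like (x << 5) | (x >> 27) would then read rotate_left32(x, 5).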

Reviewed by Tom Lane and Yugo Nagata

Discussion:
https://www.postgresql.org/message-id/CAFBsxsH7c1LC0CGZ0ADCBXLHU5-%3DKNXx-r7tHYPAW51b2HK4Qw%40mail.gmail.com
2022-02-20 13:22:08 +07:00
Michael Paquier 07daca53bf Fix inconsistencies in SRF checks of pg_config() and string_to_table()
The execution paths of those functions have been using a set of checks
inconsistent with any other SRF function:
- string_to_table() missed a check on expectedDesc, the tuple descriptor
expected by the caller, that should never be NULL.  Introduced in
66f1630.
- pg_config() should check for a ReturnSetInfo, and expectedDesc cannot
be NULL.  Its error messages were also inconsistent.  Introduced in
a5c43b8.

Extracted from a larger patch by the same author, in preparation for a
larger patch set aimed at refactoring the way tuplestores are created
and checked in SRF functions.

Author: Melanie Plageman
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/CAAKRu_azyd1Z3W_r7Ou4sorTjRCs+PxeHw1CWJeXKofkE6TuZg@mail.gmail.com
2022-02-19 14:58:51 +09:00
Tom Lane de447bb8e6 Suppress warning about stack_base_ptr with late-model GCC.
GCC 12 complains that set_stack_base is storing the address of
a local variable in a long-lived pointer.  This is an entirely
reasonable warning (indeed, it just helped us find a bug);
but that behavior is intentional here.  We can work around it
by using __builtin_frame_address(0) instead of a specific local
variable; that produces an address a dozen or so bytes different,
in my testing, but we don't care about such a small difference.
Maybe someday a compiler lacking that function will start to issue
a similar warning, but we'll worry about that when it happens.
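
A minimal sketch of the workaround, assuming a GCC-compatible compiler. The names stack_base and remember_stack_base() are hypothetical and only stand in for the real variable and function; the fallback branch shows the older local-variable approach.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical long-lived pointer standing in for the real stack base. */
static char *stack_base = NULL;

/* Record an approximate stack base for later depth checks. */
static void
remember_stack_base(void)
{
#ifdef __GNUC__
    /* Ask the compiler for the frame address; no local's address escapes. */
    stack_base = (char *) __builtin_frame_address(0);
#else
    /* Fallback: address of a local, the pattern that draws the warning. */
    char        here;

    stack_base = &here;
#endif
    assert(stack_base != NULL);
}
```

Either branch yields an address near the caller's frame, which is all a stack-depth check needs.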

Patch by me, per a suggestion from Andres Freund.  Back-patch to
v12, which is as far back as the patch will go without some pain.
(Recently-established project policy would permit a back-patch as
far as 9.2, but I'm disinclined to expend the work until GCC 12
is much more widespread.)

Discussion: https://postgr.es/m/3773792.1645141467@sss.pgh.pa.us
2022-02-17 22:46:01 -05:00
Fujii Masao f927a6ec3e Fix comment in CheckIndexCompatible().
Commit 5f173040 removed the parameter "heapRelation" from
CheckIndexCompatible(), but forgot to remove the mention of it
from the comment. This commit removes that unnecessary mention.

Also this commit adds the missing mention of the parameter "oldId"
in the comment.

Author: Yugo Nagata
Reviewed-by: Nathan Bossart, Fujii Masao
Discussion: https://postgr.es/m/20220204014634.b39314f278ff4ae3de96e201@sraoss.co.jp
2022-02-18 12:19:10 +09:00
Amit Kapila c476f380e2 Fix a comment in worker.c.
The comment incorrectly states that worker gets killed during
ALTER SUBSCRIPTION ... DISABLE. Remove that part of the comment.

Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoCbEN==oH7BhP3U6WPHg3zgH6sDOeKhJjy4W2dx-qoVCw@mail.gmail.com
2022-02-18 07:46:51 +05:30
Peter Geoghegan 8f388f6f55 Increase hash_mem_multiplier default to 2.0.
Double the default setting for hash_mem_multiplier, from 1.0 to 2.0.
This setting makes hash-based executor nodes use twice the usual
work_mem limit.
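
The arithmetic behind that statement can be sketched as follows. The function name hash_mem_limit_kb() is hypothetical, and the 4096 kB figure in the usage note is only an example value for work_mem.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the arithmetic only: the memory budget of a hash-based node is
 * taken here to be work_mem scaled by the multiplier.  The function name is
 * an assumption made for illustration.
 */
static uint64_t
hash_mem_limit_kb(uint64_t work_mem_kb, double hash_mem_multiplier)
{
    assert(hash_mem_multiplier >= 1.0);
    return (uint64_t) ((double) work_mem_kb * hash_mem_multiplier);
}
```

With work_mem at 4096 kB, the old default of 1.0 gives 4096 kB and the new default of 2.0 gives 8192 kB.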

The PostgreSQL 15 release notes should have a compatibility note about
this change.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzndc_ROk6CY-bC6p9O53q974Y0Ey4WX8jcPbuTZYM4Q3A@mail.gmail.com
2022-02-16 18:41:52 -08:00
Peter Geoghegan 74388a1ac3 Avoid VACUUM reltuples distortion.
Add a heuristic that avoids distortion in the pg_class.reltuples
estimates used by VACUUM.  Without the heuristic, successive manually
run VACUUM commands (run against a table that is never modified after
initial bulk loading) will scan the same page in each VACUUM operation.
Eventually pg_class.reltuples may reach the point where one single heap
page is accidentally considered highly representative of the entire
table.  This is likely to be completely wrong, since the last heap page
typically has fewer tuples than average for the table.

It's not obvious that this was a problem prior to commit 44fa8488, which
made vacuumlazy.c consistently scan the last heap page (even when it is
all-visible in the visibility map).  It seems possible that there were
more subtle variants of the same problem that went unnoticed for quite
some time, though.  Commit 44fa8488 simplified certain aspects of when
and how relation truncation was considered, but it did not introduce the
"scan the last page" behavior.  Essentially the same behavior was
introduced much earlier, in commit e8429082.  It was conditioned on
whether or not truncation looked promising towards the end of the
initial heap pass by VACUUM until recently, which was at least somewhat
protective.  That doesn't seem like something that we should be relying
on, though.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkNKORurux459M64mR63Aw4Jq7MBRVcX=CvALqN3A88WA@mail.gmail.com
2022-02-16 17:15:50 -08:00
Michael Paquier d61a361d1a Remove all traces of tuplestore_donestoring() in the C code
This routine is a no-op since dd04e95 from 2003, with a macro kept
around for compatibility purposes.  This has led to the same code
patterns being copy-pasted around for no effect, sometimes in confusing
ways like in pg_logical_slot_get_changes_guts() from logical.c where the
code was actually incorrect.

This issue has been discussed on two different threads recently, so
rather than living with this legacy, remove any uses of this routine in
the C code to simplify things.  The compatibility macro is kept to avoid
breaking any out-of-core modules that depend on it.

Reported-by: Tatsuhito Kasahara, Justin Pryzby
Author: Tatsuhito Kasahara
Discussion: https://postgr.es/m/20211217200419.GQ17618@telsasoft.com
Discussion: https://postgr.es/m/CAP0=ZVJeeYfAeRfmzqAF2Lumdiv4S4FewyBnZd4DPTrsSQKJKw@mail.gmail.com
2022-02-17 09:52:02 +09:00
Heikki Linnakangas 4620892344 Fix bogus log message when starting from a cleanly shut down state.
In commit 70e81861fa to split xlog.c, I moved the startup code that
updates the state in the control file and prints out the "database
system was not properly shut down" message to the log, but I
accidentally removed the "if (InRecovery)" check around it. As a
result, that message was printed even if the system was cleanly shut
down, also during 'initdb'.

Discussion: https://www.postgresql.org/message-id/3357075.1645031062@sss.pgh.pa.us
2022-02-16 23:15:08 +02:00
John Naylor 01ad1c9530 Add missing TYPEALIGN macros
A couple call sites still had hard-coded characters.

Amul Sul

Discussion: https://www.postgresql.org/message-id/CAAJ_b94Y35MWB3PJoCbc_O-_Q4%2B-9DHKhWtAwboEyx8wm4mqcA%40mail.gmail.com
2022-02-16 19:33:28 +07:00
Heikki Linnakangas 9ed87a78e0 Fix read beyond buffer bug introduced by the split xlog.c patch.
FinishWalRecovery() copied the valid part of the last WAL block into a
palloc'd buffer, and the code in StartupXLOG() copied it to the WAL
buffer. But the memcpy in StartupXLOG() copied a full 8kB block, not
just the valid part, i.e. it copied from beyond the end of the buffer.
The invalid part was cleared immediately afterwards, so as long as the
memory was allocated and didn't segfault, it didn't do any harm, but
it can definitely segfault.
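
The safe pattern can be sketched roughly as below: copy only the valid prefix, then clear the rest of the destination. BLOCK_SIZE and copy_partial_block() are hypothetical names chosen for the sketch, not the identifiers used in xlog.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8192         /* assumed size, matching the "8kB" above */

/*
 * Copy only the valid prefix of a partial block into a full-size page and
 * zero the remainder, so the source is never read past valid_len bytes.
 */
static void
copy_partial_block(char *page, const char *src, size_t valid_len)
{
    assert(valid_len <= BLOCK_SIZE);
    memcpy(page, src, valid_len);
    memset(page + valid_len, 0, BLOCK_SIZE - valid_len);
}
```

The source buffer then only has to be valid_len bytes long, which matches what the first function actually allocated.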

Discussion: https://www.postgresql.org/message-id/efc12e32-5af2-3485-5b1d-5af9f707491a@iki.fi
2022-02-16 12:01:32 +02:00
Peter Eisentraut 2549f0661b Reject trailing junk after numeric literals
After this, the PostgreSQL lexers no longer accept numeric literals
with trailing non-digits, such as 123abc, which would be scanned as
two tokens: 123 and abc.  This is undocumented and surprising, and it
might also interfere with some extended numeric literal syntax being
contemplated for the future.
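
To make the rejected shape concrete, here is a small sketch of a check for "digits immediately followed by an identifier character". It is not the scan.l rule: has_trailing_junk() is a hypothetical helper, and it only considers plain decimal integers, ignoring exponent and other literal forms.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check, not the lexer rule itself: returns true when s starts
 * with a run of decimal digits that is immediately followed by a character
 * that could begin an identifier, as in "123abc".  Only plain decimal
 * integers are considered here.
 */
static bool
has_trailing_junk(const char *s)
{
    const char *p = s;

    assert(s != NULL);
    while (isdigit((unsigned char) *p))
        p++;
    if (p == s)
        return false;           /* no leading digits at all */
    return isalpha((unsigned char) *p) || *p == '_';
}
```

Under this sketch "123abc" is flagged, while "123" and "123 abc" are not.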

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
2022-02-16 10:37:31 +01:00
Heikki Linnakangas 70e81861fa Split xlog.c into xlog.c and xlogrecovery.c.
This moves the functions related to performing WAL recovery into the new
xlogrecovery.c source file, leaving xlog.c responsible for maintaining
the WAL buffers, coordinating the startup and switch from recovery to
normal operations, and other miscellaneous stuff that has always been in
xlog.c.

Reviewed-by: Andres Freund, Kyotaro Horiguchi, Robert Haas
Discussion: https://www.postgresql.org/message-id/a31f27b4-a31d-f976-6217-2b03be646ffa%40iki.fi
2022-02-16 09:30:38 +02:00
Heikki Linnakangas be1c00ab13 Move code around in StartupXLOG().
This is in preparation for the next commit, which will split off
recovery-related code from xlog.c into a new source file. This is the
order that things will happen with the next commit, and the point of
this commit is to make these ordering changes more explicit, while the
next commit mechanically moves the source code to the new file. To aid
review, I added "BEGIN/END function" comments to mark which blocks of
code are moved to which functions in the next commit. They will be gone
in the next commit.

Reviewed-by: Andres Freund, Kyotaro Horiguchi, Robert Haas
Discussion: https://www.postgresql.org/message-id/a31f27b4-a31d-f976-6217-2b03be646ffa%40iki.fi
2022-02-16 09:22:44 +02:00
Heikki Linnakangas b3a5d01c05 Refactor setting XLP_FIRST_IS_OVERWRITE_CONTRECORD.
Set it directly in CreateOverwriteContrecordRecord(). That way,
AdvanceXLInsertBuffer() doesn't need the missingContrecPtr global
variable. This is in preparation for splitting xlog.c into multiple
files.

Reviewed-by: Robert Haas
Discussion: https://www.postgresql.org/message-id/a462d79c-cb5a-47cc-e9ac-616b5003965f%40iki.fi
2022-02-16 09:22:41 +02:00
Heikki Linnakangas d231be00cb Run pgindent on xlog.c.
To tidy up after some recent refactorings in xlog.c. These would be
fixed by the pgindent run we do at the end of the development cycle,
but I want to clean these up now as I'm about to do some more big
refactorings on xlog.c.
2022-02-16 09:22:34 +02:00
Peter Geoghegan 988ffc3063 Update "don't truncate with failsafe" rationale.
There is a very good (though non-obvious) reason to avoid relation
truncation during a VACUUM that has triggered the failsafe mechanism,
which was missed before now.  Update related comments, so this isn't
forgotten.

Reported-By: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/CAFBsxsFiMPxQ-dHZ8tOgktn=+ffeJT3+GinZ4zdOGbmAnCYadA@mail.gmail.com
2022-02-15 15:16:19 -08:00
Tom Lane 3b0ee7f583 Ensure that length argument of memcmp() isn't seen as negative.
I think this will shut up a weird warning from buildfarm member
serinus.  Perhaps it'd be better to change tsCompareString's
length arguments to unsigned, but that seems more invasive
than is justified.

Part of a general push to remove off-the-beaten-track warnings
where we can easily do so.
2022-02-15 17:28:17 -05:00
Tom Lane 4c1a1a347a Ensure that the argument of shmdt(2) is declared "void *".
Our gcc-on-Solaris buildfarm members emit "incompatible pointer type"
warnings in places where it's not.  This is a bit odd, since AFAICT
Solaris follows the POSIX spec in declaring shmdt's argument as
"const void *", and you'd think any pointer argument would satisfy that.
But whatever.  Part of a general push to remove off-the-beaten-track
warnings where we can easily do so.
2022-02-15 17:17:28 -05:00
Tom Lane 2523928b28 Reject change of output-column collation in CREATE OR REPLACE VIEW.
checkViewTupleDesc() didn't get the memo that it should verify
same attcollation along with same type/typmod.  (A quick scan
did not find other similar oversights.)

Per bug #17404 from Pierre-Aurélien Georges.  On another day
I might've back-patched this, but today I'm feeling paranoid
about unnecessary behavioral changes in back branches.

Discussion: https://postgr.es/m/17404-8a4a270ef30a6709@postgresql.org
2022-02-15 12:57:44 -05:00
Peter Eisentraut 797129e591 Remove IS_AF_UNIX macro
The AF_UNIX macro was being used unprotected by HAVE_UNIX_SOCKETS,
apparently since 2008.  So the redirection through IS_AF_UNIX() is
apparently no longer necessary.  (More generally, all supported
platforms are now HAVE_UNIX_SOCKETS, but even if there were a new
platform in the future, it seems plausible that it would define the
AF_UNIX symbol even without kernel support.)  So remove the
IS_AF_UNIX() macro and make the code a bit more consistent.

Discussion: https://www.postgresql.org/message-id/flat/f2d26815-9832-e333-d52d-72fbc0ade896%40enterprisedb.com
2022-02-15 10:16:34 +01:00
Peter Eisentraut 73508475d6 Remove pg_atoi()
The last caller was int2vectorin(), and having such a general function
for one user didn't seem useful, so just put the required parts inline
and remove the function.

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
2022-02-15 07:44:26 +01:00
Andres Freund 2f6501fa3c Move replication slot release to before_shmem_exit().
Previously, replication slots were released in ProcKill() on error, resulting
in reporting replication slot drop of ephemeral slots after the stats
subsystem was already shut down.

To fix this problem, move replication slot release to a before_shmem_exit()
hook that is called before the stats collector shuts down. There wasn't really
a good reason for the slot handling to be in ProcKill() anyway.

Patch by Masahiko Sawada, with very minor polishing by me.

I, Andres, wrote a test for dropping slots during process exit, but there may
be some OS dependent issues around the number of times FATAL error messages
are displayed due to a still debated libpq issue. So that test will be
committed separately / later.

Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoDAeEpAbZEyYJsPZJUmSPaRicVSBObaL7sPaofnKz+9zg@mail.gmail.com
2022-02-14 17:08:17 -08:00
Peter Eisentraut b45fa79340 Remove one use of pg_atoi()
There was no real need to use this here instead of a simpler API.

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
2022-02-14 23:07:35 +01:00
Peter Eisentraut cfc7191dfe Move scanint8() to numutils.c
Move scanint8() to numutils.c and rename to pg_strtoint64().  We
already have a "16" and "32" version of that, and the code inside the
functions was aligned, so this move makes all three versions
consistent.  The API is also changed to no longer provide the errorOK
case.  Users that need the error checking can use strtoi64().
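
For callers that want a failure result instead of a raised error, a soft-error wrapper could look roughly like this. It uses the standard strtoll() as a stand-in for strtoi64(), which is an assumption, and parse_int64_soft() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical soft-error parser: returns false on empty input, trailing
 * garbage, or out-of-range values instead of reporting an error.
 */
static bool
parse_int64_soft(const char *str, int64_t *result)
{
    char       *end;
    long long   val;

    assert(str != NULL && result != NULL);
    errno = 0;
    val = strtoll(str, &end, 10);
    if (end == str || *end != '\0' || errno == ERANGE)
        return false;
    *result = (int64_t) val;
    return true;
}
```

The caller checks the boolean result and decides for itself how to report a bad value.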

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
2022-02-14 21:57:26 +01:00
Peter Eisentraut 1383d52faa Add missing node support functions
forgotten in 37851a8b83
2022-02-14 09:11:13 +01:00
Peter Eisentraut 37851a8b83 Database-level collation version tracking
This adds to database objects the same version tracking that collation
objects have.  There is a new pg_database column datcollversion that
stores the version, a new function
pg_database_collation_actual_version() to get the version from the
operating system, and a new subcommand ALTER DATABASE ... REFRESH
COLLATION VERSION.

This was not originally added together with pg_collation.collversion,
since originally version tracking was only supported for ICU, and ICU
on a database-level is not currently supported.  But we now have
version tracking for glibc (since PG13), FreeBSD (since PG14), and
Windows (since PG13), so this is useful to have now.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/f0ff3190-29a3-5b39-a179-fa32eee57db6%40enterprisedb.com
2022-02-14 08:27:26 +01:00
Thomas Munro cba5b994c9 Use WL_SOCKET_CLOSED for client_connection_check_interval.
Previously we used poll() directly to check for a POLLRDHUP event.
Instead, use the WaitEventSet API to poll the socket for
WL_SOCKET_CLOSED, which knows how to detect this condition on many more
operating systems.

Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Maksim Milyutin <milyutinma@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
2022-02-14 16:52:23 +13:00
Thomas Munro 50e570a59e Add WL_SOCKET_CLOSED for socket shutdown events.
Provide a way for WaitEventSet to report that the remote peer has shut
down its socket, independently of whether there is any buffered data
remaining to be read.  This works only on systems where the kernel
exposes that information, namely:

* WAIT_USE_POLL builds using POLLRDHUP, if available
* WAIT_USE_EPOLL builds using EPOLLRDHUP
* WAIT_USE_KQUEUE builds using EV_EOF

Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Maksim Milyutin <milyutinma@gmail.com>
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
2022-02-14 16:52:23 +13:00
Amit Kapila 5e01001ffb WAL log unchanged toasted replica identity key attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.

Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera, Haiying Tang, Andres Freund
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
2022-02-14 08:55:58 +05:30
Thomas Munro 0052fb4890 Track LLVM 15 changes.
This isn't an API change, it's just a missing #include that we got away
with before.  Per buildfarm animal seawasp.
2022-02-14 15:51:43 +13:00
John Naylor b19a7e392a Correct Makefile dependencies for catalog scripts
At some point, Gen_fmgrtab.pl stopped needing the value of defined symbols
from access/transam.h, while genbki.pl started doing so. The Makefiles
didn't get the memo, so update the relevant dependencies.
2022-02-14 09:07:09 +07:00
Alexander Korotkov 3f74daa8df Fix memory leak in IndexScan node with reordering
Fix ExecReScanIndexScan() to free the referenced tuples while emptying the
priority queue.  Backpatch to all supported versions.

Discussion: https://postgr.es/m/CAHqSB9gECMENBQmpbv5rvmT3HTaORmMK3Ukg73DsX5H7EJV7jw%40mail.gmail.com
Author: Aliaksandr Kalenik
Reviewed-by: Tom Lane, Alexander Korotkov
Backpatch-through: 10
2022-02-14 04:17:04 +03:00
Michael Paquier c963e84fb8 Make origin data initialization consistent with other fields in 2PC header
As of 1eb6d65, the origin data is optionally stored in a 2PC file
header, with the data filled in EndPrepare() even in the default case
where there is no origin data to add.  This was inconsistent with all
the other fields of TwoPhaseFileHeader which are initialized in
StartPrepare(), so move the initialization of origin_lsn and
origin_timestamp there instead.  The effect of missing the
initialization at this early stage is only cosmetic based on the current
logic of the code, but could have led to issues in the long-term, and it
is more consistent done this way.

Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAooECJ+gU_RZB-yhioPOV94R4ucoHAf68PiJhLpgpVpBw@mail.gmail.com
2022-02-14 09:30:35 +09:00
Tom Lane 994d76707a Fix misuse of "const" qualifier.
"const foo *" is quite different from "foo * const".
This code was evidently trying to avoid casting away
const from the arguments, but entirely failed to do so.
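
The distinction can be shown with two small functions. Both names are hypothetical and unrelated to the code that was fixed; the commented-out lines mark what a compiler would reject in each case.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer to const data: the pointee cannot be modified through p. */
static int
sum_readonly(const int *p, size_t n)
{
    int         total = 0;

    for (size_t i = 0; i < n; i++)
        total += p[i];
    /* p[0] = 0;  would not compile: the data is const */
    return total;
}

/* Const pointer to mutable data: p cannot be reseated, but *p is writable. */
static void
zero_first(int *const p)
{
    assert(p != NULL);
    *p = 0;
    /* p = NULL;  would not compile: the pointer itself is const */
}
```

Only the first form promises the caller that its data is left untouched, which is the guarantee the arguments in question needed.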

Per study of some buildfarm warnings from anole
(which unfortunately are mostly ignorable, since it
seems not to understand "restrict" very well).
I'm surprised though that nothing else has complained.
2022-02-13 19:20:56 -05:00
Tom Lane 302612a6c7 Silence minor compiler warnings.
Depending on compiler version and optimization level, we might
get a complaint that lazy_scan_heap's "freespace" is used
uninitialized.

Compilers not aware that ereport(ERROR) doesn't return complained
about bbsink_lz4_new().

Assigning "-1" to a uint64 value has unportable results; fortunately,
the value of xlogreadsegno is unimportant when xlogreadfd is -1.
(It looks to me like there is no need for xlogreadsegno to be static
in the first place, but I didn't venture to change that.)
2022-02-13 13:06:55 -05:00
Peter Geoghegan efa4a9462a Consolidate VACUUM xid cutoff logic.
Push the logic for determining whether or not a VACUUM operation will be
aggressive down into vacuum_set_xid_limits().  This makes the function's
signature significantly simpler, and seems clearer overall.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com
2022-02-11 18:26:15 -08:00
Peter Geoghegan 872770fd6c Add VACUUM instrumentation for scanned pages, relfrozenxid.
Report on scanned pages within VACUUM VERBOSE and autovacuum logging.
These are pages that were physically examined during the VACUUM
operation.  Note that this can include a small number of pages that were
marked all-visible in the visibility map by some earlier VACUUM
operation.  VACUUM won't skip all-visible pages that aren't part of a
range of all-visible pages that's at least 32 blocks in length (partly
to avoid missing out on opportunities to advance relfrozenxid during
non-aggressive VACUUMs).
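
As a rough illustration of the skipping rule described above, here is a
small Python sketch that computes which pages would count as scanned for
a given visibility map.  It is a simplified model, not the
lazy_scan_heap() code: the function name, the list-of-booleans
visibility map, and the constant name are assumptions made for the
example, and aggressive mode and other special cases are ignored.

```python
SKIP_THRESHOLD = 32  # minimum run length of all-visible pages, per the text above


def scanned_pages(all_visible):
    """Return the block numbers a non-aggressive VACUUM would scan.

    all_visible: list of bools, one per heap page (a toy visibility map).
    A page is skipped only if it belongs to a run of at least
    SKIP_THRESHOLD consecutive all-visible pages.
    """
    scanned = []
    n = len(all_visible)
    i = 0
    while i < n:
        if not all_visible[i]:
            scanned.append(i)
            i += 1
            continue
        # Measure the run of consecutive all-visible pages starting at i.
        j = i
        while j < n and all_visible[j]:
            j += 1
        if j - i < SKIP_THRESHOLD:
            # Run too short to be worth skipping: scan these pages anyway.
            scanned.extend(range(i, j))
        i = j
    return scanned
```

In this model, a run of five all-visible pages is still scanned, while a
run of forty is skipped entirely.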

Commit 44fa8488 simplified the definition of scanned pages.  It became
the complement of the pages (of those pages from rel_pages) that were
skipped using the visibility map.  And so scanned pages precisely
indicates how effective the visibility map was at saving work.  (Before
now we displayed the number of pages skipped via the visibility map when
they happened to be frozen pages, but not when they were merely all-visible,
which was less useful to users.)

Rename the user-visible OldestXmin output field to "removal cutoff", and
show some supplementary information: how far behind the cutoff is
(number of XIDs behind) by the time the VACUUM operation finished.  This
will help users to figure out what's _not_ working in extreme cases
where VACUUM is fundamentally unable to remove dead tuples or freeze
older tuples (e.g., due to a leaked replication slot).  Also report when
relfrozenxid is advanced by VACUUM in output that immediately follows
"removal cutoff".  This structure is intended to highlight the
relationship between the new relfrozenxid value for the table, and the
VACUUM operation's removal cutoff.

Finally, add instrumentation of "missed dead tuples", and the number of
pages that had at least one such tuple.  These are fully DEAD (not just
RECENTLY_DEAD) tuples with storage that could not be pruned due to
failure to acquire a cleanup lock on a heap page.  This is a replacement
for the "skipped due to pin" instrumentation removed by commit 44fa8488.
It shows more details than before for pages where failing to get a
cleanup lock actually resulted in VACUUM missing out on useful work, but
usually shows nothing at all instead (the mere fact that we couldn't get
a cleanup lock is usually of no consequence whatsoever now).

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wznp=c=Opj8Z7RMR3G=ec3_JfGYMN_YvmCEjoPCHzWbx0g@mail.gmail.com
2022-02-11 16:48:40 -08:00
Peter Geoghegan 44fa84881f Simplify lazy_scan_heap's handling of scanned pages.
Redefine a scanned page as any heap page that actually gets pinned by
VACUUM's first pass over the heap, regardless of whether or not the page
was cleanup locked.  Although it's fundamentally impossible to prune a
heap page without a cleanup lock (since we cannot safely defragment the
page), we can do just about everything else.  The only notable further
exception is freezing tuples, though even that is arguably a consequence
of not being able to prune (not a separate issue).

VACUUM now does as much of the same processing as possible for pages
that could not be cleanup locked.  Any failure to do specific required
processing is treated as a special case exception, which will be rare in
practice.  We now collect any preexisting LP_DEAD items (left behind by
earlier opportunistic pruning) in the dead_items array for these heap
pages, and count their tuples in the usual way.  Steps used to decide if
we'll attempt relation truncation are performed in the usual way for
no-cleanup-lock scanned pages, too.

Although eliminating these special cases is intrinsically useful, it's
even more useful as an enabler of further simplifications.  The only
essential difference between aggressive and non-aggressive is that only
aggressive is _guaranteed_ to be able to advance relfrozenxid up to
FreezeLimit.  Advancing relfrozenxid is always useful, but before now
non-aggressive VACUUMs threw away the opportunity to do so whenever a
cleanup lock could not be acquired on any page, no matter what the
details were.  This was very pessimistic.

It isn't actually necessary to "behave aggressively" to maintain the
ability to advance relfrozenxid when a cleanup lock isn't immediately
available (most of the time).  The non-aggressive case will now make
sure that it isn't safe to advance relfrozenxid (without waiting) using
only a share lock.  It will usually notice that there are no tuples that
need to be frozen anyway, just like in the aggressive case -- and so it
no longer wastes an opportunity to advance relfrozenxid over nothing.
(The non-aggressive case still won't wait for a cleanup lock when there
really are tuples on the page that need to be frozen, since that really
would amount to "behaving aggressively".)

VACUUM currently has a tendency to set heap pages to all-visible in the
visibility map before it freezes all of the tuples on the page.  Only a
subsequent aggressive VACUUM will visit these pages to freeze their
tuples, usually only when the tuple XIDs are much older than the
vacuum_freeze_min_age GUC (FreezeLimit cutoff) is supposed to allow.
And so non-aggressive VACUUMs are still far less likely to be able to
advance relfrozenxid in practice, even with the enhancements from this
commit.  This remaining issue will be addressed by future work that
overhauls the criteria for freezing tuples.  Once that's in place,
almost every VACUUM operation will be able to advance relfrozenxid in
practice.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wznp=c=Opj8Z7RMR3G=ec3_JfGYMN_YvmCEjoPCHzWbx0g@mail.gmail.com
2022-02-11 14:32:17 -08:00
Thomas Munro 4eb2176318 Fix DROP {DATABASE,TABLESPACE} on Windows.
Previously, it was possible for DROP DATABASE, DROP TABLESPACE and ALTER
DATABASE SET TABLESPACE to fail because other backends still had file
handles open for dropped tables.  Windows won't allow a directory
containing unlinked-but-still-open files to be unlinked.  Tackle this
problem by forcing all backends to close all smgr fds.  No change for
Unix systems, which don't suffer from the problem, but the new code path
can be tested by Unix-based developers by defining
USE_BARRIER_SMGRRELEASE explicitly.

It's possible that PROCSIGNAL_BARRIER_SMGRRELEASE will have more
bug-fixing applications soon (under discussion).  Note that this is the
first user of the ProcSignalBarrier mechanism from commit 16a4e4aec.  It
could in principle be back-patched as far as 14, but since field
complaints are rare and ProcSignalBarrier hasn't been battle-tested,
that seems like a bad idea.  Fix in master only, where these failures
have started to show up in automated testing due to new tests.

Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com
2022-02-12 10:21:23 +13:00
Tom Lane e5691cc917 Don't use_physical_tlist for an IOS with non-returnable columns.
createplan.c tries to save a runtime projection step by specifying
a scan plan node's output as being exactly the table's columns, or
index's columns in the case of an index-only scan, if there is not a
reason to do otherwise.  This logic did not previously pay attention
to whether an index's columns are returnable.  That worked, sort of
accidentally, until commit 9a3ddeb51 taught setrefs.c to reject plans
that try to read a non-returnable column.  I have no desire to loosen
setrefs.c's new check, so instead adjust use_physical_tlist() to not
try to optimize this way when there are non-returnable column(s).

Per report from Ryan Kelly.  Like the previous patch, back-patch
to all supported branches.

Discussion: https://postgr.es/m/CAHUie24ddN+pDNw7fkhNrjrwAX=fXXfGZZEHhRuofV_N_ftaSg@mail.gmail.com
2022-02-11 15:24:02 -05:00
Robert Haas dab298471f Add support for server-side LZ4 base backup compression.
LZ4 compression can be a lot faster than gzip compression, so users
may prefer it even if the compression ratio is not as good. We will
want pg_basebackup to support LZ4 compression and decompression on the
client side as well, and there is a pending patch for that, but it's
by a different author, so I am committing this part separately for
that reason.

Jeevan Ladhe, reviewed by Tushar Ahuja and by me.

Discussion: http://postgr.es/m/CANm22Cg9cArXEaYgHVZhCnzPLfqXCZLAzjwTq7Fc0quXRPfbxA@mail.gmail.com
2022-02-11 08:29:38 -05:00
Tomas Vondra 0da92dc530 Logical decoding of sequences
This extends the logical decoding to also decode sequence increments.
We differentiate between sequences created in the current (in-progress)
transaction, and sequences created earlier. This mixed behavior is
necessary because while sequences are not transactional (increments are
not subject to ROLLBACK), relfilenode changes are. So we do this:

* Changes for sequences created in the same top-level transaction are
  treated as transactional, i.e. just like any other change from that
  transaction, and discarded in case of a rollback.

* Changes for sequences created earlier are applied immediately, as if
  performed outside any transaction. This applies also after ALTER
  SEQUENCE, which may create a new relfilenode.

Moreover, if we ever get support for DDL replication, the sequence
won't exist until the transaction gets applied.

Sequences created in the current transaction are tracked in a simple
hash table, identified by a relfilenode. That means a sequence may
already exist, but if a transaction does ALTER SEQUENCE then the
increments for the new relfilenode will be treated as transactional.

For each relfilenode we track the XID of (sub)transaction that created
it, which is needed for cleanup at transaction end. We don't need to
check the XID to decide if an increment is transactional - if we find a
match in the hash table, it has to be the same transaction.
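
The bookkeeping described above can be sketched in Python roughly as
follows.  This is only a toy model under stated assumptions: the class
and method names are hypothetical, and a plain dict stands in for the
hash table keyed by relfilenode.

```python
class SequenceTracker:
    """Toy model of the hash table of sequences created in-progress."""

    def __init__(self):
        # relfilenode -> XID of the (sub)transaction that created it
        self.created = {}

    def note_created(self, relfilenode, xid):
        """Record a relfilenode created by transaction xid."""
        self.created[relfilenode] = xid

    def is_transactional(self, relfilenode):
        # A hit means the relfilenode was created by the in-progress
        # transaction, so no XID comparison is needed here.
        return relfilenode in self.created

    def cleanup(self, xid):
        """Forget relfilenodes created by xid once that transaction ends."""
        self.created = {r: x for r, x in self.created.items() if x != xid}
```

The stored XID is used only by cleanup(); the transactional check itself
is a plain membership test, as the paragraph above explains.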

This requires two minor changes to WAL-logging. Firstly, we need to
ensure the sequence record has a valid XID - until now the increment
might have XID 0 if it was the first change in a subxact. But the
sequence might have been created in the same top-level transaction. So
we ensure the XID is assigned when WAL-logging increments.

The other change is addition of "created" flag, marking increments for
newly created relfilenodes. This makes it easier to maintain the hash
table of sequences that need transactional handling.
Note: This is needed because of subxacts. A XID 0 might still have the
sequence created in a different subxact of the same top-level xact.

This does not include any changes to test_decoding and/or the built-in
replication - those will be committed in separate patches.

A patch adding decoding of sequences was originally submitted by Cary
Huang. This commit reworks various important aspects (e.g. the WAL
logging and transactional/non-transactional handling). However, the
original patch and reviews were very useful.

Author: Tomas Vondra, Cary Huang
Reviewed-by: Peter Eisentraut, Hannu Krosing, Andres Freund
Discussion: https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df8bc@enterprisedb.com
Discussion: https://postgr.es/m/1710ed7e13b.cd7177461430746.3372264562543607781@highgo.ca
2022-02-10 18:43:51 +01:00
Robert Haas 0d4513b613 Remove server support for the previous base backup protocol.
Commit cc333f3233 added a new COPY
sub-protocol for taking base backups, but retained support for the
previous protocol. For the same reasons articulated in the message
for commit 9cd28c2e5f, remove support
for the previous protocol from the server.

Discussion: http://postgr.es/m/CA+TgmoazKcKUWtqVa0xZqSzbKgTH+X-aw4V7GyLD68EpDLMh8A@mail.gmail.com
2022-02-10 12:12:43 -05:00
Tom Lane d37776e451 Make timeout.c more robust against missed timer interrupts.
Commit 09cf1d522 taught schedule_alarm() to not do anything if
the next requested event is after when we expect the next interrupt
to fire.  However, if somehow an interrupt gets lost, we'll continue
to not do anything indefinitely, even after the "next interrupt" time
is obviously in the past.  Thus, one missed interrupt can break
timeout scheduling for the life of the session.  Michael Harris
reported a scenario where a bug in a user-defined function caused this
to happen, so you don't even need to assume kernel bugs exist to think
this is worth fixing.  We can make things more robust at little cost
by detecting the case where signal_due_at is before "now" and forcing
a new setitimer call to occur.  This isn't a completely bulletproof
fix of course; but in our typical usage pattern where we frequently set
timeouts and clear them before they are reached, the interrupt will
get re-enabled after at most one timeout interval, which with a little
luck will be before we really need it.
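
The following Python sketch models the decision described above.  It is
a simplified illustration, not the timeout.c code: signal_pending and
signal_due_at are the names used in this message, while the AlarmState
class, the setitimer_calls counter, and the use of plain numbers for
timestamps are assumptions made for the example.

```python
class AlarmState:
    """Toy model of the timer bookkeeping described above."""

    def __init__(self):
        self.signal_pending = False  # is a timer interrupt believed to be armed?
        self.signal_due_at = 0.0     # when that interrupt is expected to fire
        self.setitimer_calls = 0     # counts simulated timer re-arms

    def schedule_alarm(self, now, next_event_at):
        if self.signal_pending:
            if self.signal_due_at < now:
                # The expected interrupt is overdue, so it was probably lost.
                # Fall through and re-arm the timer.
                self.signal_pending = False
            elif next_event_at >= self.signal_due_at:
                # An interrupt will arrive no later than this event needs it;
                # the handler can reschedule for later events.
                return
        self._arm(next_event_at)

    def _arm(self, due_at):
        # Stand-in for the real setitimer() call.
        self.signal_pending = True
        self.signal_due_at = due_at
        self.setitimer_calls += 1
```

Without the overdue check, a call made after a lost interrupt would keep
returning early, which is the failure mode this commit addresses.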

While here, let's mark signal_due_at as volatile, since the signal
handler can both examine and set it.  I'm not sure there's any
actual risk given that signal_pending is already volatile, but
it's surely questionable.

Backpatch to v14 where this logic came in.

Michael Harris and Tom Lane

Discussion: https://postgr.es/m/CADofcAWbMrvgwSMqO4iG_iD3E2v8ZUrC-_crB41my=VMM02-CA@mail.gmail.com
2022-02-10 11:52:28 -05:00
Robert Haas 9cd28c2e5f Remove server support for old BASE_BACKUP command syntax.
Commit 0ba281cb4b added a new syntax
for the BASE_BACKUP command, with extensible options, but maintained
support for the legacy syntax. This isn't important for PostgreSQL,
where pg_basebackup works with older server versions but not newer
ones, but it could in theory matter for out-of-core users of the
replication protocol.

Discussion on pgsql-hackers, however, suggests that no one is aware
of any out-of-core use of the BASE_BACKUP command, and the consensus
is in favor of removing support for the old syntax to simplify the
code, so do that.

Discussion: http://postgr.es/m/CA+TgmoazKcKUWtqVa0xZqSzbKgTH+X-aw4V7GyLD68EpDLMh8A@mail.gmail.com
2022-02-10 10:48:33 -05:00
Peter Eisentraut f5744f1d1e Update comment
Update a comment that assumed that libc collations don't support
versioning.  Also improve an adjacent error message a bit.
2022-02-10 09:16:17 +01:00
Fujii Masao 400fc6b648 Add min() and max() aggregates for xid8.
Bump catalog version.

Author: Ken Kato
Reviewed-by: Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/47d77b18c44f87f8222c4c7a3e2dee6b@oss.nttdata.com
2022-02-10 12:33:41 +09:00
Michael Paquier 0147fc7c8c Fix typo in multixact.c
Introduced in aa64f23.

Author: Nathan Bossart
Discussion: https://postgr.es/m/20220209175338.GB1627503@nathanxps13
2022-02-10 10:45:14 +09:00
Michael Paquier 4567596316 Reduce more the number of calls to GetMaxBackends()
Some of the code paths changed by aa64f23 can reduce the number of times
GetMaxBackends() is called.  The performance gain is marginal, but most
of the code changed by this commit already did that.  Hence, let's be
clean and apply the same rule everywhere, for consistency.

Some of the code paths, like in deadlock.c, involve only assertions.
These are left unchanged.

Reviewed-by: Nathan Bossart, Robert Haas
Discussion: https://postgr.es/m/YgMpGZhPOjNfS7er@paquier.xyz
2022-02-10 10:27:29 +09:00
Tom Lane c5f5b4dd4b Test honestly for <sys/signalfd.h>.
Commit 6a2a70a02 supposed that any platform having <sys/epoll.h>
would also have <sys/signalfd.h>.  It turns out there are still a
few people using platforms where that's not so, so we'd better make
a separate configure probe for it.  But since it took this long to
notice, I'm content with the decision to not have a separate code
path for epoll-only machines; we'll just fall back to using poll()
for these stragglers.

Per gripe from Gabriela Serventi.  Back-patch to v14 where this
code came in.

Discussion: https://postgr.es/m/CAHOHWE-JjJDfcYuLAAEO7Jk07atFAU47z8TzHzg71gbC0aMy=g@mail.gmail.com
2022-02-09 14:24:54 -05:00
Michael Paquier cf29a11ef6 Retire src/backend/utils/misc/check_guc
This script has existed for a long time, and attempting to run it today
causes a lot of false positives as an effect of GUCs added in the last
couple of years.  An equivalent, automatically-run and cross-platform
solution is available in the TAP test introduced in b0a55f4.  So, let it
go.

Discussion: https://postgr.es/m/Yf9YGSwPiMu0c7fP@paquier.xyz
2022-02-09 12:10:31 +09:00
Robert Haas aa64f23b02 Remove MaxBackends variable in favor of GetMaxBackends() function.
Previously, it was really easy to write code that accessed MaxBackends
before we'd actually initialized it, especially when coding up an
extension. To make this less error-prone, introduce a new function
GetMaxBackends() which should be used to obtain the correct value.
This will ERROR if called too early. Demote the global variable to
a file-level static, so that nobody can peek at it directly.
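
The accessor pattern can be sketched in Python as below.  This is an
analogy rather than the PostgreSQL code: the function names and the use
of RuntimeError in place of ERROR are assumptions made for the example.

```python
_max_backends = None  # module-private; set once during startup


def initialize_max_backends(value):
    """Stand-in for the startup step that computes the limit."""
    global _max_backends
    if _max_backends is not None:
        raise RuntimeError("MaxBackends already initialized")
    _max_backends = value


def get_max_backends():
    """Return the limit, failing loudly if asked too early."""
    if _max_backends is None:
        raise RuntimeError("MaxBackends not yet initialized")
    return _max_backends
```

Callers that run before initialization get an immediate error instead of
silently reading a bogus value.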

Nathan Bossart. Idea by Andres Freund. Review by Greg Sabino Mullane,
by Michael Paquier (who had doubts about the approach), and by me.

Discussion: http://postgr.es/m/20210802224204.bckcikl45uezv5e4@alap3.anarazel.de
2022-02-08 15:53:19 -05:00
Alexander Korotkov f1ea98a797 Reduce non-leaf keys overlap in GiST indexes produced by a sorted build
The GiST sorted build currently chooses split points according only to page
space utilization.  That may lead to higher non-leaf keys overlap and, in turn,
slower search query answers.

This commit makes the sorted build use the opclass's picksplit method.  Once
four pages at the level are accumulated, the picksplit method is applied until
each split partition fits the page.  Some of our split algorithms could show
significant performance degradation while processing 4-times more data at once.
But those opclasses haven't received the sorted build support and shouldn't
receive it before their split algorithms are improved.

Discussion: https://postgr.es/m/CAHqSB9jqtS94e9%3D0vxqQX5dxQA89N95UKyz-%3DA7Y%2B_YJt%2BVW5A%40mail.gmail.com
Author: Aliaksandr Kalenik, Sergei Shoulbakov, Andrey Borodin
Reviewed-by: Björn Harrtell, Darafei Praliaskouski, Andres Freund
Reviewed-by: Alexander Korotkov
2022-02-07 23:20:42 +03:00
Tom Lane 5e26aa641e Test, don't just Assert, that mergejoin's inputs are in order.
There are two Asserts in nodeMergejoin.c that are reachable if
the input data is not in the expected order.  This seems way too
fragile.  Alexander Lakhin reported a case where the assertions
could be triggered with misconfigured foreign-table partitions,
and bitter experience with unstable operating system collation
definitions suggests another easy route to hitting them.  Neither
Assert is in a place where we can't afford one more test-and-branch,
so replace 'em with plain test-and-elog logic.
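
The difference between an assertion and a plain runtime test can be
sketched in Python as follows.  This is only an analogy: the generator
name, the key argument, and the error text are assumptions made for the
example, not the nodeMergejoin.c logic.

```python
def checked_sorted(rows, key=lambda r: r):
    """Yield rows unchanged, raising if they are not in ascending key order.

    The check is an ordinary test-and-raise, so it still runs when Python
    is started with -O, which strips `assert` statements much as
    non-assert builds strip Assert().
    """
    previous = None
    first = True
    for row in rows:
        k = key(row)
        if not first and k < previous:
            raise ValueError("input data is out of order")
        previous = k
        first = False
        yield row
```

The cost is one comparison and branch per row, which matches the
"one more test-and-branch" trade-off mentioned above.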

Per bug #17395.  While the reported symptom is relatively recent,
collation changes could happen anytime, so back-patch to all
supported branches.

Discussion: https://postgr.es/m/17395-8c326292078d1a57@postgresql.org
2022-02-05 11:59:29 -05:00
John Naylor b31e3f5613 Improve worst-case performance of text_position_get_match_pos()
This function converts a byte position to a character position after
a successful string match. Rather than calling pg_mblen() in a loop,
use pg_mbstrlen_with_len() since the latter can inline its own call to
pg_mblen(). When the string match is at the end of the haystack text, this
change results in 10-20% performance improvement, depending on platform and
typical character length in bytes. This also simplifies the code a little.
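
To make the byte-to-character conversion concrete, here is a small
Python sketch.  It assumes UTF-8 input purely for illustration, whereas
the function described above works for all server encodings; the
function name and signature are hypothetical.

```python
def byte_to_char_position(haystack: bytes, byte_offset: int) -> int:
    """Convert a 0-based byte offset in UTF-8 text to a 1-based character position.

    Counts the characters that lie entirely before byte_offset in a single
    pass, then adds one for the character that starts at the offset.
    """
    prefix = haystack[:byte_offset]
    # In UTF-8, every byte that is not a continuation byte (0b10xxxxxx)
    # starts a new character.
    chars_before = sum(1 for b in prefix if (b & 0xC0) != 0x80)
    return chars_before + 1
```

The whole prefix is measured in one call, rather than stepping through
it one character at a time from the caller's side.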

Specializing for UTF-8 could result in further improvement, but the
performance gain was not found to be reliable between platforms. The modest
gain in this commit is stable between platforms and usable by all server
encodings.

Discussion: https://www.postgresql.org/message-id/CAFBsxsH1Yutrmu+6LLHKK8iXY+vG--Do6zN+2900spHXQNNQKQ@mail.gmail.com
2022-02-04 10:53:24 -05:00
Thomas Munro 807fee1a39 Track LLVM 14 API changes, up to 2022-01-30.
Tested with LLVM 11, LLVM 13 and LLVM's main branch at commit
8d8fce87bbd5.  There are still some deprecation warnings that will need
to be sorted out, but this may be enough to turn "seawasp" green again.

Like commit e6a76002, done on master only for now.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2B3Ac3He9_SpJcxeiiVknbcES1tbZEkH9sRBdJFGj8K5Q%40mail.gmail.com
2022-02-04 16:16:10 +13:00
Amit Kapila 7f481b8d38 Improve invalidation handling in pgoutput.c.
Fix the following issues in pgoutput.c:

* rel_sync_cache_relation_cb does the wrong thing when called for a cache
flush (i.e., relid == 0). Instead of invalidating all RelationSyncCache
entries as it should, it does nothing.

* When rel_sync_cache_relation_cb does invalidate an entry, it immediately
zaps the entry->map structure, even though that might still be in use. We
instead just mark the entry as invalid and rebuild it at a later safe
point.

* Similarly, rel_sync_cache_publication_cb is way too eager to reset the
pubactions flags, which would likely lead to failing to transmit changes
that we should transmit. In this case also, we just mark the entry as
invalid and rebuild it at a later safe point.
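
The "mark invalid now, rebuild later" approach from the list above can
be sketched in Python like this.  It is a toy model, not the pgoutput.c
code: the class, its methods, and the dict-based entries are assumptions
made for the example.

```python
class RelSyncCache:
    """Toy cache keyed by relation id, with deferred rebuilds."""

    def __init__(self, build_entry):
        self._build_entry = build_entry  # callable: relid -> fresh entry data
        self._entries = {}               # relid -> {"valid": bool, "data": ...}

    def invalidate(self, relid):
        """Invalidation callback; relid == 0 means "flush everything"."""
        if relid == 0:
            targets = list(self._entries.values())
        else:
            entry = self._entries.get(relid)
            targets = [entry] if entry is not None else []
        for entry in targets:
            # Only flag the entry. Its data may still be in use by a caller.
            entry["valid"] = False

    def get(self, relid):
        """Safe point: rebuild the entry here if it is missing or invalid."""
        entry = self._entries.get(relid)
        if entry is None or not entry["valid"]:
            entry = {"valid": True, "data": self._build_entry(relid)}
            self._entries[relid] = entry
        return entry["data"]
```

Data handed out before an invalidation stays intact; the rebuild happens
only at the next lookup.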

Author: Tom Lane
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/885288.1641420714@sss.pgh.pa.us
2022-02-04 07:30:40 +05:30
Robert Haas 5ef1eefd76 Allow archiving via loadable modules.
Running a shell command for each file to be archived has a lot of
overhead and may not offer as much error checking as you want, or the
exact semantics that you want. So, offer the option to call a loadable
module for each file to be archived, rather than running a shell command.

Also, add a 'basic_archive' contrib module as an example implementation
that archives to a local directory.

Nathan Bossart, with a little bit of kibitzing by me.

Discussion: http://postgr.es/m/20220202224433.GA1036711@nathanxps13
2022-02-03 14:05:02 -05:00
Andres Freund 7c1aead6cb Fix compiler warning in non-assert builds, introduced in f862d57057.
Discussion: https://postgr.es/m/20220203183655.ralgkh54sdcgysmn@alap3.anarazel.de
Backpatch: 14-, like f862d57057
2022-02-03 10:44:26 -08:00
Peter Eisentraut 94aa7cc5f7 Add UNIQUE null treatment option
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.

This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.

The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.

I named all the internal flags, catalog columns, etc. in the negative
("nulls not distinct") so that the default PostgreSQL behavior is the
default if the flag is false.
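
The two behaviors can be modeled with a short Python sketch.  This is an
illustration of the semantics only, not the btree implementation: the
function name and the tuple-based rows are assumptions, and None stands
in for NULL.  The flag is named in the negative, as described above, so
that False gives the default behavior.

```python
def violates_unique(existing_rows, new_row, nulls_not_distinct=False):
    """Return True if new_row would violate a unique constraint.

    Rows are tuples of key column values, with None standing in for NULL.
    nulls_not_distinct=False models the default UNIQUE NULLS DISTINCT.
    """
    if not nulls_not_distinct and any(v is None for v in new_row):
        # NULLS DISTINCT: a key containing NULL never equals any other key.
        return False
    # Otherwise NULLs compare equal to each other, like ordinary values.
    return new_row in existing_rows
```

With the default, two rows (1, NULL) can coexist; with
nulls_not_distinct=True the second one is rejected.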

Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bde78@enterprisedb.com
2022-02-03 11:48:21 +01:00
Etsuro Fujita f862d57057 Further fix for EvalPlanQual with mix of local and foreign partitions.
We assume that direct-modify ForeignScan nodes cannot be re-evaluated
during EvalPlanQual processing, but the rework for inherited
UPDATE/DELETE in commit 86dc90056 changed things, without considering
that, so that such ForeignScan nodes get called as part of the
EvalPlanQual subtree during EvalPlanQual processing in the case of an
inherited UPDATE/DELETE where the inheritance set contains foreign
target relations.  To avoid re-evaluating such ForeignScan nodes during
EvalPlanQual processing, commit c3928b467 modified nodeForeignscan.c,
but the assumption made there that ExecForeignScan() should never be
called for such ForeignScan nodes during EvalPlanQual processing turned
out to be wrong in some cases, leading to a segmentation fault or a
"cannot re-evaluate a Foreign Update or Delete during EvalPlanQual"
error.

Fix by modifying nodeForeignscan.c further to avoid re-evaluating such
ForeignScan nodes even in ExecForeignScan()/ExecReScanForeignScan()
during EvalPlanQual processing.  Since this makes non-reachable the
test-and-elog added to ForeignNext() by commit c3928b467 that produced
the aforesaid error, convert the test-and-elog to an Assert.

Per bug #17355 from Alexander Lakhin.  Back-patch to v14 where both
commits came in.

Patch by me, reviewed and tested by Alexander Lakhin and Amit Langote.

Discussion: https://postgr.es/m/17355-de8e362eb7001a96@postgresql.org
2022-02-03 15:15:00 +09:00
Andres Freund f3feff8259 windows: Improve crash / assert / exception handling.
startup_hacks() called SetErrorMode() with the SEM_NOGPFAULTERRORBOX argument
to prevent GUI popups on error. While that likely was sufficient at some
point, there are other sources of error popups.

At the same time SEM_NOGPFAULTERRORBOX unfortunately also prevents
"just-in-time debuggers" from working reliably, i.e. the ability to attach to
a process on crash. This prevents collecting crash dumps as part of CI.

The error popups are particularly problematic when they occur during automated
testing, as they can cause the tests to hang, waiting for a button to be
clicked.

This commit improves the error handling setup in startup_hacks() to address
those problems. SEM_NOGPFAULTERRORBOX is not used anymore, instead various
other APIs are used to disable popups and to redirect output to stderr where
possible.

While this improves the situation for postgres.exe, it doesn't address similar
issues in all the other executables. There currently is no codepath that's
called early on for all frontend programs.

I've tested that this prevents GUI popups and allows JIT debugging in case of
crashes due to:
- abort()
- assert()
- C runtime errors
- unhandled exceptions
both in debug and non-debug mode, on Win10 with MSVC 2019 and with MinGW.

Now that crash reports are generated on windows, collect them in windows CI.

Discussion: https://postgr.es/m/20211005193033.tg4pqswgvu3hcolm@alap3.anarazel.de
2022-02-02 18:33:25 -08:00
Robert Haas 8e2b6d45a0 Fix server crash bug in 'server' backup target.
When this code executed as superuser it appeared to work because no
system catalog lookups happened, but otherwise it crashes because there
is no transaction environment. Fix that.

Report and code change by me. Test case by Dagfinn Ilmari Mannsåker.

Discussion: http://postgr.es/m/CA+TgmobiKLXne-2AVzYyWRiO8=rChBQ=7ywoxp=2SmcFw=oDDw@mail.gmail.com
2022-02-02 13:50:33 -05:00
Peter Eisentraut 87669de72c Some cleanup for change of collate and ctype fields to type text
Some cleanup for commit 54637508f87bd5f07fb9406bac6b08240283be3b:
Reformat pg_database.dat to reflect the new field order.  Also update
the corresponding example in bki.sgml.  Reorder the way the fields are
filled in dbcommands.c to correspond to the new order.
2022-02-02 11:58:55 +01:00
Tom Lane b426bd48ee Simplify coding around path_contains_parent_reference().
Given the existing stipulation that path_contains_parent_reference()
must only be invoked on canonicalized paths, we can simplify things
in the wake of commit c10f830c5.  It is now only possible to see
".." at the start of a relative path.  That means we can simplify
path_contains_parent_reference() itself quite a bit, and it makes
the two existing outside call sites dead code, since they'd already
checked that the path is absolute.
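
Under the stated precondition, the simplified check amounts to looking
only at the start of the path.  A Python sketch, assuming "/" separators
and ignoring Windows drive or network prefixes (the real function is C
and may handle more cases):

```python
def path_contains_parent_reference(path: str) -> bool:
    """Check a canonicalized path for a ".." component.

    Assumes the input is already canonicalized and uses "/" separators, so
    ".." can only appear as the leading component of a relative path.
    """
    return path == ".." or path.startswith("../")
```

A name such as "..foo" is not a parent reference, which is why the check
requires either the end of the string or a separator after the dots.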

We could now fold path_contains_parent_reference() into its only
remaining caller path_is_relative_and_below_cwd().  But it seems
better to leave it as a separately callable function, in case any
extensions are using it.

Also document the pre-existing requirement for
path_is_relative_and_below_cwd's input to be likewise canonicalized.

Shenhao Wang and Tom Lane

Discussion: https://postgr.es/m/OSBPR01MB4214FA221FFE046F11F2AD74F2D49@OSBPR01MB4214.jpnprd01.prod.outlook.com
2022-01-31 13:53:38 -05:00
Michael Paquier d10e41d423 Introduce pg_settings_get_flags() to find flags associated to a GUC
The most meaningful flags are shown, which are the ones useful for the
user and for automating and extending the set of tests supported
currently by check_guc.

This script may actually be removed in the future, but we are not
completely sure yet if and how we want to support the remaining sanity
checks performed there, that are now integrated in the main regression
test suite as of this commit.

Thanks also to Peter Eisentraut and Kyotaro Horiguchi for the
discussion.

Bump catalog version.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20211129030833.GJ17618@telsasoft.com
2022-01-31 08:56:41 +09:00
Alvaro Herrera b3d7d6e462 Remove xloginsert.h from xlog.h
xlog.h is directly and indirectly #included in a lot of places.  With
this change, xloginsert.h is no longer unnecessarily included in the
large number of them that don't need it.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CALj2ACVe-W+WM5P44N7eG9C2_FmaeM8Dq5aCnD3fHt0Ba=WR6w@mail.gmail.com
2022-01-30 12:25:24 -03:00
Tom Lane 8e2e0f7586 Fix failure to validate the result of select_common_type().
Although select_common_type() has a failure-return convention, an
apparent successful return just provides a type OID that *might* work
as a common supertype; we've not validated that the required casts
actually exist.  In the mainstream use-cases that doesn't matter,
because we'll proceed to invoke coerce_to_common_type() on each input,
which will fail appropriately if the proposed common type doesn't
actually work.  However, a few callers didn't read the (nonexistent)
fine print, and thought that if they got back a nonzero OID then the
coercions were sure to work.

This affects in particular the recently-added "anycompatible"
polymorphic types; we might think that a function/operator using
such types matches cases it really doesn't.  A likely end result
of that is unexpected "ambiguous operator" errors, as for example
in bug #17387 from James Inform.  Another, much older, case is that
the parser might try to transform an "x IN (list)" construct to
a ScalarArrayOpExpr even when the list elements don't actually have
a common supertype.

It doesn't seem desirable to add more checking to select_common_type
itself, as that'd just slow down the mainstream use-cases.  Instead,
write a separate function verify_common_type that performs the
missing checks, and add a call to that where necessary.  Likewise add
verify_common_type_from_oids to go with select_common_type_from_oids.

Back-patch to v13 where the "anycompatible" types came in.  (The
symptom complained of in bug #17387 doesn't appear till v14, but
that's just because we didn't get around to converting || to use
anycompatible till then.)  In principle the "x IN (list)" fix could
go back all the way, but I'm not currently convinced that it makes
much difference in real-world cases, so I won't bother for now.

Discussion: https://postgr.es/m/17387-5dfe54b988444963@postgresql.org
2022-01-29 11:41:18 -05:00
Michael Paquier 5ecd0183fb Fix comments about bgworker registration before MaxBackends initialization
Since 6bc8ef0b, InitializeMaxBackends() has used max_worker_processes
instead of adapting MaxBackends to the number of background workers
registered by modules loaded in shared_preload_libraries (at this time,
bgworkers were only static, but gained dynamic capabilities as a matter
of supporting parallel queries meaning that a control cap was
necessary).

Some comments referred to the past registration logic, making them
confusing and incorrect, so fix these.

Some of the out-of-core modules that could be loaded in this path
sometimes like to manipulate dynamically some of the resource-related
GUCs for their own needs; this commit adds a note about that.

Author: Nathan Bossart
Discussion: https://postgr.es/m/20220127181815.GA551692@nathanxps13
2022-01-29 10:47:36 +09:00
Peter Geoghegan bf42fcace5 vacuumlazy.c: Rename state field for consistency.
Rename pages_removed to removed_pages, for consistency with nearby
vacrel fields.
2022-01-28 17:41:09 -08:00
Michael Paquier dc084d7c73 Fix incorrect memory context switch in COPY TO execution
c532d15 has split the logic of COPY commands into multiple files, one
change being to move the internals of BeginCopy() to BeginCopyTo().
Originally the code was written so that we'd switch back-and-forth between
the current execution memory context and the dedicated memory context
for the COPY command, and this refactoring has introduced an extra
switch to the current memory context from the COPY context once
BeginCopyTo() is done with the past logic coming from BeginCopy().

The code was correctly doing the analyze, rewrite and planning phases in
the COPY context, but it was not assigning "copy_file" (FILE* used when
copying to a source file) and "filename" in the COPY context, making the
COPY status data inconsistent.

Author: Bharath Rupireddy
Reviewed-by: Japin Li
Discussion: https://postgr.es/m/CALj2ACWvVa69foi9jhHFY=2BuHxAoYboyE+vXQTARwxZcJnVrQ@mail.gmail.com
Backpatch-through: 14
2022-01-29 10:22:42 +09:00
Robert Haas aeb4cc9ea0 Move the code to archive files via the shell to a separate file.
This is preparatory work for allowing more extensibility in this area.

Nathan Bossart

Discussion: http://postgr.es/m/668D2428-F73B-475E-87AE-F89D67942270@amazon.com
2022-01-28 13:29:32 -05:00
Robert Haas 7f6772317b Adjust server-side backup to depend on pg_write_server_files.
I had made it depend on superuser, but that seems clearly inferior.
Also document the permissions requirement in the streaming replication
protocol section of the documentation, rather than only in the
section having to do with pg_basebackup.

Idea and patch from Dagfinn Ilmari Mannsåker.

Discussion: http://postgr.es/m/87bkzw160u.fsf@wibble.ilmari.org
2022-01-28 12:31:40 -05:00
Peter Eisentraut 43f33dc018 Add HEADER support to COPY text format
The COPY CSV format supports the HEADER option to output a header
line.  This patch adds the same option to the default text format.  On
input, the HEADER option causes the first line to be skipped, same as
with CSV.
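
As a rough illustration of the input-side behavior only (a Python sketch, not the server's C code; the function name is hypothetical, and escape sequences such as \N are not modeled), skipping the header line of tab-delimited text input could look like this:

```python
def read_text_rows(lines, header=False):
    """Split tab-delimited lines into fields, optionally skipping a header line.

    Hypothetical helper: models only the 'first line is skipped' rule for
    input; escape and NULL handling of the real text format are left out.
    """
    it = iter(lines)
    if header:
        next(it, None)  # discard the first line, if there is one
    return [line.rstrip("\n").split("\t") for line in it]
```

With header=True the first line is dropped; without it, the same line is treated as data.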

Author: Rémi Lapeyre <remi.lapeyre@lenstra.fr>
Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
2022-01-28 09:44:47 +01:00
Peter Eisentraut 5553cbd4fe Add some const decorations 2022-01-28 09:13:11 +01:00
Etsuro Fujita eabcfd99ed Fix typo in comment. 2022-01-28 15:45:00 +09:00
Fujii Masao 108505d763 Prevent memory context logging from sending log message to connected client.
When pg_log_backend_memory_contexts() is executed, the target backend
should use LOG_SERVER_ONLY to log its memory contexts, to prevent them
from being sent to its connected client regardless of client_min_messages.
But previously the backend unexpectedly used LOG to log the message
"logging memory contexts of PID %d" and it could be sent to the client.
This is a bug in memory context logging.

To fix the bug, this commit changes that message so that it's logged with
LOG_SERVER_ONLY.

Back-patch to v14 where pg_log_backend_memory_contexts() was added.

Author: Fujii Masao
Reviewed-by: Bharath Rupireddy, Atsushi Torikoshi
Discussion: https://postgr.es/m/82c12f36-86f7-5e72-79af-7f5c37f6cad7@oss.nttdata.com
2022-01-28 11:24:42 +09:00
Robert Haas 71cbbbbe80 pg_basebackup: Add a dummy return to bbsink_gzip_new().
Apparently, this is needed to avoid warnings on MSVC.

David Rowley

Discussion: http://postgr.es/m/CAApHDvosHkgyo_PZs7CSB4Kgs2ey4FdmFpcK0N_QOci9DJ=wnw@mail.gmail.com
2022-01-27 14:20:18 -05:00
Tomas Vondra f192e1bdf3 Fix ordering of XIDs in ProcArrayApplyRecoveryInfo
Commit 8431e296ea reworked ProcArrayApplyRecoveryInfo to sort XIDs
before adding them to KnownAssignedXids. But the XIDs are sorted using
xidComparator, which compares the XIDs simply as uint32 values, not
logically. KnownAssignedXidsAdd() however expects XIDs in logical order,
and calls TransactionIdFollowsOrEquals() to enforce that. If there are
XIDs for which the two orderings disagree, an error is raised and the
recovery fails/restarts.

Hitting this issue is fairly easy - you just need two transactions, one
started before the 4B limit (e.g. XID 4294967290), the other sometime
after it (e.g. XID 1000). Logically (4294967290 <= 1000) but when
compared using xidComparator we try to add them in the opposite order.
Which makes KnownAssignedXidsAdd() fail with an error like this:

  ERROR: out-of-order XID insertion in KnownAssignedXids

This only happens during replica startup, while processing RUNNING_XACTS
records to build the snapshot. Once we reach STANDBY_SNAPSHOT_READY, we
skip these records. So this does not affect already running replicas,
but if you restart (or create) a replica while there are transactions
with XIDs for which the two orderings disagree, you may hit this.

Long-running transactions and frequent replica restarts increase the
likelihood of hitting this issue. Once the replica gets into this state,
it can't be started (even if the old transactions are terminated).

Fixed by sorting the XIDs logically - this is fine because we're dealing
with normal XIDs (because it's XIDs assigned to backends) and from the
same wraparound epoch (otherwise the backends could not be running at
the same time on the primary node). So there are no problems with the
triangle inequality, which is why xidComparator compares raw values.
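
To make the disagreement between the two orderings concrete, here is a small Python model (a sketch, not the C code; the function names are hypothetical, and special XIDs are ignored on the assumption that only normal XIDs are compared):

```python
from functools import cmp_to_key


def xid_precedes(a: int, b: int) -> bool:
    """True if XID a is logically older than XID b (modulo-2^32 comparison).

    Same as reading (a - b) as a signed 32-bit integer and testing < 0.
    """
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000


def logical_cmp(a: int, b: int) -> int:
    """Three-way comparator built on the logical ordering."""
    if a == b:
        return 0
    return -1 if xid_precedes(a, b) else 1


xids = [1000, 4294967290]
raw_order = sorted(xids)                                   # plain unsigned comparison
logical_order = sorted(xids, key=cmp_to_key(logical_cmp))  # wraparound-aware
```

For the two XIDs from the example above, raw_order keeps 1000 first, while logical_order puts 4294967290 first, which is the order the insertion code expects.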

Investigation and root cause analysis by Abhijit Menon-Sen. Patch by me.

This issue is present in all releases since 9.4, however releases up to
9.6 are EOL already so backpatch to 10 only.

Reviewed-by: Abhijit Menon-Sen
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/36b8a501-5d73-277c-4972-f58a4dce088a%40enterprisedb.com
2022-01-27 20:13:55 +01:00
Peter Eisentraut 54637508f8 Change collate and ctype fields to type text
This changes the data type of the catalog fields datcollate, datctype,
collcollate, and collctype from name to text.  There wasn't ever a
really good reason for them to be of type name; presumably this was
just carried over from when they were fixed-size fields in pg_control,
first into the corresponding pg_database fields, and then to
pg_collation.  The values are not identifiers or object names, and we
don't ever look them up that way.

Changing to type text saves space in the typical case, since locale
names are typically only a few bytes long.  But it is also possible
that an ICU locale name with several customization options appended
could be longer than 63 bytes, so this also enables that case, which
was previously probably broken.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a@2ndquadrant.com
2022-01-27 08:54:25 +01:00
Magnus Hagander 2dbb7b9b22 Fix pg_hba_file_rules for authentication method cert
For authentication method cert, clientcert=verify-full is implied. But
the pg_hba_file_rules entry would incorrectly show clientcert=verify-ca.

Per bug #17354

Reported-By: Feike Steenbergen
Reviewed-By: Jonathan Katz
Backpatch-through: 12
2022-01-26 09:58:59 +01:00
David Rowley f9a74c1498 Consider parallel awareness when removing single-child Appends
8edd0e794 added some code to remove Append and MergeAppend nodes when they
contained a single child node.  As it turned out, this was unsafe to do
when the Append/MergeAppend was parallel_aware and the child node was not.
Removing the Append/MergeAppend, in this case, could lead to the child plan
being called multiple times by parallel workers when it was unsafe to do
so.

Here we fix this by just not removing the Append/MergeAppend when the
parallel_aware flag of the parent and child node don't match.
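
The rule can be summarized in a few lines of Python (an illustrative sketch with hypothetical names; the planner's actual data structures are not modeled):

```python
def can_remove_single_child_append(n_children: int,
                                   parent_parallel_aware: bool,
                                   child_parallel_aware: bool) -> bool:
    """Decide whether an Append/MergeAppend may be replaced by its only child.

    Sketch of the condition described above: removal is considered only for
    a single child, and only when both parallel_aware flags agree.
    """
    return n_children == 1 and parent_parallel_aware == child_parallel_aware
```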

Reported-by: Yura Sokolov
Bug: #17335
Discussion: https://postgr.es/m/b59605fecb20ba9ea94e70ab60098c237c870628.camel%40postgrespro.ru
Backpatch-through: 12, where 8edd0e794 was first introduced
2022-01-25 21:10:03 +13:00
Michael Paquier 741bd32933 Improve errors related to incorrect TLI on checkpoint record replay
WAL replay would cause a hard crash if the timeline expected by a
XLOG_END_OF_RECOVERY, a XLOG_CHECKPOINT_ONLINE, or a
XLOG_CHECKPOINT_SHUTDOWN record is not the same as the timeline being
replayed, using the same error message for all three of them.  This
commit changes those error messages to use different wordings, adapted
to each record type, which is useful when it comes to the debugging of
an issue in this area.

Author: Amul Sul
Reviewed-by: Nathan Bossart, Robert Haas
Discussion: https://postgr.es/m/CAAJ_b97i1ZerYC_xW6o_AiDSW5n+sGi8k91Yc8KS8bKWKxjqwQ@mail.gmail.com
2022-01-25 13:37:19 +09:00
Michael Paquier 410aa248e5 Fix various typos, grammar and code style in comments and docs
This fixes a set of issues that have accumulated over the past months
(or years) in various code areas.  Most fixes are related to some recent
additions, as of the development of v15.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20220124030001.GQ23027@telsasoft.com
2022-01-25 09:40:04 +09:00
Tom Lane 6aa5186146 Fix limitations on what SQL commands can be issued to a walsender.
In logical replication mode, a WalSender is supposed to be able
to execute any regular SQL command, as well as the special
replication commands.  Poor design of the replication-command
parser caused it to fail in various cases, notably:

* semicolons embedded in a command, or multiple SQL commands
sent in a single message;

* dollar-quoted literals containing odd numbers of single
or double quote marks;

* commands starting with a comment.

The basic problem here is that we're trying to run repl_scanner.l
across the entire input string even when it's not a replication
command.  Since repl_scanner.l does not understand all of the
token types known to the core lexer, this is doomed to have
failure modes.

We certainly don't want to make repl_scanner.l as big as scan.l,
so instead rejigger stuff so that we only lex the first token of
a non-replication command.  That will usually look like an IDENT
to repl_scanner.l, though a comment would end up getting reported
as a '-' or '/' single-character token.  If the token is a replication
command keyword, we push it back and proceed normally with repl_gram.y
parsing.  Otherwise, we can drop out of exec_replication_command()
without examining the rest of the string.
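
The dispatch idea can be sketched in Python as follows (this is a simplified model, not repl_scanner.l; the keyword set is an illustrative subset, the helper names are hypothetical, and case handling is not modeled):

```python
import re

# Illustrative subset of replication command keywords (assumption: not the full list).
REPLICATION_KEYWORDS = {"IDENTIFY_SYSTEM", "BASE_BACKUP", "START_REPLICATION"}

# Skip leading whitespace, then take either an identifier-like word or a
# single character (so a comment shows up as '-' or '/').
_FIRST_TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|.)", re.DOTALL)


def first_token(command: str) -> str:
    """Return only the first token; the rest of the string is never examined."""
    m = _FIRST_TOKEN.match(command)
    return m.group(1) if m else ""


def is_replication_command(command: str) -> bool:
    """True if the command should go to the replication grammar."""
    return first_token(command) in REPLICATION_KEYWORDS
```

Because only the first token is inspected, embedded semicolons or dollar-quoted literals later in the string cannot confuse the decision.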

(It's still theoretically possible for repl_scanner.l to fail on
the first token; but that could only happen if it's an unterminated
single- or double-quoted string, in which case you'd have gotten
largely the same error from the core lexer too.)

In this way, repl_gram.y isn't involved at all in handling general
SQL commands, so we can get rid of the SQLCmd node type.  (In
the back branches, we can't remove it because renumbering enum
NodeTag would be an ABI break; so just leave it sit there unused.)

I failed to resist the temptation to clean up some other sloppy
coding in repl_scanner.l while at it.  The only externally-visible
behavior change from that is it now accepts \r and \f as whitespace,
same as the core lexer.

Per bug #17379 from Greg Rychlewski.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/17379-6a5c6cfb3f1f5e77@postgresql.org
2022-01-24 15:33:38 -05:00
Robert Haas 0ad8032910 Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.
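
Roughly, the interpretation rule works like this (a Python sketch with a hypothetical function name; other compression methods and error cases of the real option parser are not modeled):

```python
def resolve_gzip_location(method: str, target_given: bool) -> str:
    """Map a gzip compression spec to where the compression is performed.

    Sketch of the rule described above: a bare "gzip" means client-side
    compression unless --target is also used.
    """
    if method in ("client-gzip", "server-gzip"):
        return method  # explicit location wins
    if method == "gzip":
        return "server-gzip" if target_given else "client-gzip"
    raise ValueError(f"unhandled compression method in this sketch: {method!r}")
```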

To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.

At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.

Along the way, I removed the information message added by commit
5c649fe153 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.

Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4b.

This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b) and for
pg_verifybackup itself.

Patch by me, with a bug fix by Jeevan Ladhe.  The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.

Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 15:13:18 -05:00
Robert Haas aa01051418 pg_upgrade: Preserve database OIDs.
Commit 9a974cbcba arranged to preserve
relfilenodes and tablespace OIDs. For similar reasons, also arrange
to preserve database OIDs.

One problem is that, up until now, the OIDs assigned to the template0
and postgres databases have not been fixed. This could be a problem
when upgrading, because pg_upgrade might try to migrate a database
from the old cluster to the new cluster while keeping the OID and find
a different database with that OID, resulting in a failure. If it finds
a database with the same name and the same OID that's OK: it will be
dropped and recreated. But the same OID and a different name is a
problem.

To prevent that, fix the OIDs for postgres and template0 to specific
values less than 16384. To avoid running afoul of this rule, these
values should not be changed in future releases. It's not a problem
that these OIDs aren't fixed in existing releases, because the OIDs
that we're assigning here weren't used for either of these databases
in any previous release. Thus, there's no chance that an upgrade of
a cluster from any previous release will collide with the OIDs we're
assigning here. And going forward, the OIDs will always be fixed, so
the only potential collision is with a system database having the
same name and the same OID, which is OK.

This patch lets users assign a specific OID to a database as well,
provided however that it can't be less than 16384. I (rhaas) thought
it might be better not to expose this capability to users, but the
consensus was otherwise, so the syntax is documented. Letting users
assign OIDs below 16384 would not be OK, though, because a
user-created database with a low-numbered OID might collide with a
system-created database in a future release. We therefore prohibit
that.
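
The restriction amounts to a simple range check, sketched here in Python (the constant and function names are hypothetical; only the 16384 threshold comes from the text above):

```python
MIN_USER_DATABASE_OID = 16384  # threshold stated above; the name is made up for this sketch


def check_requested_database_oid(oid: int) -> int:
    """Accept a user-requested database OID only if it is not below the threshold."""
    if oid < MIN_USER_DATABASE_OID:
        raise ValueError(
            f"database OID {oid} is reserved for system use "
            f"(must be >= {MIN_USER_DATABASE_OID})"
        )
    return oid
```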

Shruthi KC, based on an earlier patch from Antonin Houska, reviewed
and with some adjustments by me.

Discussion: http://postgr.es/m/CA+TgmoYgTwYcUmB=e8+hRHOFA0kkS6Kde85+UNdon6q7bt1niQ@mail.gmail.com
Discussion: http://postgr.es/m/CAASxf_Mnwm1Dh2vd5FAhVX6S1nwNSZUB1z12VddYtM++H2+p7w@mail.gmail.com
2022-01-24 14:23:43 -05:00
Tom Lane 3c06ec6d14 Remember to reset yy_start state when firing up repl_scanner.l.
Without this, we get odd behavior when the previous cycle of
lexing exited in a non-default exclusive state.  Every other
copy of this code is aware that it has to do BEGIN(INITIAL),
but repl_scanner.l did not get that memo.

The real-world impact of this is probably limited, since most
replication clients would abandon their connection after getting
a syntax error.  Still, it's a bug.

This mistake is old, so back-patch to all supported branches.

Discussion: https://postgr.es/m/1874781.1643035952@sss.pgh.pa.us
2022-01-24 12:09:46 -05:00
Tom Lane 353708e1fb Clean up recent Coverity complaints.
Commit 5c649fe15 introduced a memory leak into pg_basebackup's
parse_compress_options.  (I simplified nearby code while at it.)

Commit 9a974cbcb introduced a memory leak into pg_dump's
binary_upgrade_set_pg_class_oids.

Coverity also complained about a call of SnapBuildProcessChange that
ignored the result, unlike every other call of that function.  This
is evidently intentional, so add a (void) cast to indicate that.
(It's also old, dating to b89e15105; I suppose the reason it showed
up now is 7a5f6b474's recent rearrangement of nearby code.)
2022-01-23 12:51:38 -05:00
Tom Lane dc43fc9b3a Suppress variable-set-but-not-used warning from clang 13.
In the normal configuration where GEQO_DEBUG isn't defined,
recent clang versions have started to complain that geqo_main.c
accumulates the edge_failures count but never does anything
with it.  As a minimal back-patchable fix, insert a void cast
to silence this warning.  (I'd speculated about ripping out the
GEQO_DEBUG logic altogether, but I don't think we'd wish to
back-patch that.)

Per recently-established project policy, this is a candidate
for back-patching into out-of-support branches: it suppresses
an annoying compiler warning but changes no behavior.  Hence,
back-patch all the way to 9.2.

Discussion: https://postgr.es/m/CA+hUKGLTSZQwES8VNPmWO9AO0wSeLt36OCPDAZTccT1h7Q7kTQ@mail.gmail.com
2022-01-23 11:09:00 -05:00
Tomas Vondra 7b65862e22 Correct type of front_pathkey to PathKey
In sort_inner_and_outer we iterate a list of PathKey elements, but the
variable is declared as (List *). This mistake is benign, because we
only pass the pointer to lcons() and never dereference it.

This exists since ~2004, but it's confusing. So fix and backpatch to all
supported branches.

Backpatch-through: 10
Discussion: https://postgr.es/m/bf3a6ea1-a7d8-7211-0669-189d5c169374%40enterprisedb.com
2022-01-23 03:53:18 +01:00
Tomas Vondra 6d554e3fcd Check syscache result in AlterStatistics
The syscache lookup may return NULL even for valid OID, for example due
to a concurrent DROP STATISTICS, so a HeapTupleIsValid is necessary.
Without it, it may fail with a segfault.

Reported by Alexander Lakhin, patch by me. Backpatch to 13, where ALTER
STATISTICS ... SET STATISTICS was introduced.

Backpatch-through: 13
Discussion: https://postgr.es/m/17372-bf3b6e947e35ae77%40postgresql.org
2022-01-23 03:16:31 +01:00
Tom Lane 62e28097ce Remove useless inline marker.
Putting "inline" on a function that's not used anywhere in its
own file is useless unless the linker is doing global optimization,
a method we don't generally enable.  Moreover, it draws warnings
from some buildfarm members (curculio at least).

Looks like this was sloppiness in cc8b25712, which moved the
function from somewhere else where the inline marker was
more appropriate.
2022-01-22 17:11:33 -05:00
Tom Lane d8fbbb925b Flush table's relcache during ALTER TABLE ADD PRIMARY KEY USING INDEX.
Previously, unless we had to add a NOT NULL constraint to the column,
this command resulted in updating only the index's relcache entry.
That's problematic when replication behavior is being driven off the
existence of a primary key: other sessions (and ours too for that
matter) failed to recalculate their opinion of whether the table can
be replicated.  Add a relcache invalidation to fix it.

This has been broken since pg_class.relhaspkey was removed in v11.
Before that, updating the table's relhaspkey value sufficed to cause
a cache flush.  Hence, backpatch to v11.

Report and patch by Hou Zhijie

Discussion: https://postgr.es/m/OS0PR01MB5716EBE01F112C62F8F9B786947B9@OS0PR01MB5716.jpnprd01.prod.outlook.com
2022-01-22 13:32:40 -05:00
Andres Freund 1fabec7d7c fsync pg_logical/mappings in CheckPointLogicalRewriteHeap().
While individual logical rewrite files were synced to disk, the directory was
not. On some filesystems that could lead to losing directory entries after a
crash.

Reported-By: Tom Lane <tgl@sss.pgh.pa.us>
Author: Nathan Bossart <bossartn@amazon.com>
Discussion: https://postgr.es/m/867F2E29-2782-4869-970E-B984C6D35A8F@amazon.com
Backpatch: 10-
2022-01-21 11:22:55 -08:00
Michael Paquier 237d1f3172 Fix one-off bug causing missing commit timestamps for subtransactions
The logic in charge of writing commit timestamps (enabled with
track_commit_timestamp) for subtransactions had an off-by-one bug,
where it would be possible that commit timestamps go missing for the
last subtransaction committed.

While on it, simplify a bit the iteration logic in the loop writing the
commit timestamps, as per suggestions from Kyotaro Horiguchi and Tom
Lane, so that some variable initializations are not part of the loop
itself.

Issue introduced in 73c986a.

Analyzed-by: Alex Kingsborough
Author: Alex Kingsborough, Kyotaro Horiguchi
Discussion: https://postgr.es/m/73A66172-4050-4F2A-B7F1-13508EDA2144@amazon.com
Backpatch-through: 10
2022-01-21 14:54:04 +09:00
Robert Haas 3500ccc39b Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specified,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command.  If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.

On the server side, we now support two additional types of backup
targets.  There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.
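
How the option value splits into TARGET and DETAIL can be sketched in Python (hypothetical helper names; this assumes the split happens at the first colon, which is what the TARGET[:DETAIL] syntax suggests, and it does not validate target names):

```python
from typing import Optional, Tuple


def parse_target(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'TARGET[:DETAIL]' at the first colon; DETAIL is None if absent."""
    target, sep, detail = spec.partition(":")
    return target, (detail if sep else None)


def client_writes_backup(target: str) -> bool:
    """Per the description above, only the 'client' target is written locally."""
    return target == "client"
```

For example, 'server:/SOME/PATH' yields the target 'server' with detail '/SOME/PATH', and nothing is written on the client side.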

Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.

Patch by me, with a bug fix by Jeevan Ladhe.  The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-20 10:46:33 -05:00
Robert Haas ab4fd4f868 Remove 'datlastsysoid'.
It hasn't been used for anything for a long time. Up until recently,
we still queried it when dumping very old servers, but since
commit 30e7c175b8, there's no longer any
code at all that cares about it.

Discussion: http://postgr.es/m/CA+Tgmoa14=BRq0WEd0eevjEMn9EkghDB1FZEkBw7+UAb7tF49A@mail.gmail.com
2022-01-20 09:01:12 -05:00
Peter Eisentraut b99ccd2cb2 Call pg_newlocale_from_collation() also with default collation
Previously, callers of pg_newlocale_from_collation() did not call it
if the collation was DEFAULT_COLLATION_OID and instead proceeded with
a pg_locale_t of 0.  Instead, now we call it anyway and have it return
0 if the default collation was passed.  It already did this, so we
just have to adjust the callers.  This simplifies all the call sites
and also makes future enhancements easier.

After discussion and testing, the previous comment in pg_locale.c
about avoiding this for performance reasons may have been mistaken
since it was testing a very different patch version way back when.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/ed3baa81-7fac-7788-cc12-41e3f7917e34@enterprisedb.com
2022-01-20 09:50:18 +01:00
Jeff Davis 7a5f6b4748 Make logical decoding a part of the rmgr.
Add a new rmgr method, rm_decode, and use that rather than a switch
statement.

In preparation for rmgr extensibility.

Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
Discussion: https://postgr.es/m/20220118095332.6xtlcjoyxobv6cbk@jrouhaud
2022-01-19 14:58:49 -08:00
Tom Lane 89f059bdf5 Remove redundant memory context switches in BeginCopyFrom().
This is probably a leftover from code refactoring.

Japin Li

Discussion: https://postgr.es/m/MEYP282MB16693DDABDFEC7949AC31857B6599@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2022-01-19 12:31:15 -05:00
Robert Haas 0f47e833bf Fix alignment problem with bbsink_copystream buffer.
bbsink_copystream wants to store a type byte just before the buffer,
but basebackup.c wants the buffer to be aligned so that it can call
PageIsNew() and PageGetLSN() on it. Therefore, instead of inserting
1 extra byte before the buffer, insert MAXIMUM_ALIGNOF extra bytes
and only use the last one.

On most machines this doesn't cause any problem (except perhaps for
performance) but some buildfarm machines with -fsanitize=alignment
dump core.
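
The layout can be modeled in Python like this (a sketch only: the value 8 for MAXIMUM_ALIGNOF is an assumption, since the real value is platform-dependent, and the helper names are hypothetical):

```python
MAXIMUM_ALIGNOF = 8  # assumption for illustration; platform-dependent in reality


def make_buffer(capacity: int) -> bytearray:
    """Allocate the payload area plus MAXIMUM_ALIGNOF bytes of headroom.

    The payload starts at offset MAXIMUM_ALIGNOF, so it stays aligned as long
    as the allocation itself starts on an aligned address.
    """
    return bytearray(MAXIMUM_ALIGNOF + capacity)


def build_message(buf: bytearray, type_byte: bytes, payload: bytes) -> bytes:
    """Store payload at the aligned offset and the type byte just before it."""
    if len(type_byte) != 1:
        raise ValueError("type_byte must be exactly one byte")
    if len(payload) > len(buf) - MAXIMUM_ALIGNOF:
        raise ValueError("payload does not fit in the buffer")
    start = MAXIMUM_ALIGNOF
    buf[start:start + len(payload)] = payload
    buf[start - 1] = type_byte[0]  # only the last headroom byte is used
    return bytes(buf[start - 1:start + len(payload)])
```

The other MAXIMUM_ALIGNOF - 1 headroom bytes are simply wasted, which is the price of keeping the payload aligned.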

Discussion: http://postgr.es/m/CA+TgmoYx5=1A2K9JYV-9zdhyokU4KKTyNQ9q7CiXrX=YBBMWVw@mail.gmail.com
2022-01-19 08:12:08 -05:00
Robert Haas cc333f3233 Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
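
A client-side view of the framing might look like the following Python sketch (hypothetical helper names; payloads are treated as opaque bytes because their internal layout is not described here, and the sample payloads in any usage are made up):

```python
def split_copydata(message: bytes):
    """Split one CopyData message into (type byte as str, remaining payload)."""
    if not message:
        raise ValueError("empty CopyData message")
    return chr(message[0]), message[1:]


def summarize(messages):
    """Count archives started ('n') and data bytes received ('d')."""
    archives_started = 0
    data_bytes = 0
    for msg in messages:
        kind, payload = split_copydata(msg)
        if kind == "n":
            archives_started += 1
        elif kind == "d":
            data_bytes += len(payload)
        # Other type bytes are skipped in this sketch; a real client
        # would have to decide how to treat them.
    return archives_started, data_bytes
```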

The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
than tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.

The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.

Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 13:47:49 -05:00
Andres Freund c702d656a2 heap pruning: Only call BufferGetBlockNumber() once.
BufferGetBlockNumber() is not that cheap and obviously cannot change during
one heap_prune_page(), so only call it once. We might be able to do better and
pass the block number from the caller, but that'd be a larger change...

Discussion: https://postgr.es/m/20211211045710.ljtuu4gfloh754rs@alap3.anarazel.de
2022-01-17 15:35:11 -08:00
Robert Haas 9a974cbcba pg_upgrade: Preserve relfilenodes and tablespace OIDs.
Currently, database OIDs, relfilenodes, and tablespace OIDs can all
change when a cluster is upgraded using pg_upgrade. It seems better
to preserve them, because (1) it makes troubleshooting pg_upgrade
easier, since you don't have to do a lot of work to match up files
in the old and new clusters, (2) it allows 'rsync' to save bandwidth
when used to re-sync a cluster after an upgrade, and (3) if we ever
encrypt or sign blocks, we would likely want to use a nonce that
depends on these values.

This patch only arranges to preserve relfilenodes and tablespace
OIDs. The task of preserving database OIDs is left for another patch,
since it involves some complexities that don't exist in these cases.

Database OIDs have a similar issue, but there are some tricky points
in that case that do not apply to these cases, so that problem is left
for another patch.

Shruthi KC, based on an earlier patch from Antonin Houska, reviewed
and with some adjustments by me.

Discussion: http://postgr.es/m/CA+TgmoYgTwYcUmB=e8+hRHOFA0kkS6Kde85+UNdon6q7bt1niQ@mail.gmail.com
2022-01-17 13:40:27 -05:00
Peter Eisentraut cf925936ec Fix for new Boolean node
The token in nodeTokenType() is actually the whole rest of the string,
so we need to take into account the length to do the correct
comparison.
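
The pitfall can be illustrated in Python (a sketch with hypothetical names; the exact C comparison is not reproduced here). The token pointer covers the whole rest of the input, so only the first `length` characters belong to the token:

```python
def is_boolean_token_naive(rest: str) -> bool:
    """Compares the whole remaining string; fails when more text follows."""
    return rest in ("true", "false")


def is_boolean_token(rest: str, length: int) -> bool:
    """Compares only the first `length` characters, i.e. the actual token."""
    return rest[:length] in ("true", "false")
```

With rest = "true :location 5" and length = 4, the naive check misses the Boolean while the length-aware one recognizes it.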

Without this, postgres_fdw tests fail under
-DWRITE_READ_PARSE_PLAN_TREES.
2022-01-17 13:59:46 +01:00