Commit Graph

56109 Commits

Author SHA1 Message Date
Peter Geoghegan 05a304a855 Make SP-GiST redirect cleanup more aggressive.
Commit 61b313e4 made VACUUM pass down a heaprel to index AM bulkdelete
and vacuumcleanup routines.  Although this was primarily intended as
preparation for logical decoding on standbys, it also made it easy to
correct an old deficiency in how we determine how to cleanup SP-GiST
redirect and placeholder tuples.

Pass the heaprel to GlobalVisTestFor() during cleanup of redirect and
placeholder tuples, rather than pessimistically passing NULL.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/02392033-f030-a3c8-c7d0-5c27eb529fec@gmail.com
2023-04-03 11:47:48 -07:00
Peter Geoghegan e48c817395 Recycle deleted nbtree pages more aggressively.
Commit 61b313e4 made nbtree consistently pass down a heaprel to low
level routines like _bt_getbuf().  Although this was primarily intended
as preparation for logical decoding on standbys, it also made it easy to
correct an old deficiency in how nbtree VACUUM determines whether or not
it's now safe to recycle deleted pages.

Pass the heaprel to GlobalVisTestFor() in nbtree routines that deal with
recycle safety.  nbtree now makes less pessimistic assumptions about
recycle safety within non-catalog relations.  This enhancement
complements the recycling enhancement added by commit 9dd963ae25.

nbtree remains just as pessimistic as ever when it comes to recycle
safety within indexes on catalog relations.  There is no fundamental
reason why we need to treat catalog relations differently, though.  The
behavioral inconsistency is a consequence of the way that nbtree uses
nextXID values to implement what Lanin and Shasha call "the drain
technique".  Note in particular that it has nothing to do with whether
or not index tuples might still be required for an older MVCC snapshot.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkaiDxCje0yPuH=3Uh2p1V_2pFGY==xfbZoZu7Ax_NB8g@mail.gmail.com
2023-04-03 11:31:43 -07:00
Peter Geoghegan a349b86603 Move heaprel struct field next to index rel field.
Commit 61b313e4 added a heaprel struct member to IndexVacuumInfo, but
placed it last.  Move the heaprel struct member next to the index struct
member to improve the code's readability.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WznG=TV6S9d3VA=y0vBHbXwnLs9_LLdiML=aNJuHeriwxg@mail.gmail.com
2023-04-03 11:01:11 -07:00
Robert Haas e7e7da2f8d Fix possible logical replication crash.
Commit c3afe8cf5a added a new
password_required option but forgot that you need database access
to check whether an arbitrary role ID is a superuser.

Report and patch by Hou Zhijie. I added a comment. Thanks to
Alexander Lakhin for devising a way to reproduce the crash.

Discussion: http://postgr.es/m/OS0PR01MB5716BFD7EC44284C89F40808948F9@OS0PR01MB5716.jpnprd01.prod.outlook.com
2023-04-03 13:54:21 -04:00
Tom Lane a8a00124f1 When using valgrind, log the current query after an error is detected.
In USE_VALGRIND builds, add code to print the text of the current query
to the valgrind log whenever the valgrind error count has increased
during the query.  Valgrind will already have printed its report,
if the error is distinct from ones already seen, so that this works
out fairly well as a way of annotating the log.

Onur Tirtir and Tom Lane

Discussion: https://postgr.es/m/AM9PR83MB0498531E804DC8DF8CFF0E8FE9D09@AM9PR83MB0498.EURPRD83.prod.outlook.com
2023-04-03 10:18:38 -04:00
Alexander Korotkov b0b91ced16 Revert 764da7710b
Discussion: https://postgr.es/m/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de
2023-04-03 16:55:09 +03:00
Alexander Korotkov 2b65bf046d Revert 11470f544e
Discussion: https://postgr.es/m/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de
2023-04-03 16:54:31 +03:00
David Rowley 8d928e3a9f Rename BufferAccessStrategyData.ring_size to nbuffers
The new name better reflects what the field is - the size of the buffers[]
array.  ring_size sounded more like it is in units of bytes.

An upcoming commit allows a BufferAccessStrategy of custom sizes, so it
seems relevant to improve this beforehand.

Author: Melanie Plageman
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/CAAKRu_YefVbhg4rAxU2frYFdTap61UftH-kUNQZBvAs%2BYK81Zg%40mail.gmail.com
2023-04-03 23:31:16 +12:00
David Rowley 4830f10243 Disable vacuum's use of a buffer access strategy during failsafe
Traditionally, vacuum always makes use of a buffer access strategy 32
buffers in size.  This means that running vacuums tend not to cause too
many shared buffers to become dirty, however, this can cause vacuums to
run much more slowly than they otherwise could as WAL flushes will occur
more frequently due to having to flush WAL out to the LSN of the dirty
page before that page can be written to disk.

When we are performing failsafe VACUUMs (as added in 1e55e7d17), we really
want to make the vacuum work go as quickly as possible, so here we disable
the buffer access strategy when entering failsafe mode while vacuuming a
relation.

Per idea and analyis from Andres Freund.

In passing, also include some changes I had intended for 32fbe0239.

Author: Melanie Plageman
Reviewed-by: Justin Pryzby, David Rowley
Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb%40awork3.anarazel.de
2023-04-03 23:05:58 +12:00
Daniel Gustafsson 525fb0a171 Fix typo in CI README
s/fron/from/
2023-04-03 10:50:17 +02:00
David Rowley 32fbe0239b Only make buffer strategy for vacuum when it's likely needed
VACUUM FULL and VACUUM ONLY_DATABASE_STATS will not use the vacuum
strategy ring created in vacuum(), so don't waste effort making it in
those cases.

There are other conceivable cases where the buffer strategy also won't be
used, but those are probably less common and not worth troubling over too
much.  For example VACUUM (PROCESS_MAIN false, PROCESS_TOAST false).
There are other cases too, but many of these are only discovered once
inside vacuum_rel().

Author: Melanie Plageman
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/CAAKRu_ZLRuzkM3zKogiZAz2hUony37yLY4aaLb8fPf8fgqs5Og@mail.gmail.com
2023-04-03 19:19:45 +12:00
Peter Eisentraut 1980d3585e pg_basebackup: Correct type of WalSegSz
The pg_basebackup code had WalSegSz as uint32, whereas the rest of the
code has it as int.  This seems confusing, and using the extra range
wouldn't actually work.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/1bf15c7a-0acd-1864-081e-7a28814310fe%40enterprisedb.com
2023-04-03 07:21:06 +02:00
David Rowley 3f476c9534 Remove some global variables from vacuum.c
Using global variables because we don't want to pass these values around
as parameters does not really seem like a great idea, so let's remove
these two global variables and adjust a few functions to accept these
values as parameters instead.

This is part of a wider patch which intends to allow the size of the
buffer access strategy that vacuum uses to be adjusted.

Author: Melanie Plageman
Reviewed-by: Bharath Rupireddy
Discussion: https://postgr.es/m/CAAKRu_b1q_07uquUtAvLqTM%3DW9nzee7QbtzHwA4XdUo7KX_Cnw%40mail.gmail.com
2023-04-03 15:07:25 +12:00
Tom Lane 2e6ba13152 Doc: update pgindent/README.
I missed updating this when we pulled pg_bsd_indent into the tree.
2023-04-02 20:01:34 -04:00
Andres Freund 6af1793954 Add info in WAL records in preparation for logical slot conflict handling
This commit only implements one prerequisite part for allowing logical
decoding. The commit message contains an explanation of the overall design,
which later commits will refer back to.

Overall design:

1. We want to enable logical decoding on standbys, but replay of WAL
from the primary might remove data that is needed by logical decoding,
causing error(s) on the standby. To prevent those errors, a new replication
conflict scenario needs to be addressed (as much as hot standby does).

2. Our chosen strategy for dealing with this type of replication slot
is to invalidate logical slots for which needed data has been removed.

3. To do this we need the latestRemovedXid for each change, just as we
do for physical replication conflicts, but we also need to know
whether any particular change was to data that logical replication
might access. That way, during WAL replay, we know when there is a risk of
conflict and, if so, if there is a conflict.

4. We can't rely on the standby's relcache entries for this purpose in
any way, because the startup process can't access catalog contents.

5. Therefore every WAL record that potentially removes data from the
index or heap must carry a flag indicating whether or not it is one
that might be accessed during logical decoding.

Why do we need this for logical decoding on standby?

First, let's forget about logical decoding on standby and recall that
on a primary database, any catalog rows that may be needed by a logical
decoding replication slot are not removed.

This is done thanks to the catalog_xmin associated with the logical
replication slot.

But, with logical decoding on standby, in the following cases:

- hot_standby_feedback is off
- hot_standby_feedback is on but there is no a physical slot between
  the primary and the standby. Then, hot_standby_feedback will work,
  but only while the connection is alive (for example a node restart
  would break it)

Then, the primary may delete system catalog rows that could be needed
by the logical decoding on the standby (as it does not know about the
catalog_xmin on the standby).

So, it’s mandatory to identify those rows and invalidate the slots
that may need them if any. Identifying those rows is the purpose of
this commit.

Implementation:

When a WAL replay on standby indicates that a catalog table tuple is
to be deleted by an xid that is greater than a logical slot's
catalog_xmin, then that means the slot's catalog_xmin conflicts with
the xid, and we need to handle the conflict. While subsequent commits
will do the actual conflict handling, this commit adds a new field
isCatalogRel in such WAL records (and a new bit set in the
xl_heap_visible flags field), that is true for catalog tables, so as to
arrange for conflict handling.

The affected WAL records are the ones that already contain the
snapshotConflictHorizon field, namely:

- gistxlogDelete
- gistxlogPageReuse
- xl_hash_vacuum_one_page
- xl_heap_prune
- xl_heap_freeze_page
- xl_heap_visible
- xl_btree_reuse_page
- xl_btree_delete
- spgxlogVacuumRedirect

Due to this new field being added, xl_hash_vacuum_one_page and
gistxlogDelete do now contain the offsets to be deleted as a
FLEXIBLE_ARRAY_MEMBER. This is needed to ensure correct alignment.
It's not needed on the others struct where isCatalogRel has
been added.

This commit just introduces the WAL format changes mentioned above. Handling
the actual conflicts will follow in future commits.

Bumps XLOG_PAGE_MAGIC as the several WAL records are changed.

Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de> (in an older version)
Author: Amit Khandekar <amitdkhan.pg@gmail.com>  (in an older version)
Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
2023-04-02 12:32:19 -07:00
Noah Misch ab73291d26 Fix copy-pasto in contrib/auth_delay/meson.build variable name. 2023-04-02 09:31:10 -07:00
Noah Misch eaa1dd131c Use PG_TEST_TIMEOUT_DEFAULT in 019_replslot_limit.pl.
Oversight in f2698ea02c, which introduced
the variable.  This lowers some 1000s timeouts to the configurable
default of 180s, due to a lack of evidence for needing the longer
timeout.  The alternative was 6*PG_TEST_TIMEOUT_DEFAULT, which we can
adopt if the need arises.  Given the lack of observed trouble with these
timeouts, no back-patch.
2023-04-02 09:31:09 -07:00
Andres Freund 61b313e47e Pass down table relation into more index relation functions
This is done in preparation for logical decoding on standby, which needs to
include whether visibility affecting WAL records are about a (user) catalog
table. Which is only known for the table, not the indexes.

It's also nice to be able to pass the heap relation to GlobalVisTestFor() in
vacuumRedirectAndPlaceholder().

Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/21b700c3-eecf-2e05-a699-f8c78dd31ec7@gmail.com
2023-04-01 20:18:29 -07:00
Andres Freund a88a18b125 Assert only valid flag bits are passed to visibilitymap_set()
If visibilitymap_set() is called with flags containing a higher bit than
VISIBILITYMAP_ALL_FROZEN, the state of neighboring pages is affected. While
there was an assertion that *some* valid bits were set, it did not check
that *only* valid bits were. Change that.

Discussion: https://postgr.es/m/20230331043300.gux3s5wzrapqi4oe@awork3.anarazel.de
2023-04-01 18:00:19 -07:00
Andres Freund 14f98e0af9 hio: Release extension lock before initializing page / pinning VM
PageInit() while holding the extension lock is unnecessary after 0d1fe9f74e
started to use RBM_ZERO_AND_LOCK - nobody can look at the new page before we
release the page lock. PageInit() zeroes the page, which isn't that cheap, so
deferring it until after the extension lock is released seems like a good idea.

Doing visibilitymap_pin() while holding the extension lock, introduced in
7db0cd2145, looks like an accident. Due to the restrictions on
HEAP_INSERT_FROZEN it's unlikely to be a performance issue, but it still seems
better to move it out.  We also are doing the visibilitymap_pin() while
holding the buffer lock, which will be fixed in a separate commit.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: http://postgr.es/m/419312fd-9255-078c-c3e3-f0525f911d7f@iki.fi
2023-04-01 17:50:18 -07:00
Tomas Vondra 0070b66fef pg_dump: Use only LZ4 frame format for compression
After 0da243fed0 got committed, it was reported that in some cases the
compression ratio is rather poor - particularly for custom format with
narrow tables - due to writing the LZ4 header/footer for each row.

This commit switches to LZ4F (LZ4 frame format), eliminating most of the
overhead and greatly improving the compression ratio. This makes the
compressed size about the same for plain and custom formats (just like
for gzip, for example).

LZ4F is now used by both compression APIs, which allowed refactoring and
reusing more of the code. For consistency this also renames the LZ4File
struct to LZ4State, and a number of functions are now prefixed with
LZ4Stream_ (instead of LZ4File_).

Patch by Georgios Kokolatos, based on report and initial patch by Justin
Pryzby. Review and minor cleanups by me.

Author: Georgios Kokolatos, Justin Pryzby
Reported-by: Justin Pryzby
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20230227044910.GO1653%40telsasoft.com
2023-04-01 00:54:50 +02:00
David Rowley c8f8d0eb18 Doc: add Buffer Access Strategy to the glossary
It seems useful to add this to the glossary as there's discussion around
adding an option to VACUUM to disable and adjust the size of the buffer
access strategy that VACUUM uses.

Author: Melanie Plageman
Reviewed-by: Justin Pryzby, David Rowley
Discussion: https://postgr.es/m/ZBYDTrD1kyGg%2BHkS%40telsasoft.com
2023-04-01 10:41:27 +13:00
Peter Geoghegan df4f3ab517 Add show_data option to pg_get_wal_block_info.
Allow users to opt out of returning FPI data and block data from
pg_get_wal_block_info as an optimization.  Testing has shown that this
can make function execution over twice as fast in some cases.

When pg_get_wal_block_info is called with "show_data := false", it
always returns NULL values for its block_data and block_fpi_data bytea
output parameters.  Nothing else changes.  In particular, the function
will still return the usual per-block summary of block data/FPI space
overhead.  Use of "show_data := false" is therefore feasible with all
queries that don't specifically require these raw binary strings.

Follow-up to recent work in commit 122376f0.  There still hasn't been a
stable release with the pg_get_wal_block_info function, so no bump in
the pg_walinspect extension version.

Per suggestion from Melanie Plageman.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAAKRu_bJvbcYBRj2cN6G2xV7B7-Ja+pjTO1nEnEhRR8OXYiABA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-Wzm9shOkEDM10_+qOZkRSQhKVxwBFiehH6EHWQQRd_rDPw@mail.gmail.com
2023-03-31 14:02:52 -07:00
Alvaro Herrera 6ee30209a6
SQL/JSON: support the IS JSON predicate
This patch introduces the SQL standard IS JSON predicate. It operates
on text and bytea values representing JSON, as well as on the json and
jsonb types. Each test has IS and IS NOT variants and supports a WITH
UNIQUE KEYS flag. The tests are:

IS JSON [VALUE]
IS JSON ARRAY
IS JSON OBJECT
IS JSON SCALAR

These should be self-explanatory.

The WITH UNIQUE KEYS flag makes these return false when duplicate keys
exist in any object within the value, not necessarily directly contained
in the outermost object.

Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>

Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.

Discussion: https://postgr.es/m/CAF4Au4w2x-5LTnN_bxky-mq4=WOqsGsxSpENCzHRAzSnEd8+WQ@mail.gmail.com
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
2023-03-31 22:34:04 +02:00
Tom Lane a2a0c7c29e Further tweaking of width_bucket() edge cases.
I realized that the third overflow case I posited in commit b0e9e4d76
actually should be handled in a different way: rather than tolerating
the idea that the quotient could round to 1, we should clamp so that
the output cannot be more than "count" when we know that the operand is
less than bound2.  That being the case, we don't need an overflow-aware
increment in that code path, which leads me to revert the movement of
the pg_add_s32_overflow() call.  (The diff in width_bucket_float8
might be easier to read by comparing against b0e9e4d76^.)

What's more, width_bucket_numeric also has this problem of the quotient
potentially rounding to 1, so add a clamp there too.

As before, I'm not quite convinced that a back-patch is warranted.

Discussion: https://postgr.es/m/391415.1680268470@sss.pgh.pa.us
2023-03-31 16:29:55 -04:00
Tom Lane f0d65c0eaf Reject system columns as elements of foreign keys.
Up through v11 it was sensible to use the "oid" system column as
a foreign key column, but since that was removed there's no visible
usefulness in making any of the remaining system columns a foreign
key.  Moreover, since the TupleTableSlot rewrites in v12, such cases
actively fail because of implicit assumptions that only user columns
appear in foreign keys.  The lack of complaints about that seems
like good evidence that no one is trying to do it.  Hence, rather
than trying to repair those assumptions (of which there are at least
two, maybe more), let's just forbid the case up front.

Per this patch, a system column in either the referenced or
referencing side of a foreign key will draw this error; however,
putting one in the referenced side would have failed later anyway,
since we don't allow unique indexes to be made on system columns.

Per bug #17877 from Alexander Lakhin.  Back-patch to v12; the
case still appears to work in v11, so we shouldn't break it there.

Discussion: https://postgr.es/m/17877-4bcc658e33df6de1@postgresql.org
2023-03-31 11:18:49 -04:00
Tom Lane c2d7d679c1 Ensure acquire_inherited_sample_rows sets its output parameters.
The totalrows/totaldeadrows outputs were left uninitialized in cases
where we find no analyzable child tables of a partitioned table.  This
could lead to setting the partitioned table's pg_class.reltuples value
to garbage.  It's not clear that that would have any very bad effects
in practice, but fix it anyway because it's making valgrind unhappy.

Reported and diagnosed by Alexander Lakhin (bug #17880).

Discussion: https://postgr.es/m/17880-9282037c923d856e@postgresql.org
2023-03-31 10:08:40 -04:00
Daniel Gustafsson 558fff0adf pg_regress: Emit TAP compliant output
This converts pg_regress output format to emit TAP compliant output
while keeping it as human readable as possible for use without TAP
test harnesses. As verbose harness related information isn't really
supported by TAP this also reduces the verbosity of pg_regress runs
which makes scrolling through log output in buildfarm/CI runs a bit
easier as well.

As the meson TAP parser conumes whitespace, the leading indentation
for differentiating parallel tests from sequential tests has been
changed to a single character prefix.

This patch has been around for an extended period of time, reviewers
listed below may have been involved in reviewing a version quite
different from the version in this commit.  The original idea for
this patch was a hacking session with Jinbao Chen.

TAP format testing is also enabled in meson as of this.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Nikolay Shaplov <dhyan@nataraj.su>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/BD4B107D-7E53-4794-ACBA-275BEB4327C9@yesql.se
Discussion: https://postgr.es/m/20220221164736.rq3ornzjdkmwk2wo@alap3.anarazel.de
2023-03-31 13:00:02 +02:00
Alvaro Herrera 9b058f6b0d
Move ExecEvalJsonConstructor new function to a more natural place
Commit 7081ac46ac put it at the end of the file, but that doesn't look
very nice.
2023-03-31 12:55:25 +02:00
Alvaro Herrera 47a9709846
No need to add FORMAT to the keyword precedence list
Commit 7081ac46ac put it there.  Remove it.
2023-03-31 11:16:43 +02:00
Amit Kapila ed94e8563e Add XML ID attributes to create_publication.sgml.
This commit adds XML ID attributes to all varlistentries in
create_publication.sgml. This allows us to include links to refer to
publication options, making documents more readable.

Author: Kuroda Hayato
Reviewed-by: Peter Smith, Amit Kapila
Discussion: https://postgr.es/m/TYAPR01MB58668219FEA4EC231486A433F58E9@TYAPR01MB5866.jpnprd01.prod.outlook.com
2023-03-31 08:59:55 +05:30
Andres Freund f95c1cd6b2 Bump PGSTAT_FILE_FORMAT_ID, omitted in 8aaa04b32d
I forgot to do so in the referenced commit. While the consequences of omitting
the version change are likely to be harmless (besides discarding stats, as a
PGSTAT_FILE_FORMAT_ID bump also does), it still seems worth doing.
2023-03-30 19:48:01 -07:00
Andres Freund 8aaa04b32d Track shared buffer hits in pg_stat_io
Among other things, this should make it easier to calculate a useful cache hit
ratio by excluding buffer reads via buffer access strategies. As buffer access
strategies reuse buffers (and thus evict the prior buffer contents), it is
normal to see reads on repeated scans of the same data.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAAKRu_beMa9Hzih40%3DXPYqhDVz6tsgUGTrhZXRo%3Dunp%2Bszb%3DUA%40mail.gmail.com
2023-03-30 19:24:21 -07:00
David Rowley 6c3b697b19 Fix List memory issue in transformColumnDefinition
When calling generateSerialExtraStmts(), we would pass in the
constraint->options.  In some cases, generateSerialExtraStmts() would
modify the referenced List to remove elements from it, but doing so is
invalid without assigning the list back to all variables that point to it.
In the particular reported problem case, the List became empty, in which
cases it became NIL, but the passed in constraint->options didn't get to
find out about that and was left pointing to free'd memory.

To fix this, just perform a list_copy() inside generateSerialExtraStmts().
We could just do a list_copy() just before we perform the delete from the
list, however, that seems less robust.  Let's make sure the generated
CreateSeqStmt gets a completely different copy of the list to be safe.

Bug: #17879
Reported-by: Fei Changhong
Diagnosed-by: Fei Changhong
Discussion: https://postgr.es/m/17879-b7dfb5debee58ff5@postgresql.org
Backpatch-through: 11, all supported versions
2023-03-31 12:13:05 +13:00
Thomas Munro 11c2d6fdf5 Parallel Hash Full Join.
Full and right outer joins were not supported in the initial
implementation of Parallel Hash Join because of deadlock hazards (see
discussion).  Therefore FULL JOIN inhibited parallelism, as the other
join strategies can't do that in parallel either.

Add a new PHJ phase PHJ_BATCH_SCAN that scans for unmatched tuples on
the inner side of one batch's hash table.  For now, sidestep the
deadlock problem by terminating parallelism there.  The last process to
arrive at that phase emits the unmatched tuples, while others detach and
are free to go and work on other batches, if there are any, but
otherwise they finish the join early.

That unfairness is considered acceptable for now, because it's better
than no parallelism at all.  The build and probe phases are run in
parallel, and the new scan-for-unmatched phase, while serial, is usually
applied to the smaller of the two relations and is either limited by
some multiple of work_mem, or it's too big and is partitioned into
batches and then the situation is improved by batch-level parallelism.

Author: Melanie Plageman <melanieplageman@gmail.com>
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BA6ftXPz4oe92%2Bx8Er%2BxpGZqto70-Q_ERwRaSyA%3DafNg%40mail.gmail.com
2023-03-31 11:34:03 +13:00
Andres Freund ca7b3c4c00 pg_stat_wal: Accumulate time as instr_time instead of microseconds
In instr_time.h it is stated that:

* When summing multiple measurements, it's recommended to leave the
* running sum in instr_time form (ie, use INSTR_TIME_ADD or
* INSTR_TIME_ACCUM_DIFF) and convert to a result format only at the end.

The reason for that is that converting to microseconds is not cheap, and can
loose precision.  Therefore this commit changes 'PendingWalStats' to use
'instr_time' instead of 'PgStat_Counter' while accumulating 'wal_write_time'
and 'wal_sync_time'.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/1feedb83-7aa9-cb4b-5086-598349d3f555@gmail.com
2023-03-30 14:23:14 -07:00
Peter Geoghegan 122376f028 Show record information in pg_get_wal_block_info.
Expand the output parameters in pg_walinspect's pg_get_wal_block_info
function to return additional information that was previously only
available from pg_walinspect's pg_get_wal_records_info function.  Some
of the details are attributed to individual block references, rather
than aggregated into whole-record values, since the function returns one
row per block reference per WAL record (unlike pg_get_wal_records_info,
which always returns one row per WAL record).

This structure is much easier to work with when writing queries that
track how individual blocks changed over time, or when attributing costs
to individual blocks (not WAL records) is useful.

This is the second time that pg_get_wal_block_info has been enhanced in
recent weeks.  Commit 9ecb134a expanded on the original version of the
function added in commit c31cf1c0 (where it first appeared under the
name pg_get_wal_fpi_info).  There still hasn't been a stable release
since commit c31cf1c0, so no bump in the pg_walinspect extension
version.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Kyotaro HORIGUCHI <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CALj2ACVRK5=Z+2ZVsjgTTSkfEnQzCuwny7iigpG7g1btk4Ws2A@mail.gmail.com
2023-03-30 12:26:12 -07:00
Alvaro Herrera 63cc20205c
Simplify transformJsonAggConstructor() API
There's no need for callers to pass aggregate names so that the function
can resolve them to OIDs, when callers can just pass aggregate OIDs
directly to begin with.
2023-03-30 21:07:24 +02:00
Alvaro Herrera 60966f56c3
Fix inconsistencies and style issues in new SQL/JSON code
Reported by Alexander Lakhin.

Discussion: https://postgr.es/m/60483139-5c34-851d-baee-6c0d014e1710@gmail.com
2023-03-30 21:06:31 +02:00
Alvaro Herrera 589bb81649
Fix setrefs.c code for adjusting partPruneInfos
We were transferring partPruneInfos from PlannerInfo into PlannerGlobal
wrong, essentially relying on all of them being transferred, and
adjusting their list indexes based on that.  But apparently it's
possible that some of them are skipped, so that strategy leads to a
corrupted execution tree.  Instead, adjust each Append/MergeAppend's
partpruneinfo index as we copy from one list to the other, which seems
safer anyway.  This requires adjusting the RT offset of the RTE
referenced in each partPruneInfo ahead of actually adjusting the RTE
itself, which seems a bit too ad-hoc.

This problem was introduced by commit ec38694894.  However, it may be
that we no longer require the change introduced there, so perhaps we
should revert both the present commit and that one.

Problem noticed by sqlsmith.

Author: Amit Langote <amitlangote09@gmail.com
Discussion: https://postgr.es/m/CA+HiwqG6tbc2oadsbyyy24b2AL295XHQgyLRWghmA7u_SL1K8A@mail.gmail.com
2023-03-30 21:06:31 +02:00
Andres Freund f9054b7a7c Fix format code in fd.c debugging infrastructure
These were not sufficiently adjusted in 2d4f1ba6cf.
2023-03-30 10:26:10 -07:00
Andres Freund 558cf80387 bufmgr: Fix undefined behaviour with, unrealistically, large temp_buffers
Quoting Melanie:
> Since if buffer is INT_MAX, then the -(buffer + 1) version invokes
> undefined behavior while the -buffer - 1 version doesn't.

All other places were already using the correct version. I (Andres), copied
the code into more places in a patch. Melanie caught it in review, but to
prevent more people from copying the bad code, fix it. Even if it is a
theoretical issue.

We really ought to wrap these accesses in a helper function...

As this is a theoretical issue, don't backpatch.

Reported-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_aW2SX_LWtwHgfnqYpBrunMLfE9PD6-ioPpkh92XH0qpg@mail.gmail.com
2023-03-30 10:26:10 -07:00
Tom Lane e9d202a149 Clean up role created in new subscription test.
This oversight broke repeated runs of "make installcheck".
2023-03-30 13:07:04 -04:00
Robert Haas 9c487efe35 Fix documentation build for c3afe8cf5a.
This documentation hunk was intended to be part of that commit,
but I goofed.
2023-03-30 12:06:34 -04:00
Robert Haas c3afe8cf5a Add new predefined role pg_create_subscription.
This role can be granted to non-superusers to allow them to issue
CREATE SUBSCRIPTION. The non-superuser must additionally have CREATE
permissions on the database in which the subscription is to be
created.

Most forms of ALTER SUBSCRIPTION, including ALTER SUBSCRIPTION .. SKIP,
now require only that the role performing the operation own the
subscription, or inherit the privileges of the owner. However, to
use ALTER SUBSCRIPTION ... RENAME or ALTER SUBSCRIPTION ... OWNER TO,
you also need CREATE permission on the database. This is similar to
what we do for schemas. To change the owner of a schema, you must also
have permission to SET ROLE to the new owner, similar to what we do
for other object types.

Non-superusers are required to specify a password for authentication
and the remote side must use the password, similar to what is required
for postgres_fdw and dblink.  A superuser who wants a non-superuser to
own a subscription that does not rely on password authentication may
set the new password_required=false property on that subscription. A
non-superuser may not set password_required=false and may not modify a
subscription that already has password_required=false.

This new password_required subscription property works much like the
eponymous postgres_fdw property.  In both cases, the actual semantics
are that a password is not required if either (1) the property is set
to false or (2) the relevant user is the superuser.

Patch by me, reviewed by Andres Freund, Jeff Davis, Mark Dilger,
and Stephen Frost (but some of those people did not fully endorse
all of the decisions that the patch makes).

Discussion: http://postgr.es/m/CA+TgmoaDH=0Xj7OBiQnsHTKcF2c4L+=gzPBUKSJLh8zed2_+Dg@mail.gmail.com
2023-03-30 11:37:19 -04:00
Tom Lane b0e9e4d76c Avoid overflow in width_bucket_float8().
The original coding of this function paid little attention to the
possibility of overflow.  There were actually three different hazards:

1. The range from bound1 to bound2 could exceed DBL_MAX, which on
IEEE-compliant machines produces +Infinity in the subtraction.
At best we'd lose all precision in the result, and at worst
produce NaN due to dividing Inf/Inf.  The range can't exceed
twice DBL_MAX though, so we can fix this case by scaling all the
inputs by 0.5.

2. We computed count * (operand - bound1), which is also at risk of
float overflow, before dividing.  Safer is to do the division first,
producing a quotient that should be in [0,1), and even after allowing
for roundoff error can't be outside [0,1]; then multiplying by count
can't produce a result overflowing an int.  (width_bucket_numeric does
the multiplication first on the grounds that that improves accuracy of
its result, but I don't think that a similar argument can be made in
float arithmetic.)

3. If the division result does round to 1, and count is INT_MAX,
the final addition of 1 would overflow an int.  We took care
of that in the operand >= bound2 case but did not consider that
it could be possible in the main path.  Fix that by moving the
overflow-aware addition of 1 so it is done that way in all cases.

The fix for point 2 creates a possibility that values very close to
a bucket boundary will be rounded differently than they were before.
I'm not troubled by that for HEAD, but it is an argument against
putting this into the stable branches.  Given that the cases being
fixed here are fairly extreme and unlikely to be hit in normal use,
it seems best not to back-patch.

Mats Kindahl and Tom Lane

Discussion: https://postgr.es/m/17876-61f280d1601f978d@postgresql.org
2023-03-30 11:27:36 -04:00
Daniel Gustafsson 2fe7a6df94 Fix pointer cast for seed calculation on 32-bit systems
The fallback seed for when pg_strong_random cannot generate a high
quality seed mixes in the address of the conn object, but the cast
failed to take the word size into consideration. Fix by casting to
a uintptr_t instead. The seed calculation was added in 7f5b19817e.

The code as it stood generated the following warning on mamba and
lapwing in the buildfarm:

fe-connect.c: In function 'libpq_prng_init':
fe-connect.c:1048:11: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
1048 |  rseed = ((uint64) conn) ^
     |           ^

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/TYAPR01MB58665250EDCD551CCA9AD117F58E9@TYAPR01MB5866.jpnprd01.prod.outlook.com
2023-03-30 10:53:15 +02:00
Peter Eisentraut 261cf8962b Fix incorrect format placeholders 2023-03-30 08:33:43 +02:00
Amit Kapila da324d6cd4 Refactor pgoutput_change().
Instead of mostly-duplicate code for different operation
(insert/update/delete) types, write a common code to compute old/new
tuples, and check the row filter.

Author: Hou Zhijie
Reviewed-by: Peter Smith, Amit Kapila
Discussion: https://postgr.es/m/OS0PR01MB5716194A47FFA8D91133687D94BF9@OS0PR01MB5716.jpnprd01.prod.outlook.com
2023-03-30 11:10:38 +05:30
David Rowley 902ecd3bd4 Fix outdated comments regarding TupleTableSlots
The tts_flag is named TTS_FLAG_SHOULDFREE, so use that instead of
TTS_SHOULDFREE, which is the name of the macro that checks for that flag.

Additionally, 4da597edf got rid of the TupleTableSlot.tts_tuple field but
forgot to update a comment which referenced that field.  Fix that.

Reported-by: Zhen Mingyang <zhenmingyang@yeah.net>
Reported-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/1a96696c.9d3.187193989c3.Coremail.zhenmingyang@yeah.net
2023-03-30 16:37:03 +13:00