Commit Graph

10282 Commits

Author SHA1 Message Date
Robert Haas 5ef1eefd76 Allow archiving via loadable modules.
Running a shell command for each file to be archived has a lot of
overhead and may not offer as much error checking as you want, or the
exact semantics that you want. So, offer the option to call a loadable
module for each file to be archived, rather than running a shell command.

Also, add a 'basic_archive' contrib module as an example implementation
that archives to a local directory.

Nathan Bossart, with a little bit of kibitzing by me.

Discussion: http://postgr.es/m/20220202224433.GA1036711@nathanxps13
2022-02-03 14:05:02 -05:00
Peter Eisentraut 94aa7cc5f7 Add UNIQUE null treatment option
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.

This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.

The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.

I named all the internal flags, catalog columns, etc. in the negative
("nulls not distinct") so that the default PostgreSQL behavior is the
default if the flag is false.

Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bde78@enterprisedb.com
2022-02-03 11:48:21 +01:00
Tom Lane 4b0e37faaf Remove configure's check for rl_completion_append_character.
The comment for PGAC_READLINE_VARIABLES says "Readline versions < 2.1
don't have rl_completion_append_character".  It seems certain that such
versions are extinct in the wild, though; for sure there are none in the
buildfarm.  Libedit has had this variable for at least twenty years too.
Also, tab-complete.c's behavior without it is quite unfriendly, since
we'll emit a space even when completion fails; but we've had no
complaints about that.

Therefore, let's assume this variable is always there, and drop the
configure check to save a few build cycles.

Discussion: https://postgr.es/m/147685.1643858911@sss.pgh.pa.us
2022-02-02 23:01:56 -05:00
Peter Eisentraut 87669de72c Some cleanup for change of collate and ctype fields to type text
Some cleanup for commit 54637508f87bd5f07fb9406bac6b08240283be3b:
Reformat pg_database.dat to reflect the new field order.  Also update
the corresponding example in bki.sgml.  Reorder the way the fields are
filled in dbcommands.c to correspond to the new order.
2022-02-02 11:58:55 +01:00
John Naylor 0526f2f4c3 Fix missing undefine in sort_template.h
All parameter macros are supposed to be undefined at the end of the
header. ST_CHECK_FOR_INTERRUPTS was forgotten, so could affect later
inclusions.

Thomas Munro

The patch set of which this is a part is discussed in
https://www.postgresql.org/message-id/CA%2BhUKGLPommgNw-SVwUGkw1YmTDwmJ5vSKO0kFnZfbRHtNFW5w%40mail.gmail.com
2022-01-31 15:10:01 -05:00
Michael Paquier d10e41d423 Introduce pg_settings_get_flags() to find flags associated to a GUC
The most meaningful flags are shown, which are the ones useful for the
user and for automating and extending the set of tests supported
currently by check_guc.

This script may actually be removed in the future, but we are not
completely sure yet if and how we want to support the remaining sanity
checks performed there, that are now integrated in the main regression
test suite as of this commit.

Thanks also to Peter Eisentraut and Kyotaro Horiguchi for the
discussion.

Bump catalog version.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20211129030833.GJ17618@telsasoft.com
2022-01-31 08:56:41 +09:00
Alvaro Herrera b3d7d6e462
Remove xloginsert.h from xlog.h
xlog.h is directly and indirectly #included in a lot of places.  With
this change, xloginsert.h is no longer unnecessarily included in the
large number of them that don't need it.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CALj2ACVe-W+WM5P44N7eG9C2_FmaeM8Dq5aCnD3fHt0Ba=WR6w@mail.gmail.com
2022-01-30 12:25:24 -03:00
Tom Lane 8e2e0f7586 Fix failure to validate the result of select_common_type().
Although select_common_type() has a failure-return convention, an
apparent successful return just provides a type OID that *might* work
as a common supertype; we've not validated that the required casts
actually exist.  In the mainstream use-cases that doesn't matter,
because we'll proceed to invoke coerce_to_common_type() on each input,
which will fail appropriately if the proposed common type doesn't
actually work.  However, a few callers didn't read the (nonexistent)
fine print, and thought that if they got back a nonzero OID then the
coercions were sure to work.

This affects in particular the recently-added "anycompatible"
polymorphic types; we might think that a function/operator using
such types matches cases it really doesn't.  A likely end result
of that is unexpected "ambiguous operator" errors, as for example
in bug #17387 from James Inform.  Another, much older, case is that
the parser might try to transform an "x IN (list)" construct to
a ScalarArrayOpExpr even when the list elements don't actually have
a common supertype.

It doesn't seem desirable to add more checking to select_common_type
itself, as that'd just slow down the mainstream use-cases.  Instead,
write a separate function verify_common_type that performs the
missing checks, and add a call to that where necessary.  Likewise add
verify_common_type_from_oids to go with select_common_type_from_oids.

Back-patch to v13 where the "anycompatible" types came in.  (The
symptom complained of in bug #17387 doesn't appear till v14, but
that's just because we didn't get around to converting || to use
anycompatible till then.)  In principle the "x IN (list)" fix could
go back all the way, but I'm not currently convinced that it makes
much difference in real-world cases, so I won't bother for now.

Discussion: https://postgr.es/m/17387-5dfe54b988444963@postgresql.org
2022-01-29 11:41:18 -05:00
Robert Haas aeb4cc9ea0 Move the code to archive files via the shell to a separate file.
This is preparatory work for allowing more extensibility in this area.

Nathan Bossart

Discussion: http://postgr.es/m/668D2428-F73B-475E-87AE-F89D67942270@amazon.com
2022-01-28 13:29:32 -05:00
Peter Eisentraut 43f33dc018 Add HEADER support to COPY text format
The COPY CSV format supports the HEADER option to output a header
line.  This patch adds the same option to the default text format.  On
input, the HEADER option causes the first line to be skipped, same as
with CSV.

Author: Rémi Lapeyre <remi.lapeyre@lenstra.fr>
Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
2022-01-28 09:44:47 +01:00
Tomas Vondra f192e1bdf3 Fix ordering of XIDs in ProcArrayApplyRecoveryInfo
Commit 8431e296ea reworked ProcArrayApplyRecoveryInfo to sort XIDs
before adding them to KnownAssignedXids. But the XIDs are sorted using
xidComparator, which compares the XIDs simply as uint32 values, not
logically. KnownAssignedXidsAdd() however expects XIDs in logical order,
and calls TransactionIdFollowsOrEquals() to enforce that. If there are
XIDs for which the two orderings disagree, an error is raised and the
recovery fails/restarts.

Hitting this issue is fairly easy - you just need two transactions, one
started before the 4B limit (e.g. XID 4294967290), the other sometime
after it (e.g. XID 1000). Logically (4294967290 <= 1000) but when
compared using xidComparator we try to add them in the opposite order.
Which makes KnownAssignedXidsAdd() fail with an error like this:

  ERROR: out-of-order XID insertion in KnownAssignedXids

This only happens during replica startup, while processing RUNNING_XACTS
records to build the snapshot. Once we reach STANDBY_SNAPSHOT_READY, we
skip these records. So this does not affect already running replicas,
but if you restart (or create) a replica while there are transactions
with XIDs for which the two orderings disagree, you may hit this.

Long-running transactions and frequent replica restarts increase the
likelihood of hitting this issue. Once the replica gets into this state,
it can't be started (even if the old transactions are terminated).

Fixed by sorting the XIDs logically - this is fine because we're dealing
with normal XIDs (because it's XIDs assigned to backends) and from the
same wraparound epoch (otherwise the backends could not be running at
the same time on the primary node). So there are no problems with the
triangle inequality, which is why xidComparator compares raw values.

Investigation and root cause analysis by Abhijit Menon-Sen. Patch by me.

This issue is present in all releases since 9.4, however releases up to
9.6 are EOL already so backpatch to 10 only.

Reviewed-by: Abhijit Menon-Sen
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/36b8a501-5d73-277c-4972-f58a4dce088a%40enterprisedb.com
2022-01-27 20:13:55 +01:00
Peter Eisentraut 54637508f8 Change collate and ctype fields to type text
This changes the data type of the catalog fields datcollate, datctype,
collcollate, and collctype from name to text.  There wasn't ever a
really good reason for them to be of type name; presumably this was
just carried over from when they were fixed-size fields in pg_control,
first into the corresponding pg_database fields, and then to
pg_collation.  The values are not identifiers or object names, and we
don't ever look them up that way.

Changing to type text saves space in the typical case, since locale
names are typically only a few bytes long.  But it is also possible
that an ICU locale name with several customization options appended
could be longer than 63 bytes, so this also enables that case, which
was previously probably broken.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a@2ndquadrant.com
2022-01-27 08:54:25 +01:00
Michael Paquier 410aa248e5 Fix various typos, grammar and code style in comments and docs
This fixes a set of issues that have accumulated over the past months
(or years) in various code areas.  Most fixes are related to some recent
additions, as of the development of v15.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20220124030001.GQ23027@telsasoft.com
2022-01-25 09:40:04 +09:00
Tom Lane 6aa5186146 Fix limitations on what SQL commands can be issued to a walsender.
In logical replication mode, a WalSender is supposed to be able
to execute any regular SQL command, as well as the special
replication commands.  Poor design of the replication-command
parser caused it to fail in various cases, notably:

* semicolons embedded in a command, or multiple SQL commands
sent in a single message;

* dollar-quoted literals containing odd numbers of single
or double quote marks;

* commands starting with a comment.

The basic problem here is that we're trying to run repl_scanner.l
across the entire input string even when it's not a replication
command.  Since repl_scanner.l does not understand all of the
token types known to the core lexer, this is doomed to have
failure modes.

We certainly don't want to make repl_scanner.l as big as scan.l,
so instead rejigger stuff so that we only lex the first token of
a non-replication command.  That will usually look like an IDENT
to repl_scanner.l, though a comment would end up getting reported
as a '-' or '/' single-character token.  If the token is a replication
command keyword, we push it back and proceed normally with repl_gram.y
parsing.  Otherwise, we can drop out of exec_replication_command()
without examining the rest of the string.

(It's still theoretically possible for repl_scanner.l to fail on
the first token; but that could only happen if it's an unterminated
single- or double-quoted string, in which case you'd have gotten
largely the same error from the core lexer too.)

In this way, repl_gram.y isn't involved at all in handling general
SQL commands, so we can get rid of the SQLCmd node type.  (In
the back branches, we can't remove it because renumbering enum
NodeTag would be an ABI break; so just leave it sit there unused.)

I failed to resist the temptation to clean up some other sloppy
coding in repl_scanner.l while at it.  The only externally-visible
behavior change from that is it now accepts \r and \f as whitespace,
same as the core lexer.

Per bug #17379 from Greg Rychlewski.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/17379-6a5c6cfb3f1f5e77@postgresql.org
2022-01-24 15:33:38 -05:00
Robert Haas 0ad8032910 Server-side gzip compression.
pg_basebackup's --compression option now lets you write either
"client-gzip" or "server-gzip" instead of just "gzip" to specify
where the compression should be performed. If you write simply
"gzip" it's taken to mean "client-gzip" unless you also use
--target, in which case it is interpreted to mean "server-gzip",
because that's the only thing that makes any sense in that case.

To make this work, the BASE_BACKUP command now takes new
COMPRESSION and COMPRESSION_LEVEL options.

At present, pg_basebackup cannot decompress .gz files, so
server-side compression will cause a failure if (1) -Ft is not
used or (2) -R is used or (3) -D- is used without --no-manifest.

Along the way, I removed the information message added by commit
5c649fe153 which occurred if you
specified no compression level and told you that the default level
had been used instead. That seemed like more output than most
people would want.

Also along the way, this adds a check to the server for
unrecognized base backup options. This repairs a bug introduced
by commit 0ba281cb4b.

This commit also adds some new test cases for pg_verifybackup.
They take a server-side backup with and without compression, and
then extract the backup if we have the OS facilities available
to do so, and then run pg_verifybackup on the extracted
directory. That is a good test of the functionality added by
this commit and also improves test coverage for the backup target
patch (commit 3500ccc39b) and for
pg_verifybackup itself.

Patch by me, with a bug fix by Jeevan Ladhe.  The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.

Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-24 15:13:18 -05:00
Robert Haas aa01051418 pg_upgrade: Preserve database OIDs.
Commit 9a974cbcba arranged to preserve
relfilenodes and tablespace OIDs. For similar reasons, also arrange
to preserve database OIDs.

One problem is that, up until now, the OIDs assigned to the template0
and postgres databases have not been fixed. This could be a problem
when upgrading, because pg_upgrade might try to migrate a database
from the old cluster to the new cluster while keeping the OID and find
a different database with that OID, resulting in a failure. If it finds
a database with the same name and the same OID that's OK: it will be
dropped and recreated. But the same OID and a different name is a
problem.

To prevent that, fix the OIDs for postgres and template0 to specific
values less than 16384. To avoid running afoul of this rule, these
values should not be changed in future releases. It's not a problem
that these OIDs aren't fixed in existing releases, because the OIDs
that we're assigning here weren't used for either of these databases
in any previous release. Thus, there's no chance that an upgrade of
a cluster from any previous release will collide with the OIDs we're
assigning here. And going forward, the OIDs will always be fixed, so
the only potential collision is with a system database having the
same name and the same OID, which is OK.

This patch lets users assign a specific OID to a database as well,
provided however that it can't be less than 16384. I (rhaas) thought
it might be better not to expose this capability to users, but the
consensus was otherwise, so the syntax is documented. Letting users
assign OIDs below 16384 would not be OK, though, because a
user-created database with a low-numbered OID might collide with a
system-created database in a future release. We therefore prohibit
that.

Shruthi KC, based on an earlier patch from Antonin Houska, reviewed
and with some adjustments by me.

Discussion: http://postgr.es/m/CA+TgmoYgTwYcUmB=e8+hRHOFA0kkS6Kde85+UNdon6q7bt1niQ@mail.gmail.com
Discussion: http://postgr.es/m/CAASxf_Mnwm1Dh2vd5FAhVX6S1nwNSZUB1z12VddYtM++H2+p7w@mail.gmail.com
2022-01-24 14:23:43 -05:00
Robert Haas 3500ccc39b Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command.  If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.

On the server side, we now support two additional types of backup
targets.  There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.

Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.

Patch by me, with a bug fix by Jeevan Ladhe.  The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-20 10:46:33 -05:00
Robert Haas ab4fd4f868 Remove 'datlastsysoid'.
It hasn't been used for anything for a long time. Up until recently,
we still queried it when dumping very old servers, but since
commit 30e7c175b8, there's no longer any
code at all that cares about it.

Discussion: http://postgr.es/m/CA+Tgmoa14=BRq0WEd0eevjEMn9EkghDB1FZEkBw7+UAb7tF49A@mail.gmail.com
2022-01-20 09:01:12 -05:00
Jeff Davis 7a5f6b4748 Make logical decoding a part of the rmgr.
Add a new rmgr method, rm_decode, and use that rather than a switch
statement.

In preparation for rmgr extensibility.

Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
Discussion: https://postgr.es/m/20220118095332.6xtlcjoyxobv6cbk@jrouhaud
2022-01-19 14:58:49 -08:00
Tom Lane a3d6264bbc interval_out() must be marked STABLE, not IMMUTABLE.
Its results vary depending on the IntervalStyle GUC, so it cannot
be considered immutable.

This is an extremely ancient bug.  AFAICT it was a sloppy mistake
in 6f58115dd, which marked it "cacheable" alongside marking several
other interval functions that way.  At the time, interval_out()
depended on DateStyle not IntervalStyle, but it was still wrong.

Back-patching this change doesn't look very practical, so I won't.
Aside from the usual difficulties of getting catalog changes
applied to existing databases, people might have indexes,
generated columns, etc that depend on interval-to-text casts
being considered immutable.  (This'd not really give them any
problem as long as they never change IntervalStyle.)  They
wouldn't appreciate us breaking such usage in minor releases.

Per bug #17371 from Marcus Gartner.

Discussion: https://postgr.es/m/17371-8f57e6e9ca5e35bf@postgresql.org
2022-01-19 17:17:55 -05:00
Robert Haas cc333f3233 Modify pg_basebackup to use a new COPY subprotocol for base backups.
In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.

The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.

The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.

Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-18 13:47:49 -05:00
Robert Haas 9a974cbcba pg_upgrade: Preserve relfilenodes and tablespace OIDs.
Currently, database OIDs, relfilenodes, and tablespace OIDs can all
change when a cluster is upgraded using pg_upgrade. It seems better
to preserve them, because (1) it makes troubleshooting pg_upgrade
easier, since you don't have to do a lot of work to match up files
in the old and new clusters, (2) it allows 'rsync' to save bandwidth
when used to re-sync a cluster after an upgrade, and (3) if we ever
encrypt or sign blocks, we would likely want to use a nonce that
depends on these values.

This patch only arranges to preserve relfilenodes and tablespace
OIDs. The task of preserving database OIDs is left for another patch,
since it involves some complexities that don't exist in these cases.

Database OIDs have a similar issue, but there are some tricky points
in that case that do not apply to these cases, so that problem is left
for another patch.

Shruthi KC, based on an earlier patch from Antonin Houska, reviewed
and with some adjustments by me.

Discussion: http://postgr.es/m/CA+TgmoYgTwYcUmB=e8+hRHOFA0kkS6Kde85+UNdon6q7bt1niQ@mail.gmail.com
2022-01-17 13:40:27 -05:00
Peter Eisentraut 941460fcf7 Add Boolean node
Before, SQL-level boolean constants were represented by a string with
a cast, and internal Boolean values in DDL commands were usually
represented by Integer nodes.  This takes the place of both of these
uses, making the intent clearer and having some amount of type safety.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/8c1a2e37-c68d-703c-5a83-7a6077f4f997@enterprisedb.com
2022-01-17 10:38:23 +01:00
Michael Paquier dc686681e0 Introduce log_destination=jsonlog
"jsonlog" is a new value that can be added to log_destination to provide
logs in the JSON format, with its output written to a file, making it
the third type of destination of this kind, after "stderr" and
"csvlog".  The format is convenient to feed logs to other applications.
There is also a plugin external to core that provided this feature using
the hook in elog.c, but this had to overwrite the output of "stderr" to
work, so being able to do both at the same time was not possible.  The
files generated by this log format are suffixed with ".json", and use
the same rotation policies as the other two formats depending on the
backend configuration.

This takes advantage of the refactoring work done previously in ac7c807,
bed6ed3, 8b76f89 and 2d77d83 for the backend parts, and 72b76f7 for the
TAP tests, making the addition of any new file-based format rather
straight-forward.

The documentation is updated to list all the keys and the values that
can exist in this new format.  pg_current_logfile() also required a
refresh for the new option.

Author: Sehrope Sarkuni, Michael Paquier
Reviewed-by: Nathan Bossart, Justin Pryzby
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com
2022-01-17 10:16:53 +09:00
Tomas Vondra 269b532aef Add stxdinherit flag to pg_statistic_ext_data
Add pg_statistic_ext_data.stxdinherit flag, so that for each extended
statistics definition we can store two versions of data - one for the
relation alone, one for the whole inheritance tree. This is analogous to
pg_statistic.stainherit, but we failed to include such flag in catalogs
for extended statistics, and we had to work around it (see commits
859b3003de, 36c4bc6e72 and 20b9fa308e).

This changes the relationship between the two catalogs storing extended
statistics objects (pg_statistic_ext and pg_statistic_ext_data). Until
now, there was a simple 1:1 mapping - for each definition there was one
pg_statistic_ext_data row, and this row was inserted while creating the
statistics (and then updated during ANALYZE). With the stxdinherit flag,
we don't know how many rows there will be (child relations may be added
after the statistics object is defined), so there may be up to two rows.

We could make CREATE STATISTICS to always create both rows, but that
seems wasteful - without partitioning we only need stxdinherit=false
rows, and declaratively partitioned tables need only stxdinherit=true.
So we no longer initialize pg_statistic_ext_data in CREATE STATISTICS,
and instead make that a responsibility of ANALYZE. Which is what we do
for regular statistics too.

Patch by me, with extensive improvements and fixes by Justin Pryzby.

Author: Tomas Vondra, Justin Pryzby
Reviewed-by: Tomas Vondra, Justin Pryzby
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
2022-01-16 13:38:01 +01:00
Peter Geoghegan 49c9d9fcfa Unify VACUUM VERBOSE and autovacuum logging.
The log_autovacuum_min_duration instrumentation used its own dedicated
code for logging, which was not reused by VACUUM VERBOSE.  This was
highly duplicative, and sometimes led to each code path using slightly
different accounting for essentially the same information.

Clean things up by making VACUUM VERBOSE reuse the same instrumentation
code.  This code restructuring changes the structure of the VACUUM
VERBOSE output itself, but that seems like an overall improvement.  The
most noticeable change in VACUUM VERBOSE output is that it no longer
outputs a distinct message per index per round of index vacuuming.  Most
of the same information (about each index) is now shown in its new
per-operation summary message.  This is far more legible.

A few details are no longer displayed by VACUUM VERBOSE, but that's no
real loss in practice, especially in the common case where we don't need
multiple index scans/rounds of vacuuming.  This super fine-grained
information is still available via DEBUG2 messages, which might still be
useful in debugging scenarios.

VACUUM VERBOSE now shows new instrumentation, which is typically very
useful: all of the log_autovacuum_min_duration instrumentation that it
missed out on before now.  This includes information about WAL overhead,
buffers hit/missed/dirtied information, and I/O timing information.

VACUUM VERBOSE still retains a few INFO messages of its own.  This is
limited to output concerning the progress of heap rel truncation, as
well as some basic information about parallel workers.  These details
are still potentially quite useful.  They aren't a good fit for the log
output, which must summarize the whole operation.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzmW4Me7_qR4X4ka7pxP-jGmn7=Npma_-Z-9Y1eD0MQRLw@mail.gmail.com
2022-01-14 16:50:34 -08:00
Thomas Munro 7170f2159f Allow "in place" tablespaces.
Provide a developer-only GUC allow_in_place_tablespaces, disabled by
default.  When enabled, tablespaces can be created with an empty
LOCATION string, meaning that they should be created as a directory
directly beneath pg_tblspc.  This can be used for new testing scenarios,
in a follow-up patch.  Not intended for end-user usage, since it might
confuse backup tools that expect symlinks.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGKpRWQ9SxdxxDmTBCJoR0YnFpMBe7kyzY8SUQk%2BHeskxg%40mail.gmail.com
2022-01-15 00:09:24 +13:00
Peter Eisentraut c4cc2850f4 Rename value node fields
For the formerly-Value node types, rename the "val" field to a name
specific to the node type, namely "ival", "fval", "sval", and "bsval".
This makes some code clearer and catches mixups better.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/8c1a2e37-c68d-703c-5a83-7a6077f4f997@enterprisedb.com
2022-01-14 11:26:08 +01:00
Michael Paquier 5513dc6a30 Improve error handling of HMAC computations
This is similar to b69aba7, except that this completes the work for
HMAC with a new routine called pg_hmac_error() that would provide more
context about the type of error that happened during a HMAC computation:
- The fallback HMAC implementation in hmac.c relies on cryptohashes, so
in some code paths it is necessary to return back the error generated by
cryptohashes.
- For the OpenSSL implementation (hmac_openssl.c), the logic is very
similar to cryptohash_openssl.c, where the error context comes from
OpenSSL if one of its internal routines failed, with different error
codes if something internal to hmac_openssl.c failed or was incorrect.

Any in-core code paths that use the centralized HMAC interface are
related to SCRAM, for errors that are unlikely going to happen, with
only SHA-256.  It would be possible to see errors when computing some
HMACs with MD5 for example and OpenSSL FIPS enabled, and this commit
would help in reporting the correct errors but nothing in core uses
that.  So, at the end, no backpatch to v14 is done, at least for now.

Errors in SCRAM related to the computation of the server key, stored
key, etc. need to pass down the potential error context string across
more layers of their respective call stacks for the frontend and the
backend, so each surrounding routine is adapted for this purpose.

Reviewed-by: Sergey Shinderuk
Discussion: https://postgr.es/m/Yd0N9tSAIIkFd+qi@paquier.xyz
2022-01-13 16:17:21 +09:00
Peter Geoghegan db6736c93c Fix memory leak in indexUnchanged hint mechanism.
Commit 9dc718bd added a "logically unchanged by UPDATE" hinting
mechanism, which is currently used within nbtree indexes only (see
commit d168b666).  This mechanism determined whether or not the incoming
item is a logically unchanged duplicate (a duplicate needed only for
MVCC versioning purposes) once per row updated per non-HOT update.  This
approach led to memory leaks which were noticeable with an UPDATE
statement that updated sufficiently many rows, at least on tables that
happen to have an expression index.

On HEAD, fix the issue by adding a cache to the executor's per-index
IndexInfo struct.

Take a different approach on Postgres 14 to avoid an ABI break: simply
pass down the hint to all indexes unconditionally with non-HOT UPDATEs.
This is deemed acceptable because the hint is currently interpreted
within btinsert() as "perform a bottom-up index deletion pass if and
when the only alternative is splitting the leaf page -- prefer to delete
any LP_DEAD-set items first".  nbtree must always treat the hint as a
noisy signal about what might work, as a strategy of last resort, with
costs imposed on non-HOT updaters.  (The same thing might not be true
within another index AM that applies the hint, which is why the original
behavior is preserved on HEAD.)

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Klaudie Willis <Klaudie.Willis@protonmail.com>
Diagnosed-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/261065.1639497535@sss.pgh.pa.us
Backpatch: 14-, where the hinting mechanism was added.
2022-01-12 15:41:04 -08:00
Alvaro Herrera 025b920a3d
Add index on pg_publication_rel.prpubid
This should have been added for the benefit of GetPublicationRelations;
let's add it now.

I couldn't measure a performance difference in the TAP tests, but that
may be because the tests use very few publications.

Discussion: https://postgr.es/m/202201120041.p24wvsfcsope@alvherre.pgsql
2022-01-12 16:24:26 -03:00
Michael Paquier ac7c80758a Refactor set of routines specific to elog.c
This refactors the following routines and facilities coming from
elog.c, to ease their use across multiple log destinations:
- Start timestamp, including its reset, to store when a process has been
started.
- The log timestamp, associated to an entry (the same timestamp is used
when logging across multiple destinations).
- Routine deciding if a query can be logged or not.
- The backend type names, depending on the process that logs any
information (postmaster, bgworker name or just GetBackendTypeDesc() with
a regular backend).
- Write of logs using the logging piped protocol, with the log collector
enabled.
- Error severity converted to a string.

These refactored routines will be used for some follow-up changes
to move all the csvlog logic into its own file and to potentially add
JSON as log destination, reducing the overall size of elog.c as the end
result.

Author: Michael Paquier, Sehrope Sarkuni
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com
2022-01-12 14:16:59 +09:00
Thomas Munro af9e6331ae Add missing include guard to win32ntdll.h.
Oversight in commit e2f0f8ed.  Also add this file to the exclusion lists
in headerscheck and cpluscpluscheck, because Unix systems don't have a
header it includes.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2760528.1641929756%40sss.pgh.pa.us
2022-01-12 10:19:00 +13:00
Tom Lane 98e93a1fc9 Clean up messy API for src/port/thread.c.
The point of this patch is to reduce inclusion spam by not needing
to #include <netdb.h> or <pwd.h> in port.h (which is read by every
compile in our tree).  To do that, we must remove port.h's
declarations of pqGetpwuid and pqGethostbyname.

pqGethostbyname is only used, and is only ever likely to be used,
in src/port/getaddrinfo.c --- which isn't even built on most
platforms, making pqGethostbyname dead code for most people.
Hence, deal with that by just moving it into getaddrinfo.c.

To clean up pqGetpwuid, invent a couple of simple wrapper
functions with less-messy APIs.  This allows removing some
duplicate error-handling code, too.

In passing, remove thread.c from the MSVC build, since it
contains nothing we use on Windows.

Noted while working on 376ce3e40.

Discussion: https://postgr.es/m/1634252654444.90107@mit.edu
2022-01-11 13:46:20 -05:00
Michael Paquier b69aba7457 Improve error handling of cryptohash computations
The existing cryptohash facility was causing problems in some code paths
related to MD5 (frontend and backend) that relied on the fact that the
only type of error that could happen would be an OOM, as the MD5
implementation used in PostgreSQL ~13 (the in-core implementation is
used when compiling with or without OpenSSL in those older versions),
could fail only under this circumstance.

The new cryptohash facilities can fail for reasons other than OOMs, like
attempting MD5 when FIPS is enabled (upstream OpenSSL allows that up to
1.0.2, Fedora and Photon patch OpenSSL 1.1.1 to allow that), so this
would cause incorrect reports to show up.

This commit extends the cryptohash APIs so as callers of those routines
can fetch more context when an error happens, by using a new routine
called pg_cryptohash_error().  The error states are stored within each
implementation's internal context data, so as it is possible to extend
the logic depending on what's suited for an implementation.  The default
implementation requires few error states, but OpenSSL could report
various issues depending on its internal state so more is needed in
cryptohash_openssl.c, and the code is shaped so as we are always able to
grab the necessary information.

The core code is changed to adapt to the new error routine, painting
more "const" across the call stack where the static errors are stored,
particularly in authentication code paths on variables that provide
log details.  This way, any future changes would warn if attempting to
free these strings.  The MD5 authentication code was also a bit blurry
about the handling of "logdetail" (LOG sent to the postmaster), so
improve the comments related that, while on it.

The origin of the problem is 87ae969, that introduced the centralized
cryptohash facility.  Extra changes are done for pgcrypto in v14 for the
non-OpenSSL code path to cope with the improvements done by this
commit.

Reported-by: Michael Mühlbeyer
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/89B7F072-5BBE-4C92-903E-D83E865D9367@trivadis.com
Backpatch-through: 14
2022-01-11 09:55:16 +09:00
Thomas Munro f3e78069db Make EXEC_BACKEND more convenient on Linux and FreeBSD.
Try to disable ASLR when building in EXEC_BACKEND mode, to avoid random
memory mapping failures while testing.  For developer use only, no
effect on regular builds.

Suggested-by: Andres Freund <andres@anarazel.de>
Tested-by: Bossart, Nathan <bossartn@amazon.com>
Discussion: https://postgr.es/m/20210806032944.m4tz7j2w47mant26%40alap3.anarazel.de
2022-01-11 00:04:33 +13:00
Bruce Momjian 27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
Alvaro Herrera f4566345cf
Create foreign key triggers in partitioned tables too
While user-defined triggers defined on a partitioned table have
a catalog definition for both it and its partitions, internal
triggers used by foreign keys defined on partitioned tables only
have a catalog definition for its partitions.  This commit fixes
that so that partitioned tables get the foreign key triggers too,
just like user-defined triggers.  Moreover, like user-defined
triggers, partitions' internal triggers will now also have their
tgparentid set appropriately.  This is to allow subsequent commit(s)
to make the foreign key related events to be fired in some cases
using the parent table triggers instead of those of partitions'.

This also changes what tgisinternal means in some cases.  Currently,
it means either that the trigger is an internal implementation object
of a foreign key constraint, or a "child" trigger on a partition
cloned from the trigger on the parent.  This commit changes it to
only mean the former to avoid confusion.  As for the latter, it can
be told by tgparentid being nonzero, which is now true both for user-
defined and foreign key's internal triggers.

Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Arne Roland <A.Roland@index.de>
Discussion: https://postgr.es/m/CA+HiwqG7LQSK+n8Bki8tWv7piHD=PnZro2y6ysU2-28JS6cfgQ@mail.gmail.com
2022-01-05 19:00:13 -03:00
Tom Lane 9a3ddeb519 Fix index-only scan plans, take 2.
Commit 4ace45677 failed to fix the problem fully, because the
same issue of attempting to fetch a non-returnable index column
can occur when rechecking the indexqual after using a lossy index
operator.  Moreover, it broke EXPLAIN for such indexquals (which
indicates a gap in our test cases :-().

Revert the code changes of 4ace45677 in favor of adding a new field
to struct IndexOnlyScan, containing a version of the indexqual that
can be executed against the index-returned tuple without using any
non-returnable columns.  (The restrictions imposed by check_index_only
guarantee this is possible, although we may have to recompute indexed
expressions.)  Support construction of that during setrefs.c
processing by marking IndexOnlyScan.indextlist entries as resjunk
if they can't be returned, rather than removing them entirely.
(We could alternatively require setrefs.c to look up the IndexOptInfo
again, but abusing resjunk this way seems like a reasonably safe way
to avoid needing to do that.)

This solution isn't great from an API-stability standpoint: if there
are any extensions out there that build IndexOnlyScan structs directly,
they'll be broken in the next minor releases.  However, only a very
invasive extension would be likely to do such a thing.  There's no
change in the Path representation, so typical planner extensions
shouldn't have a problem.

As before, back-patch to all supported branches.

Discussion: https://postgr.es/m/3179992.1641150853@sss.pgh.pa.us
Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org
2022-01-03 15:42:27 -05:00
Tom Lane ba2bc4a7ba Use MaxLockMode symbol in more places.
As long as we have this macro, it makes sense to use it in
the LockMethodData structures.

Julien Rouhaud

Discussion: https://postgr.es/m/20220103064722.ewdv4evlez5m7mdn@jrouhaud
2022-01-03 12:24:44 -05:00
Alvaro Herrera 9623d89996
Avoid using DefElemAction in AlterPublicationStmt
Create a new enum type for it.  This allows to add new values for future
functionality without disrupting unrelated uses of DefElem.

Discussion: https://postgr.es/m/202112302021.ca7ihogysgh3@alvherre.pgsql
2022-01-03 10:48:48 -03:00
Tom Lane 4ace456776 Fix index-only scan plans when not all index columns can be returned.
If an index has both returnable and non-returnable columns, and one of
the non-returnable columns is an expression using a Var that is in a
returnable column, then a query returning that expression could result
in an index-only scan plan that attempts to read the non-returnable
column, instead of recomputing the expression from the returnable
column as intended.

To fix, redefine the "indextlist" list of an IndexOnlyScan plan node
as containing null Consts in place of any non-returnable columns.
This solves the problem by preventing setrefs.c from falsely matching
to such entries.  The executor is happy since it only cares about the
exposed types of the entries, and ruleutils.c doesn't care because a
correct plan won't reference those entries.  I considered some other
ways to prevent setrefs.c from doing the wrong thing, but this way
seems good since (a) it allows a very localized fix, (b) it makes
the indextlist structure more compact in many cases, and (c) the
indextlist is now a more faithful representation of what the index AM
will actually produce, viz. nulls for any non-returnable columns.

This is easier to hit since we introduced included columns, but it's
possible to construct failing examples without that, as per the
added regression test.  Hence, back-patch to all supported branches.

Per bug #17350 from Louis Jachiet.

Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org
2022-01-01 16:12:03 -05:00
Alvaro Herrera c9105dd366
Small cleanups related to PUBLICATION framework code
Discussion: https://postgr.es/m/202112302021.ca7ihogysgh3@alvherre.pgsql
2021-12-30 19:24:26 -03:00
Tom Lane cab5b9ab2c Revert changes about warnings/errors for placeholders.
Revert commits 5609cc01c, 2ed8a8cc5, and 75d22069e until we have
a less broken idea of how this should work in parallel workers.
Per buildfarm.

Discussion: https://postgr.es/m/1640909.1640638123@sss.pgh.pa.us
2021-12-27 16:01:10 -05:00
Tom Lane 5609cc01c6 Rename EmitWarningsOnPlaceholders() to MarkGUCPrefixReserved().
This seems like a clearer name for what it does now.

Provide a compatibility macro so that extensions don't have to convert
to the new name right away.

Discussion: https://postgr.es/m/116024.1640111629@sss.pgh.pa.us
2021-12-27 14:39:08 -05:00
Amit Kapila 8e1fae1938 Move parallel vacuum code to vacuumparallel.c.
This commit moves parallel vacuum related code to a new file
commands/vacuumparallel.c so that any table AM supporting indexes can
utilize parallel vacuum in order to call index AM callbacks (ambulkdelete
and amvacuumcleanup) with parallel workers.

Another reason for this refactoring is that the parallel vacuum isn't
specific to heap so it doesn't make sense to keep this code in
heap/vacuumlazy.c.

Author: Masahiko Sawada, based on suggestion from Andres Freund
Reviewed-by: Hou Zhijie, Amit Kapila, Haiying Tang
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
2021-12-23 11:42:52 +05:30
Michael Paquier fc95d35b94 Correct comment and some documentation about REPLICA_IDENTITY_INDEX
catalog/pg_class.h was stating that REPLICA_IDENTITY_INDEX with a
dropped index is equivalent to REPLICA_IDENTITY_DEFAULT.  The code tells
a different story, as it is equivalent to REPLICA_IDENTITY_NOTHING.

The behavior exists since the introduction of replica identities, and
fe7fd4e even added tests for this case but I somewhat forgot to fix this
comment.

While on it, this commit reorganizes the documentation about replica
identities on the ALTER TABLE page, and a note is added about the case
of dropped indexes with REPLICA_IDENTITY_INDEX.

Author: Michael Paquier, Wei Wang
Reviewed-by: Euler Taveira
Discussion: https://postgr.es/m/OS3PR01MB6275464AD0A681A0793F56879E759@OS3PR01MB6275.jpnprd01.prod.outlook.com
Backpatch-through: 10
2021-12-22 16:37:58 +09:00
Amit Kapila cc8b25712b Move index vacuum routines to vacuum.c.
An upcoming patch moves parallel vacuum code out of vacuumlazy.c. This
code restructuring will allow both lazy vacuum and parallel vacuum to use
index vacuum functions.

Author: Masahiko Sawada
Reviewed-by: Hou Zhijie, Amit Kapila
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
2021-12-22 07:55:14 +05:30
John Naylor 911588a3f8 Add fast path for validating UTF-8 text
Our previous validator used a traditional algorithm that performed
comparison and branching one byte at a time. It's useful in that
we always know exactly how many bytes we have validated, but that
precision comes at a cost. Input validation can show up prominently
in profiles of COPY FROM, and future improvements to COPY FROM such
as parallelism or faster line parsing will put more pressure on input
validation. Hence, add fast paths for both ASCII and multibyte UTF-8:

Use bitwise operations to check 16 bytes at a time for ASCII. If
that fails, use a "shift-based" DFA on those bytes to handle the
general case, including multibyte. These paths are relatively free
of branches and thus robust against all kinds of byte patterns. With
these algorithms, UTF-8 validation is several times faster, depending
on platform and the input byte distribution.

The previous coding in pg_utf8_verifystr() is retained for short
strings and for when the fast path returns an error.

Review, performance testing, and additional hacking by: Heikki
Linakangas, Vladimir Sitnikov, Amit Khandekar, Thomas Munro, and
Greg Stark

Discussion:
https://www.postgresql.org/message-id/CAFBsxsEV_SzH%2BOLyCiyon%3DiwggSyMh_eF6A3LU2tiWf3Cy2ZQg%40mail.gmail.com
2021-12-20 10:07:29 -04:00
Peter Eisentraut 3c6f8c011f Simplify the general-purpose 64-bit integer parsing APIs
pg_strtouint64() is a wrapper around strtoull/strtoul/_strtoui64, but
it seems no longer necessary to have this indirection.
msvc/Solution.pm claims HAVE_STRTOULL, so the "MSVC only" part seems
unnecessary.  Also, we have code in c.h to substitute alternatives for
strtoull() if not found, and that would appear to cover all currently
supported platforms, so having a further fallback in pg_strtouint64()
seems unnecessary.

Therefore, we could remove pg_strtouint64(), and use strtoull()
directly in all call sites.  However, it seems useful to keep a
separate notation for parsing exactly 64-bit integers, matching the
type definition int64/uint64.  For that, add new macros strtoi64() and
strtou64() in c.h as thin wrappers around strtol()/strtoul() or
strtoll()/stroull().  This makes these functions available everywhere
instead of just in the server code, and it makes the function naming
notably different from the pg_strtointNN() functions in numutils.c,
which have a different API.

Discussion: https://www.postgresql.org/message-id/flat/a3df47c9-b1b4-29f2-7e91-427baf8b75a3%40enterprisedb.com
2021-12-17 06:32:07 +01:00
Thomas Munro a13db0e164 Change ProcSendSignal() to take pgprocno.
Instead of referring to target backends by pid, use pgprocno.  This
means that we don't have to scan the ProcArray and we can drop some
special case code for dealing with the startup process.

Discussion: https://postgr.es/m/CA%2BhUKGLYRyDaneEwz5Uya_OgFLMx5BgJfkQSD%3Dq9HmwsfRRb-w%40mail.gmail.com
Reviewed-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Reviewed-by: Ashwin Agrawal <ashwinstar@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
2021-12-16 15:56:03 +13:00
Tom Lane 189699dd36 Remove unimplemented/undocumented geometric functions & operators.
Nobody has filled in these stubs for upwards of twenty years,
so it's time to drop the idea that they might get implemented
any day now.  The associated pg_operator and pg_proc entries
are just confusing wastes of space.

Per complaint from Anton Voloshin.

Discussion: https://postgr.es/m/3426566.1638832718@sss.pgh.pa.us
2021-12-13 18:08:28 -05:00
Robert Haas fa0e03c15a Remove InitXLOGAccess().
It's not great that RecoveryInProgress() calls InitXLOGAccess(),
because a status inquiry function typically shouldn't have the side
effect of performing initializations. We could fix that by calling
InitXLOGAccess() from some other place, but instead, let's remove it
altogether.

One thing InitXLogAccess() did is initialize wal_segment_size, but it
doesn't need to do that. In the postmaster, PostmasterMain() calls
LocalProcessControlFile(), and all child processes will inherit that
value -- except in EXEC_BACKEND bulds, but then each backend runs
SubPostmasterMain() which also calls LocalProcessControlFile().

The other thing InitXLOGAccess() did is update RedoRecPtr and
doPageWrites, but that's not critical, because all code that uses
them will just retry if it turns out that they've changed. The
only difference is that most code will now see an initial value that
is definitely invalid instead of one that might have just been way
out of date, but that will only happen once per backend lifetime,
so it shouldn't be a big deal.

Patch by me, reviewed by Nathan Bossart, Michael Paquier, Andres
Freund, Heikki Linnakangas, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoY7b65qRjzHN_tWUk8B4sJqk1vj1d31uepVzmgPnZKeLg@mail.gmail.com
2021-12-13 09:58:36 -05:00
Tom Lane 07eee5a0dc Create a new type category for "internal use" types.
Historically we've put type "char" into the S (String) typcategory,
although calling it a string is a stretch considering it can only
store one byte.  (In our actual usage, it's more like an enum.)
This choice now seems wrong in view of the special heuristics
that parse_func.c and parse_coerce.c have for TYPCATEGORY_STRING:
it's not a great idea for "char" to have those preferential casting
behaviors.

Worse than that, recent patches inventing special-purpose types
like pg_node_tree have assigned typcategory S to those types,
meaning they also get preferential casting treatment that's designed
on the assumption that they can hold arbitrary text.

To fix, invent a new category TYPCATEGORY_INTERNAL for internal-use
types, and assign that to all these types.  I used code 'Z' for
lack of a better idea ('I' was already taken).

This change breaks one query in psql/describe.c, which now needs to
explicitly cast a catalog "char" column to text before concatenating
it with an undecorated literal.  Also, a test case in contrib/citext
now needs an explicit cast to convert citext to "char".  Since the
point of this change is to not have "char" be a surprisingly-available
cast target, these breakages seem OK.

Per report from Ian Campbell.

Discussion: https://postgr.es/m/2216388.1638480141@sss.pgh.pa.us
2021-12-11 14:10:51 -05:00
Thomas Munro e2f0f8ed25 Check for STATUS_DELETE_PENDING on Windows.
1.  Update our open() wrapper to check for NT's STATUS_DELETE_PENDING
and translate it to Unix-like errors.  This is done with
RtlGetLastNtStatus(), which is dynamically loaded from ntdll.  A new
file win32ntdll.c centralizes lookup of NT functions, in case we decide
to add more in the future.

2.  Remove non-working code that was trying to do something similar for
stat(), and just reuse the open() wrapper code.  As a side effect,
stat() also gains resilience against "sharing violation" errors.

3.  Since stat() is used very early in process startup, remove the
requirement that the Win32 signal event has been created before
pgwin32_open_handle() is reached.  Instead, teach pg_usleep() to fall
back to a non-interruptible sleep if reached before the signal event is
available.

This could be back-patched, but for now it's in master only.  The
problem has apparently been with us for a long time and generated only a
few complaints.  Proposed patches trigger it more often, which led to
this investigation and fix.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJz_pZTF9mckn6XgSv69%2BjGwdgLkxZ6b3NWGLBCVjqUZA%40mail.gmail.com
2021-12-10 16:19:43 +13:00
Peter Geoghegan bcf60585e6 Standardize cleanup lock terminology.
The term "super-exclusive lock" is a synonym for "buffer cleanup lock"
that first appeared in nbtree many years ago.  Standardize things by
consistently using the term cleanup lock.  This finishes work started by
commit 276db875.

There is no good reason to have two terms.  But there is a good reason
to only have one: to avoid confusion around why VACUUM acquires a full
cleanup lock (not just an ordinary exclusive lock) in index AMs, during
ambulkdelete calls.  This has nothing to do with protecting the physical
index data structure itself.  It is needed to implement a locking
protocol that ensures that TIDs pointing to the heap/table structure
cannot get marked for recycling by VACUUM before it is safe (which is
somewhat similar to how VACUUM uses cleanup locks during its first heap
pass).  Note that it isn't strictly necessary for index AMs to implement
this locking protocol -- several index AMs use an MVCC snapshot as their
sole interlock to prevent unsafe TID recycling.

In passing, update the nbtree README.  Cleanly separate discussion of
the aforementioned index vacuuming locking protocol from discussion of
the "drop leaf page pin" optimization added by commit 2ed5b87f.  We now
structure discussion of the latter by describing how individual index
scans may safely opt out of applying the standard locking protocol (and
so can avoid blocking progress by VACUUM).  Also document why the
optimization is not safe to apply during nbtree index-only scans.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzngHgQa92tz6NQihf4nxJwRzCV36yMJO_i8dS+2mgEVKw@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkHPgsBBvGWjz=8PjNhDefy7XRkDKiT5NxMs-n5ZCf2dA@mail.gmail.com
2021-12-08 17:24:45 -08:00
Peter Eisentraut d6f96ed94e Allow specifying column list for foreign key ON DELETE SET actions
Extend the foreign key ON DELETE actions SET NULL and SET DEFAULT by
allowing the specification of a column list, like

    CREATE TABLE posts (
        ...
        FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL (author_id)
    );

If a column list is specified, only those columns are set to
null/default, instead of all the columns in the foreign-key
constraint.

This is useful for multitenant or sharded schemas, where the tenant or
shard ID is included in the primary key of all tables but shouldn't be
set to null.

Author: Paul Martinez <paulmtz@google.com>
Discussion: https://www.postgresql.org/message-id/flat/CACqFVBZQyMYJV=njbSMxf+rbDHpx=W=B7AEaMKn8dWn9OZJY7w@mail.gmail.com
2021-12-08 11:13:57 +01:00
Amit Kapila 1a2aaeb0db Fix changing the ownership of ALL TABLES IN SCHEMA publication.
Ensure that the new owner of ALL TABLES IN SCHEMA publication must be a
superuser. The same is already ensured during CREATE PUBLICATION.

Author: Vignesh C
Reviewed-by: Nathan Bossart, Greg Nancarrow, Michael Paquier, Haiying Tang
Discussion: https://postgr.es/m/CALDaNm0E5U-RqxFuFrkZrQeG7ae5trGa=xs=iRtPPHULtT4zOw@mail.gmail.com
2021-12-08 11:31:16 +05:30
Peter Eisentraut bba962f0c0 Update snowball
Update to snowball tag v2.2.0.  Minor changes only.
2021-12-07 07:04:05 +01:00
Tom Lane 0c9d84427f Rethink pg_dump's handling of object ACLs.
Throw away most of the existing logic for this, as it was very
inefficient thanks to expensive sub-selects executed to collect
ACL data that we very possibly would have no interest in dumping.
Reduce the ACL handling in the initial per-object-type queries
to be just collection of the catalog ACL fields, as it was
originally.  Fetch pg_init_privs data separately in a single
scan of that catalog, and do the merging calculations on the
client side.  Remove the separate code path used for pre-9.6
source servers; there is no good reason to treat them differently
from newer servers that happen to have empty pg_init_privs.

Discussion: https://postgr.es/m/2273648.1634764485@sss.pgh.pa.us
Discussion: https://postgr.es/m/7d7eb6128f40401d81b3b7a898b6b4de@W2012-02.nidsa.loc
2021-12-06 12:39:45 -05:00
Peter Eisentraut 37b2764593 Some RELKIND macro refactoring
Add more macros to group some RELKIND_* macros:

- RELKIND_HAS_PARTITIONS()
- RELKIND_HAS_TABLESPACE()
- RELKIND_HAS_TABLE_AM()

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/a574c8f1-9c84-93ad-a9e5-65233d6fc00f%40enterprisedb.com
2021-12-03 14:08:19 +01:00
Tom Lane a7da419810 Add configure probe for rl_variable_bind().
Some exceedingly ancient readline libraries lack this function, causing
commit 3d858af07 to fail.  Per buildfarm (via Michael Paquier).

Discussion: https://postgr.es/m/E1msTLm-0007Cm-Ri@gemulon.postgresql.org
2021-12-02 13:06:27 -05:00
Tomas Vondra 5753d4ee32 Ignore BRIN indexes when checking for HOT udpates
When determining whether an index update may be skipped by using HOT, we
can ignore attributes indexed only by BRIN indexes. There are no index
pointers to individual tuples in BRIN, and the page range summary will
be updated anyway as it relies on visibility info.

This also removes rd_indexattr list, and replaces it with rd_attrsvalid
flag. The list was not used anywhere, and a simple flag is sufficient.

Patch by Josef Simanek, various fixes and improvements by me.

Author: Josef Simanek
Reviewed-by: Tomas Vondra, Alvaro Herrera
Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com
2021-11-30 20:04:38 +01:00
Amit Kapila 8d74fc96db Add a view to show the stats of subscription workers.
This commit adds a new system view pg_stat_subscription_workers, that
shows information about any errors which occur during the application of
logical replication changes as well as during performing initial table
synchronization. The subscription statistics entries are removed when the
corresponding subscription is removed.

It also adds an SQL function pg_stat_reset_subscription_worker() to reset
single subscription errors.

The contents of this view can be used by an upcoming patch that skips the
particular transaction that conflicts with the existing data on the
subscriber.

This view can be extended in the future to track other xact related
statistics like the number of xacts committed/aborted for subscription
workers.

Author: Masahiko Sawada
Reviewed-by: Greg Nancarrow, Hou Zhijie, Tang Haiying, Vignesh C, Dilip Kumar, Takamichi Osumi, Amit Kapila
Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
2021-11-30 08:54:30 +05:30
Michael Paquier 98105e53e0 Fix typos
Author: Lingjie Qiang
Discussion: https://postgr.es/m/OSAPR01MB71654E773F62AC88DC1FC8CC80669@OSAPR01MB7165.jpnprd01.prod.outlook.com
2021-11-30 11:05:15 +09:00
Tom Lane e04a8059a7 Simplify declaring variables exported from libpgcommon and libpgport.
This reverts commits c2d1eea9e and 11b500072, as well as similar hacks
elsewhere, in favor of setting up the PGDLLIMPORT macro so that it can
just be used unconditionally.  That can work because in frontend code,
we need no marking in either the defining or consuming files for a
variable exported from these libraries; and frontend code has no need
to access variables exported from the core backend, either.

While at it, write some actual documentation about the PGDLLIMPORT
and PGDLLEXPORT macros.

Patch by me, based on a suggestion from Robert Haas.

Discussion: https://postgr.es/m/1160385.1638165449@sss.pgh.pa.us
2021-11-29 11:00:00 -05:00
Tom Lane 11b500072e Portability hack for pg_global_prng_state.
PGDLLIMPORT is only appropriate for variables declared in the backend,
not when the variable is coming from a library included in frontend code.
(This isn't a particularly nice fix, but for now, use the same method
employed elsewhere.)

Discussion: https://postgr.es/m/E1mrWUD-000235-Hq@gemulon.postgresql.org
2021-11-29 00:04:45 -05:00
Tom Lane 3804539e48 Replace random(), pg_erand48(), etc with a better PRNG API and algorithm.
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code.  In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.

xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values not only 48-bit,
and it should be much more trustworthy.  It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random().  It is not cryptographically strong, but neither
are the functions it replaces.

Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself

Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
2021-11-28 21:33:07 -05:00
Alvaro Herrera f744519326
Harden be-gssapi-common.h for headerscheck
Surround the contents with a test that the feature is enabled by
configure, to silence header checking tools on systems without GSSAPI
installed.

Backpatch to 12, where the file appeared.

Discussion: https://postgr.es/m/202111161709.u3pbx5lxdimt@alvherre.pgsql
2021-11-26 17:00:29 -03:00
David Rowley 411137a429 Flush Memoize cache when non-key parameters change, take 2
It's possible that a subplan below a Memoize node contains a parameter
from above the Memoize node.  If this parameter changes then cache entries
may become out-dated due to the new parameter value.

Previously Memoize was mistakenly not aware of this.  We fix this here by
flushing the cache whenever a parameter that's not part of the cache
key changes.

Bug: #17213
Reported by: Elvis Pranskevichus
Author: David Rowley
Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org
Backpatch-through: 14, where Memoize was added
2021-11-24 23:29:14 +13:00
David Rowley dad20ad470 Revert "Flush Memoize cache when non-key parameters change"
This reverts commit 1050048a31.
2021-11-24 15:27:43 +13:00
David Rowley 1050048a31 Flush Memoize cache when non-key parameters change
It's possible that a subplan below a Memoize node contains a parameter
from above the Memoize node.  If this parameter changes then cache entries
may become out-dated due to the new parameter value.

Previously Memoize was mistakenly not aware of this.  We fix this here by
flushing the cache whenever a parameter that's not part of the cache
key changes.

Bug: #17213
Reported by: Elvis Pranskevichus
Author: David Rowley
Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org
Backpatch-through: 14, where Memoize was added
2021-11-24 14:56:18 +13:00
David Rowley e502150f7d Allow Memoize to operate in binary comparison mode
Memoize would always use the hash equality operator for the cache key
types to determine if the current set of parameters were the same as some
previously cached set.  Certain types such as floating points where -0.0
and +0.0 differ in their binary representation but are classed as equal by
the hash equality operator may cause problems as unless the join uses the
same operator it's possible that whichever join operator is being used
would be able to distinguish the two values.  In which case we may
accidentally return in the incorrect rows out of the cache.

To fix this here we add a binary mode to Memoize to allow it to the
current set of parameters to previously cached values by comparing
bit-by-bit rather than logically using the hash equality operator.  This
binary mode is always used for LATERAL joins and it's used for normal
joins when any of the join operators are not hashable.

Reported-by: Tom Lane
Author: David Rowley
Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us
Backpatch-through: 14, where Memoize was added
2021-11-24 10:06:59 +13:00
Michael Paquier 1922d7c6e1 Add SQL functions to monitor the directory contents of replication slots
This commit adds a set of functions able to look at the contents of
various paths related to replication slots:
- pg_ls_logicalsnapdir, for pg_logical/snapshots/
- pg_ls_logicalmapdir, for pg_logical/mappings/
- pg_ls_replslotdir, for pg_replslot/<slot_name>/

These are intended to be used by monitoring tools.  Unlike pg_ls_dir(),
execution permission can be granted to non-superusers.  Roles members of
pg_monitor gain have access to those functions.

Bump catalog version.

Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Justin Pryzby
Discussion: https://postgr.es/m/CALj2ACWsfizZjMN6bzzdxOk1ADQQeSw8HhEjhmVXn_Pu+7VzLw@mail.gmail.com
2021-11-23 19:29:42 +09:00
Peter Eisentraut d6d1dfcc99 Add ABI extra field to fmgr magic block
This allows derived products to intentionally make their fmgr ABI
incompatible, with a clean error message.

Discussion: https://www.postgresql.org/message-id/flat/55215fda-db31-a045-d6b7-d6f2d2dc9920%40enterprisedb.com
2021-11-22 08:00:14 +01:00
Fujii Masao 1b06d7bac9 Report wait events for local shell commands like archive_command.
This commit introduces new wait events for archive_command,
archive_cleanup_command, restore_command and recovery_end_command.

Author: Fujii Masao
Reviewed-by: Bharath Rupireddy, Michael Paquier
Discussion: https://postgr.es/m/4ca4f920-6b48-638d-08b2-93598356f5d3@oss.nttdata.com
2021-11-22 10:28:21 +09:00
Amit Kapila 0f0cfb4940 Fix parallel operations that prevent oldest xmin from advancing.
While determining xid horizons, we skip over backends that are running
Vacuum. We also ignore Create Index Concurrently, or Reindex Concurrently
for the purposes of computing Xmin for Vacuum. But we were not setting the
flags corresponding to these operations when they are performed in
parallel which was preventing Xid horizon from advancing.

The optimization related to skipping Create Index Concurrently, or Reindex
Concurrently operations was implemented in PG-14 but the fix is the same
for the Parallel Vacuum as well so back-patched till PG-13.

Author: Masahiko Sawada
Reviewed-by: Amit Kapila
Backpatch-through: 13
Discussion: https://postgr.es/m/CAD21AoCLQqgM1sXh9BrDFq0uzd3RBFKi=Vfo6cjjKODm0Onr5w@mail.gmail.com
2021-11-19 09:04:40 +05:30
Tom Lane 5f1148224b Provide a variant of simple_prompt() that can be interrupted by ^C.
Up to now, you couldn't escape out of psql's \password command
by typing control-C (or other local spelling of SIGINT).  This
is pretty user-unfriendly, so improve it.  To do so, we have to
modify the functions provided by pg_get_line.c; but we don't
want to mess with psql's SIGINT handler setup, so provide an
API that lets that handler cause the cancel to occur.

This relies on the assumption that we won't do any major harm by
longjmp'ing out of fgets().  While that's obviously a little shaky,
we've long had the same assumption in the main input loop, and few
issues have been reported.

psql has some other simple_prompt() calls that could usefully
be improved the same way; for now, just deal with \password.

Nathan Bossart, minor tweaks by me

Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us
2021-11-17 19:09:54 -05:00
Tom Lane a148f8bc04 Add a planner support function for starts_with().
This fills in some gaps in planner support for starts_with() and
the equivalent ^@ operator:

* A condition such as "textcol ^@ constant" can now use a regular
btree index, not only an SP-GiST index, so long as the index's
collation is C.  (This works just like "textcol LIKE 'foo%'".)

* "starts_with(textcol, constant)" can be optimized the same as
"textcol ^@ constant".

* Fixed-prefix LIKE and regex patterns are now more like starts_with()
in another way: if you apply one to an SPGiST-indexed column, you'll
get an index condition using ^@ rather than two index conditions with
>= and <.

Per a complaint from Shay Rojansky.  Patch by me; thanks to
Nathan Bossart for review.

Discussion: https://postgr.es/m/232599.1633800229@sss.pgh.pa.us
2021-11-17 16:54:12 -05:00
Alvaro Herrera ad26ee2825
Fix headerscheck failure in replication/worker_internal.h
Broken by 31c389d8de
2021-11-16 13:30:37 -03:00
Peter Geoghegan b0f7425ec2 Explain pruning pgstats accounting subtleties.
Add a comment explaining why the pgstats accounting used during
opportunistic heap pruning operations (to maintain the current number of
dead tuples in the relation) needs to compensate by subtracting away the
number of new LP_DEAD items.  This is needed so it can avoid completely
forgetting about tuples that become LP_DEAD items during pruning -- they
should still count.

It seems more natural to discuss this issue at the only relevant call
site (opportunistic pruning), since the same issue does not apply to the
only other caller (the VACUUM call site).  Move everything there too.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzm7f+A6ej650gi_ifTgbhsadVW5cujAL3punpupHff5Yg@mail.gmail.com
2021-11-12 19:45:58 -08:00
Robert Haas beb4e9ba16 Improve performance of pgarch_readyXlog() with many status files.
Presently, the archive_status directory was scanned for each file to
archive.  When there are many status files, say because archive_command
has been failing for a long time, these directory scans can get very
slow.  With this change, the archiver remembers several files to archive
during each directory scan, speeding things up.

To ensure timeline history files are archived as quickly as possible,
XLogArchiveNotify() forces the archiver to do a new directory scan as
soon as the .ready file for one is created.

Nathan Bossart, per a long discussion involving many people. It is
not clear to me exactly who out of all those people reviewed this
particular patch.

Discussion: http://postgr.es/m/CA+TgmobhAbs2yabTuTRkJTq_kkC80-+jw=pfpypdOJ7+gAbQbw@mail.gmail.com
Discussion: http://postgr.es/m/620F3CE1-0255-4D66-9D87-0EADE866985A@amazon.com
2021-11-11 15:20:26 -05:00
Tom Lane 01ec41a5fe Fall back to unsigned int, not int, for socklen_t.
It's a coin toss which of these is a better default assumption.
However, of the machines we have in the buildfarm, the only ones
relying on the fallback socklen_t definition are ancient HPUX,
and on that platform unsigned int is the right choice.  Minor
tweak to ee3a1a5b6.

Discussion: https://postgr.es/m/1440792.1636558888@sss.pgh.pa.us
2021-11-11 10:36:39 -05:00
Jeff Davis 4168a47454 Add pg_checkpointer predefined role for CHECKPOINT command.
Any user with the privileges of pg_checkpointer can issue a CHECKPOINT
command.

Reviewed-by: Stephen Frost
Discussion: https://postgr.es/m/67a1d667e8ec228b5e07f232184c80348c5d93f4.camel%40j-davis.com
2021-11-09 16:59:14 -08:00
Peter Eisentraut ee3a1a5b63 Remove check for accept() argument types
This check was used to accommodate a staggering variety in particular
in the type of the third argument of accept().  This is no longer of
concern on currently supported systems.  We can just use socklen_t in
the code and put in a simple check that substitutes int for socklen_t
if it's missing, to cover the few stragglers.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/3538f4c4-1886-64f2-dcff-aaad8267fb82@enterprisedb.com
2021-11-09 15:35:26 +01:00
Amit Kapila b3812d0b9b Rename some enums to use TABLE instead of REL.
Commit 5a2832465f introduced some enums to represent all tables in schema
publications and used REL in their names. Use TABLE instead of REL in
those enums to avoid confusion with other objects like SEQUENCES that can
be part of a publication in the future.

In the passing, (a) Change one of the newly introduced error messages to
make it consistent for Create and Alter commands, (b) add missing alias in
one of the SQL Statements that is used to print publications associated
with the table.

Reported-by: Tomas Vondra, Peter Smith
Author: Vignesh C
Reviewed-by: Hou Zhijie, Peter Smith
Discussion: https://www.postgresql.org/message-id/CALDaNm0OANxuJ6RXqwZsM1MSY4s19nuH3734j4a72etDwvBETQ%40mail.gmail.com
2021-11-09 08:39:33 +05:30
Tom Lane 28e2412554 Reject extraneous data after SSL or GSS encryption handshake.
The server collects up to a bufferload of data whenever it reads data
from the client socket.  When SSL or GSS encryption is requested
during startup, any additional data received with the initial
request message remained in the buffer, and would be treated as
already-decrypted data once the encryption handshake completed.
Thus, a man-in-the-middle with the ability to inject data into the
TCP connection could stuff some cleartext data into the start of
a supposedly encryption-protected database session.

This could be abused to send faked SQL commands to the server,
although that would only work if the server did not demand any
authentication data.  (However, a server relying on SSL certificate
authentication might well not do so.)

To fix, throw a protocol-violation error if the internal buffer
is not empty after the encryption handshake.

Our thanks to Jacob Champion for reporting this problem.

Security: CVE-2021-23214
2021-11-08 11:01:43 -05:00
David Rowley 39a3105678 Fix incorrect hash equality operator bug in Memoize
In v14, because we don't have a field in RestrictInfo to cache both the
left and right type's hash equality operator, we just restrict the scope
of Memoize to only when the left and right types of a RestrictInfo are the
same.

In master we add another field to RestrictInfo and cache both hash
equality operators.

Reported-by: Jaime Casanova
Author: David Rowley
Discussion: https://postgr.es/m/20210929185544.GB24346%40ahch-to
Backpatch-through: 14
2021-11-08 14:40:33 +13:00
Robert Haas 4a92a1c3d1 Change ThisTimeLineID from a global variable to a local variable.
StartupXLOG() still has ThisTimeLineID as a local variable, but the
remaining code in xlog.c now needs to the relevant TimeLineID by some
other means. Mostly, this means that we now pass it as a function
parameter to a bunch of functions where we didn't previously.
However, a few cases require special handling:

- In functions that might be called by outside callers who
  wouldn't necessarily know what timeline to specify, we get
  the timeline ID from shared memory. XLogCtl->ThisTimeLineID
  can be used in most cases since recovery is known to have
  completed by the time those functions are called.  In
  xlog_redo(), we can use XLogCtl->replayEndTLI.

- XLogFileClose() needs to know the TLI of the open logfile.
  Do that with a new global variable openLogTLI. While
  someone could argue that this is just trading one global
  variable for another, the new one has a far more narrow
  purposes and is referenced in just a few places.

- read_backup_label() now returns the TLI that it obtains
  by parsing the backup_label file. Previously, ReadRecord()
  could be called to parse the checkpoint record without
  ThisTimeLineID having been initialized. Now, the timeline
  is passed down, and I didn't want to pass an uninitialized
  variable; this change lets us avoid that. The old coding
  didn't seem to have any practical consequences that we need
  to worry about, but this is cleaner.

- In BootstrapXLOG(), it's just a constant.

Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and
Álvaro Herrera.

Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com
2021-11-05 12:53:15 -04:00
Robert Haas e997a0c642 Remove all use of ThisTimeLineID global variable outside of xlog.c
All such code deals with this global variable in one of three ways.
Sometimes the same functions use it in more than one of these ways
at the same time.

First, sometimes it's an implicit argument to one or more functions
being called in xlog.c or elsewhere, and must be set to the
appropriate value before calling those functions lest they
misbehave. In those cases, it is now passed as an explicit argument
instead.

Second, sometimes it's used to obtain the current timeline after
the end of recovery, i.e. the timeline to which WAL is being
written and flushed. Such code now calls GetWALInsertionTimeLine()
or relies on the new out parameter added to GetFlushRecPtr().

Third, sometimes it's used during recovery to store the current
replay timeline. That can change, so such code must generally
update the value before each use. It can still do that, but must
now use a local variable instead.

The net effect of these changes is to reduce by a fair amount the
amount of code that is directly accessing this global variable.
That's good, because history has shown that we don't always think
clearly about which timeline ID it's supposed to contain at any
given point in time, or indeed, whether it has been or needs to
be initialized at any given point in the code.

Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and
Álvaro Herrera.

Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com
2021-11-05 12:50:01 -04:00
Robert Haas bef47ff85d Introduce 'bbsink' abstraction to modularize base backup code.
The base backup code has accumulated a healthy number of new
features over the years, but it's becoming increasingly difficult
to maintain and further enhance that code because there's no
real separation of concerns. For example, the code that
understands knows the details of how we send data to the client
using the libpq protocol is scattered throughout basebackup.c,
rather than being centralized in one place.

To try to improve this situation, introduce a new 'bbsink' object
which acts as a recipient for archives generated during the base
backup progress and also for the backup manifest. This commit
introduces three types of bbsink: a 'copytblspc' bbsink forwards the
backup to the client using one COPY OUT operation per tablespace and
another for the manifest, a 'progress' bbsink performs command
progress reporting, and a 'throttle' bbsink performs rate-limiting.
The 'progress' and 'throttle' bbsink types also forward the data to a
successor bbsink; at present, the last bbsink in the chain will
always be of type 'copytblspc'. There are plans to add more types
of 'bbsink' in future commits.

This abstraction is a bit leaky in the case of progress reporting,
but this still seems cleaner than what we had before.

Patch by me, reviewed and tested by Andres Freund, Sumanta Mukherjee,
Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja, Mark Dilger,
and Jeevan Ladhe.

Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-11-05 10:08:30 -04:00
Peter Geoghegan e7428a99a1 Add hardening to catch invalid TIDs in indexes.
Add hardening to the heapam index tuple deletion path to catch TIDs in
index pages that point to a heap item that index tuples should never
point to.  The corruption we're trying to catch here is particularly
tricky to detect, since it typically involves "extra" (corrupt) index
tuples, as opposed to the absence of required index tuples in the index.

For example, a heap TID from an index page that turns out to point to an
LP_UNUSED item in the heap page has a good chance of being caught by one
of the new checks.  There is a decent chance that the recently fixed
parallel VACUUM bug (see commit 9bacec15) would have been caught had
that particular check been in place for Postgres 14.  No backpatch of
this extra hardening for now, though.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wzk-4_raTzawWGaiqNvkpwDXxv3y1AQhQyUeHfkU=tFCeA@mail.gmail.com
2021-11-04 19:54:05 -07:00
Peter Geoghegan 9bacec15b6 Don't overlook indexes during parallel VACUUM.
Commit b4af70cb, which simplified state managed by VACUUM, performed
refactoring of parallel VACUUM in passing.  Confusion about the exact
details of the tasks that the leader process is responsible for led to
code that made it possible for parallel VACUUM to miss a subset of the
table's indexes entirely.  Specifically, indexes that fell under the
min_parallel_index_scan_size size cutoff were missed.  These indexes are
supposed to be vacuumed by the leader (alongside any parallel unsafe
indexes), but weren't vacuumed at all.  Affected indexes could easily
end up with duplicate heap TIDs, once heap TIDs were recycled for new
heap tuples.  This had generic symptoms that might be seen with almost
any index corruption involving structural inconsistencies between an
index and its table.

To fix, make sure that the parallel VACUUM leader process performs any
required index vacuuming for indexes that happen to be below the size
cutoff.  Also document the design of parallel VACUUM with these
below-size-cutoff indexes.

It's unclear how many users might be affected by this bug.  There had to
be at least three indexes on the table to hit the bug: a smaller index,
plus at least two additional indexes that themselves exceed the size
cutoff.  Cases with just one additional index would not run into
trouble, since the parallel VACUUM cost model requires two
larger-than-cutoff indexes on the table to apply any parallel
processing.  Note also that autovacuum was not affected, since it never
uses parallel processing.

Test case based on tests from a larger patch to test parallel VACUUM by
Masahiko Sawada.

Many thanks to Kamigishi Rei for her invaluable help with tracking this
problem down.

Author: Peter Geoghegan <pg@bowt.ie>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-By: Kamigishi Rei <iijima.yun@koumakan.jp>
Reported-By: Andrew Gierth <andrew@tao11.riddles.org.uk>
Diagnosed-By: Andres Freund <andres@anarazel.de>
Bug: #17245
Discussion: https://postgr.es/m/17245-ddf06aaf85735f36@postgresql.org
Discussion: https://postgr.es/m/20211030023740.qbnsl2xaoh2grq3d@alap3.anarazel.de
Backpatch: 14-, where the refactoring commit appears.
2021-11-02 12:06:17 -07:00
Tom Lane 65c6cab136 Avoid O(N^2) behavior in SyncPostCheckpoint().
As in commits 6301c3ada and e9d9ba2a4, avoid doing repetitive
list_delete_first() operations, since that would be expensive when
there are many files waiting to be unlinked.  This is a slightly
larger change than in those cases.  We have to keep the list state
valid for calls to AbsorbSyncRequests(), so it's necessary to invent a
"canceled" field instead of immediately deleting PendingUnlinkEntry
entries.  Also, because we might not be able to process all the
entries, we need a new list primitive list_delete_first_n().

list_delete_first_n() is almost list_copy_tail(), but it modifies the
input List instead of making a new copy.  I found a couple of existing
uses of the latter that could profitably use the new function.  (There
might be more, but the other callers look like they probably shouldn't
overwrite the input List.)

As before, back-patch to v13.

Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
2021-11-02 11:31:54 -04:00
Amit Kapila 71db6459e6 Replace XLOG_INCLUDE_XID flag with a more localized flag.
Commit 0bead9af48 introduced XLOG_INCLUDE_XID flag to indicate that the
WAL record contains subXID-to-topXID association. It uses that flag later
to mark in CurrentTransactionState that top-xid is logged so that we
should not try to log it again with the next WAL record in the current
subtransaction. However, we can use a localized variable to pass that
information.

In passing, change the related function and variable names to make them
consistent with what the code is actually doing.

Author: Dilip Kumar
Reviewed-by: Alvaro Herrera, Amit Kapila
Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org
2021-11-02 08:35:29 +05:30
Jeff Davis 77ea4f9439 Grant memory views to pg_read_all_stats.
Grant privileges on views pg_backend_memory_contexts and
pg_shmem_allocations to the role pg_read_all_stats. Also grant on the
underlying functions that those views depend on.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://postgr.es/m/CALj2ACWAZo3Ar_EVsn2Zf9irG+hYK3cmh1KWhZS_Od45nd01RA@mail.gmail.com
2021-10-27 14:06:30 -07:00
Amit Kapila 5a2832465f Allow publishing the tables of schema.
A new option "FOR ALL TABLES IN SCHEMA" in Create/Alter Publication allows
one or more schemas to be specified, whose tables are selected by the
publisher for sending the data to the subscriber.

The new syntax allows specifying both the tables and schemas. For example:
CREATE PUBLICATION pub1 FOR TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;

A new system table "pg_publication_namespace" has been added, to maintain
the schemas that the user wants to publish through the publication.
Modified the output plugin (pgoutput) to publish the changes if the
relation is part of schema publication.

Updates pg_dump to identify and dump schema publications. Updates the \d
family of commands to display schema publications and \dRp+ variant will
now display associated schemas if any.

Author: Vignesh C, Hou Zhijie, Amit Kapila
Syntax-Suggested-by: Tom Lane, Alvaro Herrera
Reviewed-by: Greg Nancarrow, Masahiko Sawada, Hou Zhijie, Amit Kapila, Haiying Tang, Ajin Cherian, Rahila Syed, Bharath Rupireddy, Mark Dilger
Tested-by: Haiying Tang
Discussion: https://www.postgresql.org/message-id/CALDaNm0OANxuJ6RXqwZsM1MSY4s19nuH3734j4a72etDwvBETQ@mail.gmail.com
2021-10-27 07:44:52 +05:30
Jeff Davis f0b051e322 Allow GRANT on pg_log_backend_memory_contexts().
Remove superuser check, allowing any user granted permissions on
pg_log_backend_memory_contexts() to log the memory contexts of any
backend.

Note that this could allow a privileged non-superuser to log the
memory contexts of a superuser backend, but as discussed, that does
not seem to be a problem.

Reviewed-by: Nathan Bossart, Bharath Rupireddy, Michael Paquier, Kyotaro Horiguchi, Andres Freund
Discussion: https://postgr.es/m/e5cf6684d17c8d1ef4904ae248605ccd6da03e72.camel@j-davis.com
2021-10-26 13:31:38 -07:00
Robert Haas 9ce346eabf Report progress of startup operations that take a long time.
Users sometimes get concerned whe they start the server and it
emits a few messages and then doesn't emit any more messages for
a long time. Generally, what's happening is either that the
system is taking a long time to apply WAL, or it's taking a
long time to reset unlogged relations, or it's taking a long
time to fsync the data directory, but it's not easy to tell
which is the case.

To fix that, add a new 'log_startup_progress_interval' setting,
by default 10s. When an operation that is known to be potentially
long-running takes more than this amount of time, we'll log a
status update each time this interval elapses.

To avoid undesirable log chatter, don't log anything about WAL
replay when in standby mode.

Nitin Jadhav and Robert Haas, reviewed by Amul Sul, Bharath
Rupireddy, Justin Pryzby, Michael Paquier, and Álvaro Herrera.

Discussion: https://postgr.es/m/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com
Discussion: https://postgr.es/m/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com
2021-10-25 11:51:57 -04:00
Robert Haas 732e6677a6 Add enable_timeout_every() to fire the same timeout repeatedly.
enable_timeout_at() and enable_timeout_after() can still be used
when you want to fire a timeout just once.

Patch by me, per a suggestion from Tom Lane.

Discussion: http://postgr.es/m/2992585.1632938816@sss.pgh.pa.us
Discussion: http://postgr.es/m/CA+TgmoYqSF5sCNrgTom9r3Nh=at4WmYFD=gsV-omStZ60S0ZUQ@mail.gmail.com
2021-10-25 11:33:44 -04:00
Michael Paquier b4ada4e19f Add replication command READ_REPLICATION_SLOT
The command is supported for physical slots for now, and returns the
type of slot, its restart_lsn and its restart_tli.

This will be useful for an upcoming patch related to pg_receivewal, to
allow the tool to be able to stream from the position of a slot, rather
than the last WAL position flushed by the backend (as reported by
IDENTIFY_SYSTEM) if the archive directory is found as empty, which would
be an advantage in the case of switching to a different archive
locations with the same slot used to avoid holes in WAL segment
archives.

Author: Ronan Dunklau
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/18708360.4lzOvYHigE@aivenronan
2021-10-25 07:40:42 +09:00
Noah Misch 3cd9c3b921 Fix CREATE INDEX CONCURRENTLY for the newest prepared transactions.
The purpose of commit 8a54e12a38 was to
fix this, and it sufficed when the PREPARE TRANSACTION completed before
the CIC looked for lock conflicts.  Otherwise, things still broke.  As
before, in a cluster having used CIC while having enabled prepared
transactions, queries that use the resulting index can silently fail to
find rows.  It may be necessary to reindex to recover from past
occurrences; REINDEX CONCURRENTLY suffices.  Fix this for future index
builds by making CIC wait for arbitrarily-recent prepared transactions
and for ordinary transactions that may yet PREPARE TRANSACTION.  As part
of that, have PREPARE TRANSACTION transfer locks to its dummy PGPROC
before it calls ProcArrayClearTransaction().  Back-patch to 9.6 (all
supported versions).

Andrey Borodin, reviewed (in earlier versions) by Andres Freund.

Discussion: https://postgr.es/m/01824242-AA92-4FE9-9BA7-AEBAFFEA3D0C@yandex-team.ru
2021-10-23 18:36:38 -07:00
Noah Misch fdd965d074 Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start.  That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index.  Queries that use the
resulting index can silently fail to find rows.  Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation.  It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).

Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.

Discussion: https://postgr.es/m/20210730022548.GA1940096@gust.leadboat.com
2021-10-23 18:36:38 -07:00
Tom Lane 974aedcea4 Fix frontend version of sh_error() in simplehash.h.
The code does not expect sh_error() to return, but the patch
that made this header usable in frontend didn't get that memo.

While here, plaster unlikely() on the tests that decide whether
to invoke sh_error(), and add our standard copyright notice.

Noted by Andres Freund.  Back-patch to v13 where this frontend
support came in.

Discussion: https://postgr.es/m/0D54435C-1199-4361-9D74-2FBDCF8EA164@anarazel.de
2021-10-22 16:43:38 -04:00
Tom Lane b1ce6c2843 Doc: clarify a critical and undocumented aspect of simplehash.h.
I just got burnt by trying to use pg_malloc instead of pg_malloc0
with this.  Save the next hacker some time by not leaving this
API detail undocumented.
2021-10-21 17:08:53 -04:00
Amit Kapila 1607cd0b6c Remove unused wait events.
Commit 464824323e introduced the wait events which were neither used by
that commit nor by follow-up commits for that work.

Author: Masahiro Ikeda
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/ff077840-3ab2-04dd-bbe4-4f5dfd2ad481@oss.nttdata.com
2021-10-21 08:01:25 +05:30
Heikki Linnakangas c4649cce39 Refactor LogicalTapeSet/LogicalTape interface.
All the tape functions, like LogicalTapeRead and LogicalTapeWrite, now
take a LogicalTape as argument, instead of LogicalTapeSet+tape number.
You can create any number of LogicalTapes in a single LogicalTapeSet, and
you don't need to decide the number upfront, when you create the tape set.

This makes the tape management in hash agg spilling in nodeAgg.c simpler.

Discussion: https://www.postgresql.org/message-id/420a0ec7-602c-d406-1e75-1ef7ddc58d83%40iki.fi
Reviewed-by: Peter Geoghegan, Zhihong Yu, John Naylor
2021-10-18 14:46:01 +03:00
Michael Paquier 409f9ca447 Reset properly snapshot export state during transaction abort
During a replication slot creation, an ERROR generated in the same
transaction as the one creating a to-be-exported snapshot would have
left the backend in an inconsistent state, as the associated static
export snapshot state was not being reset on transaction abort, but only
on the follow-up command received by the WAL sender that created this
snapshot on replication slot creation.  This would trigger inconsistency
failures if this session tried to export again a snapshot, like during
the creation of a replication slot.

Note that a snapshot export cannot happen in a transaction block, so
there is no need to worry resetting this state for subtransaction
aborts.  Also, this inconsistent state would very unlikely show up to
users.  For example, one case where this could happen is an
out-of-memory error when building the initial snapshot to-be-exported.
Dilip found this problem while poking at a different patch, that caused
an error in this code path for reasons unrelated to HEAD.

Author: Dilip Kumar
Reviewed-by: Michael Paquier, Zhihong Yu
Discussion: https://postgr.es/m/CAFiTN-s0zA1Kj0ozGHwkYkHwa5U0zUE94RSc_g81WrpcETB5=w@mail.gmail.com
Backpatch-through: 9.6
2021-10-18 11:55:42 +09:00
Robert Haas 46846433a0 shm_mq: Update mq_bytes_written less often.
Do not update shm_mq's mq_bytes_written until we have written
an amount of data greater than 1/4th of the ring size, unless
the caller of shm_mq_send(v) requests a flush at the end of
the message. This reduces the number of calls to SetLatch(),
and also the number of CPU cache misses, considerably, and thus
makes shm_mq significantly faster.

Dilip Kumar, reviewed by Zhihong Yu and Tomas Vondra. Some
minor cosmetic changes by me.

Discussion: http://postgr.es/m/CAFiTN-tVXqn_OG7tHNeSkBbN+iiCZTiQ83uakax43y1sQb2OBA@mail.gmail.com
2021-10-14 16:13:36 -04:00
Andres Freund 8162464a25 windows: Define WIN32_LEAN_AND_MEAN to make compilation faster.
windows.h includes a lot of other headers, slowing down compilation
significantly. WIN32_LEAN_AND_MEAN reduces that a bit. It'd be better to
remove the include of windows.h (as well as indirect inclusions of it) from such
a central place, but until then...

Discussion: https://postgr.es/m/20210921193035.pqzay43vpyv7in43@alap3.anarazel.de
2021-10-04 13:16:12 -07:00
Daniel Gustafsson 7111e332c5 Fix duplicate words in comments
Remove accidentally duplicated words in code comments.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/87bl45t0co.fsf@wibble.ilmari.org
2021-10-04 15:12:57 +02:00
Tom Lane a0558cfa39 Fix checking of query type in plpgsql's RETURN QUERY command.
Prior to v14, we insisted that the query in RETURN QUERY be of a type
that returns tuples.  (For instance, INSERT RETURNING was allowed,
but not plain INSERT.)  That happened indirectly because we opened a
cursor for the query, so spi.c checked SPI_is_cursor_plan().  As a
consequence, the error message wasn't terribly on-point, but at least
it was there.

Commit 2f48ede08 lost this detail.  Instead, plain RETURN QUERY
insisted that the query be a SELECT (by checking for SPI_OK_SELECT)
while RETURN QUERY EXECUTE failed to check the query type at all.
Neither of these changes was intended.

The only convenient place to check this in the EXECUTE case is inside
_SPI_execute_plan, because we haven't done parse analysis until then.
So we need to pass down a flag saying whether to enforce that the
query returns tuples.  Fortunately, we can squeeze another boolean
into struct SPIExecuteOptions without an ABI break, since there's
padding space there.  (It's unlikely that any extensions would
already be using this new struct, but preserving ABI in v14 seems
like a smart idea anyway.)

Within spi.c, it seemed like _SPI_execute_plan's parameter list
was already ridiculously long, and I didn't want to make it longer.
So I thought of passing SPIExecuteOptions down as-is, allowing that
parameter list to become much shorter.  This makes the patch a bit
more invasive than it might otherwise be, but it's all internal to
spi.c, so that seems fine.

Per report from Marc Bachmann.  Back-patch to v14 where the
faulty code came in.

Discussion: https://postgr.es/m/1F2F75F0-27DF-406F-848D-8B50C7EEF06A@gmail.com
2021-10-03 13:21:20 -04:00
Tom Lane 7b5d4c29ed Fix Portal snapshot tracking to handle subtransactions properly.
Commit 84f5c2908 forgot to consider the possibility that
EnsurePortalSnapshotExists could run inside a subtransaction with
lifespan shorter than the Portal's.  In that case, the new active
snapshot would be popped at the end of the subtransaction, leaving
a dangling pointer in the Portal, with mayhem ensuing.

To fix, make sure the ActiveSnapshot stack entry is marked with
the same subtransaction nesting level as the associated Portal.
It's certainly safe to do so since we won't be here at all unless
the stack is empty; hence we can't create an out-of-order stack.

Let's also apply this logic in the case where PortalRunUtility
sets portalSnapshot, just to be sure that path can't cause similar
problems.  It's slightly less clear that that path can't create
an out-of-order stack, so add an assertion guarding it.

Report and patch by Bertrand Drouvot (with kibitzing by me).
Back-patch to v11, like the previous commit.

Discussion: https://postgr.es/m/ff82b8c5-77f4-3fe7-6028-fcf3303e82dd@amazon.com
2021-10-01 11:10:12 -04:00
David Rowley 16239c5fdf Ensure interleaved_parts field is always initialized
This field was recently added in db632fbca, however that commit missed one
place where it should have initialized the new field to NULL.  The missed
location is where the PartitionBoundInfo is created for partition-wise
join relations.  Technically there could be interleaved partitions in a
partition-wise join relation, but currently the only optimization we use
this field for only does so for base rels and other member rels.  So just
document that we don't populate this field for join rels.

Reported-by: Amit Langote
Author: Amit Langote, David Rowley
Reviewed-by: Amit Langote, David Rowley
Discussion: https://postgr.es/m/CA+HiwqE76Rps24kwHsd2Cr82Ua07tJC9t9reG0c7ScX9n_xrEA@mail.gmail.com
2021-10-01 15:09:49 +13:00
Tom Lane b484ddf4d2 Treat ETIMEDOUT as indicating a non-recoverable connection failure.
Add ETIMEDOUT to ALL_CONNECTION_FAILURE_ERRNOS' list of "errnos that
identify hard failure of a previously-established network connection".
While one could imagine that this is sometimes recoverable, the same
could be said of other entries such as ENETDOWN.

In support of this, handle ETIMEDOUT on par with other socket errors
in relevant infrastructure, such as TranslateSocketError().
(I made a couple of cosmetic adjustments in TranslateSocketError(),
too.)  The code now assumes that ETIMEDOUT is defined everywhere,
which it should be given that POSIX has required it since SUSv2.

Perhaps this should be back-patched, but I'm hesitant to do so given
the lack of previous complaints, and the hazard that there's a small
ABI break on Windows from redefining the symbol.  Even if we decide
to do that, it'd be prudent to let this bake awhile in HEAD first.

Jelte Fennema

Discussion: https://postgr.es/m/AM5PR83MB01782BFF2978505F6D6C559AF7AA9@AM5PR83MB0178.EURPRD83.prod.outlook.com
2021-09-30 14:16:08 -04:00
Alvaro Herrera ff9f111bce
Fix WAL replay in presence of an incomplete record
Physical replication always ships WAL segment files to replicas once
they are complete.  This is a problem if one WAL record is split across
a segment boundary and the primary server crashes before writing down
the segment with the next portion of the WAL record: WAL writing after
crash recovery would happily resume at the point where the broken record
started, overwriting that record ... but any standby or backup may have
already received a copy of that segment, and they are not rewinding.
This causes standbys to stop following the primary after the latter
crashes:
  LOG:  invalid contrecord length 7262 at A8/D9FFFBC8
because the standby is still trying to read the continuation record
(contrecord) for the original long WAL record, but it is not there and
it will never be.  A workaround is to stop the replica, delete the WAL
file, and restart it -- at which point a fresh copy is brought over from
the primary.  But that's pretty labor intensive, and I bet many users
would just give up and re-clone the standby instead.

A fix for this problem was already attempted in commit 515e3d84a0, but
it only addressed the case for the scenario of WAL archiving, so
streaming replication would still be a problem (as well as other things
such as taking a filesystem-level backup while the server is down after
having crashed), and it had performance scalability problems too; so it
had to be reverted.

This commit fixes the problem using an approach suggested by Andres
Freund, whereby the initial portion(s) of the split-up WAL record are
kept, and a special type of WAL record is written where the contrecord
was lost, so that WAL replay in the replica knows to skip the broken
parts.  With this approach, we can continue to stream/archive segment
files as soon as they are complete, and replay of the broken records
will proceed across the crash point without a hitch.

Because a new type of WAL record is added, users should be careful to
upgrade standbys first, primaries later. Otherwise they risk the standby
being unable to start if the primary happens to write such a record.

A new TAP test that exercises this is added, but the portability of it
is yet to be seen.

This has been wrong since the introduction of physical replication, so
backpatch all the way back.  In stable branches, keep the new
XLogReaderState members at the end of the struct, to avoid an ABI
break.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://postgr.es/m/202108232252.dh7uxf6oxwcy@alvherre.pgsql
2021-09-29 11:21:51 -03:00
Peter Eisentraut 0b947c3101 Fix incorrect format placeholder 2021-09-29 08:12:23 +02:00
Alexander Korotkov b92f9f7443 Split macros from visibilitymap.h into a separate header
That allows to include just visibilitymapdefs.h from file.c, and in turn,
remove include of postgres.h from relcache.h.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210913232614.czafiubr435l6egi%40alap3.anarazel.de
Author: Alexander Korotkov
Reviewed-by: Andres Freund, Tom Lane, Alvaro Herrera
Backpatch-through: 13
2021-09-23 19:59:03 +03:00
Amit Kapila 4548c76738 Invalidate all partitions for a partitioned table in publication.
Updates/Deletes on a partition were allowed even without replica identity
after the parent table was added to a publication. This would later lead
to an error on subscribers. The reason was that we were not invalidating
the partition's relcache and the publication information for partitions
was not getting rebuilt. Similarly, we were not invalidating the
partitions' relcache after dropping a partitioned table from a publication
which will prohibit Updates/Deletes on its partition without replica
identity even without any publication.

Reported-by: Haiying Tang
Author: Hou Zhijie and Vignesh C
Reviewed-by: Vignesh C and Amit Kapila
Backpatch-through: 13
Discussion: https://postgr.es/m/OS0PR01MB6113D77F583C922F1CEAA1C3FBD29@OS0PR01MB6113.jpnprd01.prod.outlook.com
2021-09-22 08:00:54 +05:30
Peter Geoghegan dd94c2852e Fix "single value strategy" index deletion issue.
It is not appropriate for deduplication to apply single value strategy
when triggered by a bottom-up index deletion pass.  This wastes cycles
because later bottom-up deletion passes will overinterpret older
duplicate tuples that deduplication actually just skipped over "by
design".  It also makes bottom-up deletion much less effective for low
cardinality indexes that happen to cross a meaningless "index has single
key value per leaf page" threshold.

To fix, slightly narrow the conditions under which deduplication's
single value strategy is considered.  We already avoided the strategy
for a unique index, since our high level goal must just be to buy time
for VACUUM to run (not to buy space).  We'll now also avoid it when we
just had a bottom-up pass that reported failure.  The two cases share
the same high level goal, and already overlapped significantly, so this
approach is quite natural.

Oversight in commit d168b666, which added bottom-up index deletion.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WznaOvM+Gyj-JQ0X=JxoMDxctDTYjiEuETdAGbF5EUc3MA@mail.gmail.com
Backpatch: 14-, where bottom-up deletion was introduced.
2021-09-21 18:57:32 -07:00
Alvaro Herrera ade24dab97
Document XLOG_INCLUDE_XID a little better
I noticed that commit 0bead9af48 left this flag undocumented in
XLogSetRecordFlags, which led me to discover that the flag doesn't
actually do what the one comment on it said it does.  Improve the
situation by adding some more comments.

Backpatch to 14, where the aforementioned commit appears.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/202109212119.c3nhfp64t2ql@alvherre.pgsql
2021-09-21 19:47:53 -03:00
Michael Paquier 43c1c4f65e Introduce GUC shared_memory_size_in_huge_pages
This runtime-computed GUC shows the number of huge pages required
for the server's main shared memory area, taking advantage of the
work done in 0c39c29 and 0bd305e.  This is useful for users to estimate
the amount of huge pages required for a server as it becomes possible to
do an estimation without having to start the server and potentially
allocate a large chunk of shared memory.

The number of huge pages is calculated based on the existing GUC
huge_page_size if set, or by using the system's default by looking at
/proc/meminfo on Linux.  There is nothing new here as this commit reuses
the existing calculation methods, and just exposes this information
directly to the user.  The routine calculating the huge page size is
refactored to limit the number of files with platform-specific flags.

This new GUC's name was the most popular choice based on the discussion
done.  This is only supported on Linux.

I have taken the time to test the change on Linux, Windows and MacOS,
though for the last two ones large pages are not supported.  The first
one calculates correctly the number of pages depending on the existing
GUC huge_page_size or the system's default.

Thanks to Andres Freund, Robert Haas, Kyotaro Horiguchi, Tom Lane,
Justin Pryzby (and anybody forgotten here) for the discussion.

Author: Nathan Bossart
Discussion: https://postgr.es/m/F2772387-CE0F-46BF-B5F1-CC55516EB885@amazon.com
2021-09-21 10:31:58 +09:00
Andres Freund 6b9501660c pgstat: Prepare to use mechanism for truncated rels also for droppped rels.
The upcoming shared memory stats patch drops stats for dropped objects in a
transactional manner, rather than removing them later as part of vacuum. This
means that stats for DROP inside a transaction needs to handle aborted
(sub-)transactions similar to TRUNCATE: The stats up to the DROP should be
restored.

Rename the existing infrastructure in preparation.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de
2021-09-20 14:02:48 -07:00
Alvaro Herrera d3014fff4c
Doc: add glossary term for "auxiliary process"
Add entries for existing processes not documented, too, and adjust
existing definitions for consistency.

Per question from Bharath Rupireddy.

Author: Justin Pryzby <pryzby@telsasoft.com>
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CALj2ACVpYCT0M+k8zqrAa4ZQZV+ce5s6G=yajwoS1m=h-jj8NQ@mail.gmail.com
2021-09-20 12:22:02 -03:00
Andres Freund 7c83a3bf51 process startup: Split single user code out of PostgresMain().
It was harder than necessary to understand PostgresMain() because the code for
a normal backend was interspersed with single-user mode specific code. Split
most of the single-user mode code into its own function
PostgresSingleUserMain(), that does all the necessary setup for single-user
mode, and then hands off after that to PostgresMain().

There still is some single-user mode code in InitPostgres(), and it'd likely
be worth moving at least some of it out. But that's for later.

Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
2021-09-17 19:56:47 -07:00
Andres Freund 37a9aa6591 Fix performance regression from session statistics.
Session statistics, as introduced by 960869da08, had several shortcomings:

- an additional GetCurrentTimestamp() call that also impaired the accuracy of
  the data collected

  This can be avoided by passing the current timestamp we already have in
  pgstat_report_stat().

- an additional statistics UDP packet sent every 500ms

  This is solved by adding the new statistics to PgStat_MsgTabstat.
  This is conceptually ugly, because session statistics are not
  table statistics.  But the struct already contains data unrelated
  to tables, so there is not much damage done.

  Connection and disconnection are reported in separate messages, which
  reduces the number of additional messages to two messages per session and a
  slight increase in PgStat_MsgTabstat size (but the same number of table
  stats fit).

- Session time computation could overflow on systems where long is 32 bit.

Reported-By: Andres Freund <andres@anarazel.de>
Author: Andres Freund <andres@anarazel.de>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/20210801205501.nyxzxoelqoo4x2qc%40alap3.anarazel.de
Backpatch: 14-, where the feature was introduced.
2021-09-16 02:05:50 -07:00
Michael Paquier 0c39c29207 Support "postgres -C" with runtime-computed GUCs
Until now, the -C option of postgres was handled before a small subset
of GUCs computed at runtime are initialized, leading to incorrect
results as GUC machinery would fall back to default values for such
parameters.

For example, data_checksums could report "off" for a cluster as the
control file is not loaded yet.  Or wal_segment_size would show a
segment size at 16MB even if initdb --wal-segsize used something else.
Worse, the command would fail to properly report the recently-introduced
shared_memory, that requires to load shared_preload_libraries as these
could ask for a chunk of shared memory.

Support for runtime GUCs comes with a limitation, as the operation is
now allowed on a running server.  One notable reason for this is that
_PG_init() functions of loadable libraries are called before all
runtime-computed GUCs are initialized, and this is not guaranteed to be
safe to do on running servers.  For the case of shared_memory_size,
where we want to know how much memory would be used without allocating
it, this limitation is fine.  Another case where this will help is for
huge pages, with the introduction of a different GUC to evaluate the
amount of huge pages required for a server before starting it, without
having to allocate large chunks of memory.

This feature is controlled with a new GUC flag, and four parameters are
classified as runtime-computed as of this change:
- data_checksums
- shared_memory_size
- data_directory_mode
- wal_segment_size

Some TAP tests are added to provide some coverage here, using
data_checksums in the tests of pg_checksums.

Per discussion with Andres Freund, Justin Pryzby, Magnus Hagander and
more.

Author: Nathan Bossart
Discussion: https://postgr.es/m/F2772387-CE0F-46BF-B5F1-CC55516EB885@amazon.com
2021-09-16 10:59:26 +09:00
Tom Lane e3ec3c00d8 Remove arbitrary 64K-or-so limit on rangetable size.
Up to now the size of a query's rangetable has been limited by the
constants INNER_VAR et al, which mustn't be equal to any real
rangetable index.  65000 doubtless seemed like enough for anybody,
and it still is orders of magnitude larger than the number of joins
we can realistically handle.  However, we need a rangetable entry
for each child partition that is (or might be) processed by a query.
Queries with a few thousand partitions are getting more realistic,
so that the day when that limit becomes a problem is in sight,
even if it's not here yet.  Hence, let's raise the limit.

Rather than just increase the values of INNER_VAR et al, this patch
adopts the approach of making them small negative values, so that
rangetables could theoretically become as long as INT_MAX.

The bulk of the patch is concerned with changing Var.varno and some
related variables from "Index" (unsigned int) to plain "int".  This
is basically cosmetic, with little actual effect other than to help
debuggers print their values nicely.  As such, I've only bothered
with changing places that could actually see INNER_VAR et al, which
the parser and most of the planner don't.  We do have to be careful
in places that are performing less/greater comparisons on varnos,
but there are very few such places, other than the IS_SPECIAL_VARNO
macro itself.

A notable side effect of this patch is that while it used to be
possible to add INNER_VAR et al to a Bitmapset, that will now
draw an error.  I don't see any likelihood that it wouldn't be a
bug to include these fake varnos in a bitmapset of real varnos,
so I think this is all to the good.

Although this touches outfuncs/readfuncs, I don't think a catversion
bump is required, since stored rules would never contain Vars
with these fake varnos.

Andrey Lepikhov and Tom Lane, after a suggestion by Peter Eisentraut

Discussion: https://postgr.es/m/43c7f2f5-1e27-27aa-8c65-c91859d15190@postgrespro.ru
2021-09-15 14:11:21 -04:00
Peter Eisentraut 6fe0eb963d Add Cardinality typedef
Similar to Cost and Selectivity, this is just a double, which can be
used in path and plan nodes to give some hint about the meaning of a
field.

Discussion: https://www.postgresql.org/message-id/c091e5cd-45f8-69ee-6a9b-de86912cc7e7@enterprisedb.com
2021-09-15 18:56:13 +02:00
Peter Eisentraut f7e56f1f54 Update Unicode data to Unicode 14.0.0 2021-09-15 08:16:44 +02:00
Tom Lane 2e4eae87d0 Send NOTIFY signals during CommitTransaction.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client.  It therefore
had to run during the main-loop processing that occurs just before
going idle.  This arrangement had two big disadvantages:

* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).

* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop.  We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.

To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.

We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely.  That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify.  We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.

The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages.  To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely.  Removal
will occur in v15.

Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.

Per bug #15293 from Michael Powers (and similar gripes from others).

Artur Zakirov and Tom Lane

Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
2021-09-14 17:18:25 -04:00
Tom Lane e8638d78a2 Fix planner error with multiple copies of an AlternativeSubPlan.
It's possible for us to copy an AlternativeSubPlan expression node
into multiple places, for example the scan quals of several
partition children.  Then it's possible that we choose a different
one of the alternatives as optimal in each place.  Commit 41efb8340
failed to consider this scenario, so its attempt to remove "unused"
subplans could remove subplans that were still used elsewhere.

Fix by delaying the removal logic until we've examined all the
AlternativeSubPlans in a given query level.  (This does assume that
AlternativeSubPlans couldn't get copied to other query levels, but
for the foreseeable future that's fine; cf qual_is_pushdown_safe.)

Per report from Rajkumar Raghuwanshi.  Back-patch to v14
where the faulty logic came in.

Discussion: https://postgr.es/m/CAKcux6==O3NNZC3bZ2prRYv3cjm3_Zw1GfzmOjEVqYN4jub2+Q@mail.gmail.com
2021-09-14 15:11:21 -04:00
Peter Eisentraut 8539929197 Remove T_Expr
This is an abstract node that shouldn't have a node tag defined.

Reviewed-by: Jacob Champion <pchampion@vmware.com>
Discussion: https://www.postgresql.org/message-id/c091e5cd-45f8-69ee-6a9b-de86912cc7e7@enterprisedb.com
2021-09-14 10:27:29 +02:00
Andres Freund edb4d95ddf jit: Do not try to shut down LLVM state in case of LLVM triggered errors.
If an allocation failed within LLVM it is not safe to call back into LLVM as
LLVM is not generally safe against exceptions / stack-unwinding. Thus errors
while in LLVM code are promoted to FATAL. However llvm_shutdown() did call
back into LLVM even in such cases, while llvm_release_context() was careful
not to do so.

We cannot generally skip shutting down LLVM, as that can break profiling. But
it's OK to do so if there was an error from within LLVM.

Reported-By: Jelte Fennema <Jelte.Fennema@microsoft.com>
Author: Andres Freund <andres@anarazel.de>
Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/AM5PR83MB0178C52CCA0A8DEA0207DC14F7FF9@AM5PR83MB0178.EURPRD83.prod.outlook.com
Backpatch: 11-, where jit was introduced
2021-09-13 18:26:15 -07:00
Michael Paquier 026ed8efd6 Remove code duplication for permission checks with replication slots
Two functions, both named check_permissions(), used the same checks to
verify if a user had required privileges to work on replication slots.
This commit removes the duplication, and moves the function doing the
checks to slot.c to be centralized.

Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Euler Taveira
Discussion: https://postgr.es/m/CALj2ACUPpVw1u7sQocFVWrSs0n10pt_G_4NPZKSxXK6cW1dErw@mail.gmail.com
2021-09-14 10:15:49 +09:00
Michael Paquier 2d77d83540 Refactor the syslogger pipe protocol to use a bitmask for its options
The previous protocol expected a set of matching characters to check if
a message sent was the last one or not, that changed depending on the
destination wanted:
- 't' and 'f' tracked the last message of a log sent to stderr.
- 'T' and 'F' tracked the last message of a log sent to csvlog.

This could be extended with more characters when introducing new
destinations, but using a bitmask is much more elegant.  This commit
changes the protocol so as a bitmask is used in the header of a log
chunk message sent to the syslogger, with the following options
available for now:
- log_destination as stderr.
- log_destination as csvlog.
- if a message is the last chunk of a message.

Sehrope found this issue in a patch set to introduce JSON as an option
for log_destination, but his patch made the size of the protocol header
larger.  This commit keeps the same size as the original, and adapts the
protocol as wanted.

Thanks also to Andrew Dunstan and Greg Stark for the discussion.

Author: Michael Paquier, Sehrope Sarkuni
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com
2021-09-13 09:03:45 +09:00
Noah Misch b073c3ccd0 Revoke PUBLIC CREATE from public schema, now owned by pg_database_owner.
This switches the default ACL to what the documentation has recommended
since CVE-2018-1058.  Upgrades will carry forward any old ownership and
ACL.  Sites that declined the 2018 recommendation should take a fresh
look.  Recipes for commissioning a new database cluster from scratch may
need to create a schema, grant more privileges, etc.  Out-of-tree test
suites may require such updates.

Reviewed by Peter Eisentraut.

Discussion: https://postgr.es/m/20201031163518.GB4039133@rfd.leadboat.com
2021-09-09 23:38:09 -07:00
Peter Eisentraut 639a86e36a Remove Value node struct
The Value node struct is a weird construct.  It is its own node type,
but most of the time, it actually has a node type of Integer, Float,
String, or BitString.  As a consequence, the struct name and the node
type don't match most of the time, and so it has to be treated
specially a lot.  There doesn't seem to be any value in the special
construct.  There is very little code that wants to accept all Value
variants but nothing else (and even if it did, this doesn't provide
any convenient way to check it), and most code wants either just one
particular node type (usually String), or it accepts a broader set of
node types besides just Value.

This change removes the Value struct and node type and replaces them
by separate Integer, Float, String, and BitString node types that are
proper node types and structs of their own and behave mostly like
normal node types.

Also, this removes the T_Null node tag, which was previously also a
possible variant of Value but wasn't actually used outside of the
Value contained in A_Const.  Replace that by an isnull field in
A_Const.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5ba6bc5b-3f95-04f2-2419-f8ddb4c046fb@enterprisedb.com
2021-09-09 08:36:53 +02:00
Tom Lane 490798451a Fix misleading comments about TOAST access macros.
Seems to have been my error in commit aeb1631ed.
Noted by Christoph Berg.

Discussion: https://postgr.es/m/YTeLipdnSOg4NNcI@msg.df7cb.de
2021-09-08 14:11:35 -04:00
Peter Eisentraut a3d2b1bbe9 Disable anonymous record hash support except in special cases
Commit 01e658fa74 added hash support for row types.  This also added
support for hashing anonymous record types, using the same approach
that the type cache uses for comparison support for record types: It
just reports that it works, but it might fail at run time if a
component type doesn't actually support the operation.  We get away
with that for comparison because most types support that.  But some
types don't support hashing, so the current state can result in
failures at run time where the planner chooses hashing over sorting,
whereas that previously worked if only sorting was an option.

We do, however, want the record hashing support for path tracking in
recursive unions, and the SEARCH and CYCLE clauses built on that.  In
that case, hashing is the only plan option.  So enable that, this
commit implements the following approach: The type cache does not
report that hashing is available for the record type.  This undoes
that part of 01e658fa74.  Instead, callers that require hashing no
matter what can override that result themselves.  This patch only
touches the callers to make the aforementioned recursive query cases
work, namely the parse analysis of unions, as well as the hash_array()
function.

Reported-by: Sait Talha Nisanci <sait.nisanci@microsoft.com>
Bug: #17158
Discussion: https://www.postgresql.org/message-id/flat/17158-8a2ba823982537a4%40postgresql.org
2021-09-08 09:55:04 +02:00
Amit Kapila 8bd5342740 Invalidate relcache for publications defined for all tables.
Updates/Deletes on a relation were allowed even without replica identity
after we define the publication for all tables. This would later lead to
an error on subscribers. The reason was that for such publications we were
not invalidating the relcache and the publication information for
relations was not getting rebuilt. Similarly, we were not invalidating the
relcache after dropping of such publications which will prohibit
Updates/Deletes without replica identity even without any publication.

Author: Vignesh C and Hou Zhijie
Reviewed-by: Hou Zhijie, Kyotaro Horiguchi, Amit Kapila
Backpatch-through: 10, where it was introduced
Discussion: https://postgr.es/m/CALDaNm0pF6zeWqCA8TCe2sDuwFAy8fCqba=nHampCKag-qLixg@mail.gmail.com
2021-09-08 11:50:37 +05:30
Michael Paquier bd1788051b Introduce GUC shared_memory_size
This runtime-computed GUC shows the size of the server's main shared
memory area, taking into account the amount of shared memory allocated
by extensions as this is calculated after processing
shared_preload_libraries.

Author: Nathan Bossart
Discussion: https://postgr.es/m/F2772387-CE0F-46BF-B5F1-CC55516EB885@amazon.com
2021-09-08 12:02:30 +09:00
Alvaro Herrera 0c6828fa98
Add PublicationTable and PublicationRelInfo structs
These encapsulate a relation when referred from replication DDL.
Currently they don't do anything useful (they're just wrappers around
RangeVar and Relation respectively) but in the future they'll be used to
carry column lists.

Extracted from a larger patch by Rahila Syed.

Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB=_ektNRH8NJ1jf95g@mail.gmail.com
2021-09-06 14:24:50 -03:00
Tom Lane 89dba59590 Fix actively-misleading comments about the contents of struct pg_tm.
pgtime.h documented the PG interpretation of tm_mon right alongside
the POSIX interpretation of tm_year, with no hint that neither
comment was correct throughout our code.

Perhaps someday we ought to switch to using two separate struct
definitions to provide a clearer indication of which semantics are
in use where.  But I fear the tedium-versus-safety-gain tradeoff
would not be very good.

Discussion: https://postgr.es/m/CAJ7c6TOMG8zSNEZtCn5SPe+cCk3Lfxb71ZaQwT2F4T7PJ_t=KA@mail.gmail.com
2021-09-06 11:43:44 -04:00
Tom Lane 388e71af88 Make timetz_zone() stable, and correct a bug for DYNTZ abbreviations.
Historically, timetz_zone() has used time(NULL) as the reference point
for deciding whether DST is active.  That means its result can change
intra-statement, requiring it to be marked VOLATILE (cf. 35979e6c3).
But that definition is pretty inconsistent with the way we deal with
timestamps elsewhere.  Let's make it use the transaction start time
("now()") as the reference point instead.  That lets it be marked
STABLE, and also saves a kernel call per invocation.

While at it, remove the function's use of pg_time_t and pg_localtime.
Those are inconsistent with the other code in this area, which indeed
created a bug: timetz_zone() delivered completely wrong answers if
the zone was specified by a dynamic TZ abbreviation.  (We need to do
something about that in the back branches, but the fix will look
different from this.)

Aleksander Alekseev and Tom Lane

Discussion: https://postgr.es/m/CAJ7c6TOMG8zSNEZtCn5SPe+cCk3Lfxb71ZaQwT2F4T7PJ_t=KA@mail.gmail.com
2021-09-06 11:03:56 -04:00
Michael Paquier 0bd305ee1d Move the shared memory size calculation to its own function
This change refactors the shared memory size calculation in
CreateSharedMemoryAndSemaphores() to its own function.  This is intended
for use in a future change related to the setup of huge pages and shared
memory with some GUCs, while useful on its own for extensions.

Author: Nathan Bossart
Discussion: https://postgr.es/m/F2772387-CE0F-46BF-B5F1-CC55516EB885@amazon.com
2021-09-06 10:59:20 +09:00
Alvaro Herrera 96b665083e
Revert "Avoid creating archive status ".ready" files too early"
This reverts commit 515e3d84a0 and equivalent commits in back
branches.  This solution to the problem has a number of problems, so
we'll try again with a different approach.

Per note from Andres Freund

Discussion: https://postgr.es/m/20210831042949.52eqp5xwbxgrfank@alap3.anarazel.de
2021-09-04 12:14:30 -04:00
John Naylor 0c6a6a0ab7 Set the volatility of the timestamptz version of date_bin() back to immutable
543f36b43d was too hasty in thinking that the volatility of date_bin()
had to match date_trunc(), since only the latter references
session_timezone.

Bump catversion

Per feedback from Aleksander Alekseev
Backpatch to v14, as the former commit was
2021-09-03 13:39:16 -04:00
Fujii Masao e04267844a Enhance pg_stat_reset_single_table_counters function.
This commit allows pg_stat_reset_single_table_counters() to reset statistics
for a single relation shared across all databases in the cluster to zero.

Bump catalog version.

Author: B Sadhu Prasad Patro
Reviewed-by: Mahendra Singh Thalor, Himanshu Upadhyaya, Dilip Kumar, Fujii Masao
Discussion: https://postgr.es/m/CAFF0-CGy7EHeF=AqqkGMF85cySPQBgDcvNk73G2O0vL94O5U5A@mail.gmail.com
2021-09-02 14:01:06 +09:00
Amit Kapila 31c389d8de Optimize fileset usage in apply worker.
Use one fileset for the entire worker lifetime instead of using
separate filesets for each streaming transaction. Now, the
changes/subxacts files for every streaming transaction will be
created under the same fileset and the files will be deleted
after the transaction is completed.

This patch extends the BufFileOpenFileSet and BufFileDeleteFileSet
APIs to allow users to specify whether to give an error on missing
files.

Author: Dilip Kumar, based on suggestion by Thomas Munro
Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila
Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org
2021-09-02 08:13:46 +05:30
John Naylor 543f36b43d Mark the timestamptz variant of date_bin() as stable
Previously, it was immutable by lack of marking. This is not
correct, since the time zone could change.

Bump catversion

Discussion: https://www.postgresql.org/message-id/CAFBsxsG2UHk8mOWL0tca%3D_cg%2B_oA5mVRNLhDF0TBw980iOg5NQ%40mail.gmail.com
Backpatch to v14, when this function came in
2021-08-31 15:18:26 -04:00
Amit Kapila dcac5e7ac1 Refactor sharedfileset.c to separate out fileset implementation.
Move fileset related implementation out of sharedfileset.c to allow its
usage by backends that don't want to share filesets among different
processes. After this split, fileset infrastructure is used by both
sharedfileset.c and worker.c for the named temporary files that survive
across transactions.

Author: Dilip Kumar, based on suggestion by Andres Freund
Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila
Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org
2021-08-30 08:48:15 +05:30
Amit Kapila abc0910e2e Add logical change details to logical replication worker errcontext.
Previously, on the subscriber, we set the error context callback for the
tuple data conversion failures. This commit replaces the existing error
context callback with a comprehensive one so that it shows not only the
details of data conversion failures but also the details of logical change
being applied by the apply worker or table sync worker. The additional
information displayed will be the command, transaction id, and timestamp.

The error context is added to an error only when applying a change but not
while doing other work like receiving data etc.

This will help users in diagnosing the problems that occur during logical
replication. It also can be used for future work that allows skipping a
particular transaction on the subscriber.

Author: Masahiko Sawada
Reviewed-by: Hou Zhijie, Greg Nancarrow, Haiying Tang, Amit Kapila
Tested-by: Haiying Tang
Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
2021-08-27 08:30:23 +05:30
John Naylor 5bc429aacb Extend collection of Unicode combining characters to beyond the BMP
The former limit was perhaps a carryover from an older hand-coded
table. Since commit bab982161 we have enough space in mbinterval to
store larger codepoints, so collect all combining characters.

Discussion: https://www.postgresql.org/message-id/49ad1fa0-174e-c901-b14c-c484b60907f1%40enterprisedb.com
2021-08-26 13:07:34 -04:00
John Naylor bab982161e Update display widths as part of updating Unicode
The hardcoded "wide character" set in ucs_wcwidth() was last updated
around the Unicode 5.0 era.  This led to misalignment when printing
emojis and other codepoints that have since been designated
wide or full-width.

To fix and keep up to date, extend update-unicode to download the list
of wide and full-width codepoints from the offical sources.

In passing, remove some comments about non-spacing characters that
haven't been accurate since we removed the former hardcoded logic.

Jacob Champion

Reported and reviewed by Pavel Stehule
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRCeX21O69YHxmykYySYyprZAqrKWWg0KoGKdjgqcGyygg@mail.gmail.com
2021-08-26 10:53:56 -04:00
John Naylor 1563ecbc1b Revert "Rename unicode_combining_table to unicode_width_table"
This reverts commit eb0d0d2c73.

After I had committed eb0d0d2c7 and 78ab944cd, I decided to add
a sanity check for a "can't happen" scenario just to be cautious.
It turned out that it already happened in the official Unicode source
data, namely that a character can be both wide and a combining
character. This fact renders the aforementioned commits unnecessary,
so revert both of them.

Discussion: https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com
2021-08-26 10:06:12 -04:00
John Naylor f8c8a8bccc Revert "Change mbbisearch to return the character range"
This reverts commit 78ab944cd4.

After I had committed eb0d0d2c7 and 78ab944cd, I decided to add
a sanity check for a "can't happen" scenario just to be cautious.
It turned out that it already happened in the official Unicode source
data, namely that a character can be both wide and a combining
character. This fact renders the aforementioned commits unnecessary,
so revert both of them.

Discussion:
https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com
2021-08-26 09:58:28 -04:00
John Naylor 78ab944cd4 Change mbbisearch to return the character range
Add a width field to mbinterval and have mbbisearch return a
pointer to the found range rather than just bool for success.
A future commit will add another width besides zero, and this
will allow that to use the same search.

Reviewed by Jacob Champion
Discussion: https://www.postgresql.org/message-id/CAFBsxsGOCpzV7c-f3a8ADsA1n4uZ%3D8puCctQp%2Bx7W0vgkv%3Dw%2Bg%40mail.gmail.com
2021-08-25 13:08:11 -04:00
John Naylor eb0d0d2c73 Rename unicode_combining_table to unicode_width_table
No functional changes. A future commit will use this table for
other purposes besides combining characters.
2021-08-25 13:01:35 -04:00
Peter Eisentraut bb9ff46bc4 Fix typo 2021-08-25 10:14:51 +02:00
Amit Kapila 29b5905470 Fix toast rewrites in logical decoding.
Commit 325f2ec555 introduced pg_class.relwrite to skip operations on
tables created as part of a heap rewrite during DDL. It links such
transient heaps to the original relation OID via this new field in
pg_class but forgot to do anything about toast tables. So, logical
decoding was not able to skip operations on internally created toast
tables. This leads to an error when we tried to decode the WAL for the
next operation for which it appeared that there is a toast data where
actually it didn't have any toast data.

To fix this, we set pg_class.relwrite for internally created toast tables
as well which allowed skipping operations on them during logical decoding.

Author: Bertrand Drouvot
Reviewed-by: David Zhang, Amit Kapila
Backpatch-through: 11, where it was introduced
Discussion: https://postgr.es/m/b5146fb1-ad9e-7d6e-f980-98ed68744a7c@amazon.com
2021-08-25 09:53:07 +05:30
Alvaro Herrera 515e3d84a0
Avoid creating archive status ".ready" files too early
WAL records may span multiple segments, but XLogWrite() does not
wait for the entire record to be written out to disk before
creating archive status files.  Instead, as soon as the last WAL page of
the segment is written, the archive status file is created, and the
archiver may process it.  If PostgreSQL crashes before it is able to
write and flush the rest of the record (in the next WAL segment), the
wrong version of the first segment file lingers in the archive, which
causes operations such as point-in-time restores to fail.

To fix this, keep track of records that span across segments and ensure
that segments are only marked ready-for-archival once such records have
been completely written to disk.

This has always been wrong, so backpatch all the way back.

Author: Nathan Bossart <bossartn@amazon.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CBDDFA01-6E40-46BB-9F98-9340F4379505@amazon.com
2021-08-23 15:50:35 -04:00
David Rowley 22c4e88ebf Allow parallel DISTINCT
We've supported parallel aggregation since e06a38965.  At the time, we
didn't quite get around to also adding parallel DISTINCT. So, let's do
that now.

This is implemented by introducing a two-phase DISTINCT.  Phase 1 is
performed on parallel workers, rows are made distinct there either by
hashing or by sort/unique.  The results from the parallel workers are
combined and the final distinct phase is performed serially to get rid of
any duplicate rows that appear due to combining rows for each of the
parallel workers.

Author: David Rowley
Reviewed-by: Zhihong Yu
Discussion: https://postgr.es/m/CAApHDvrjRxVKwQN0he79xS+9wyotFXL=RmoWqGGO2N45Farpgw@mail.gmail.com
2021-08-22 23:31:16 +12:00
Tom Lane 8d2d6ec770 Avoid trying to lock OLD/NEW in a rule with FOR UPDATE.
transformLockingClause neglected to exclude the pseudo-RTEs for
OLD/NEW when processing a rule's query.  This led to odd errors
or even crashes later on.  This bug is very ancient, but it's
not terribly surprising that nobody noticed, since the use-case
for SELECT FOR UPDATE in a non-view rule is somewhere between
thin and non-existent.  Still, crashing is not OK.

Per bug #17151 from Zhiyong Wu.  Thanks to Masahiko Sawada
for analysis of the problem.

Discussion: https://postgr.es/m/17151-c03a3e6e4ec9aadb@postgresql.org
2021-08-19 12:12:35 -04:00
Amit Kapila 4cd7a18968 Rename LOGICAL_REP_MSG_STREAM_END to LOGICAL_REP_MSG_STREAM_STOP.
In the code, most places used the term "Stream Stop" for the logical
stream message. This commit improves consistency by renaming LogicalRepMsgType
"LOGICAL_REP_MSG_STREAM_END" to "LOGICAL_REP_MSG_STREAM_STOP".

Author: Masahiko Sawada
Reviewed-by: Hou Zhijie, Amit Kapila
Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
2021-08-19 09:34:26 +05:30
Michael Paquier 2576dcfb76 Revert refactoring of hex code to src/common/
This is a combined revert of the following commits:
- c3826f8, a refactoring piece that moved the hex decoding code to
src/common/.  This code was cleaned up by aef8948, as it originally
included no overflow checks in the same way as the base64 routines in
src/common/ used by SCRAM, making it unsafe for its purpose.
- aef8948, a more advanced refactoring of the hex encoding/decoding code
to src/common/ that added sanity checks on the result buffer for hex
decoding and encoding.  As reported by Hans Buschmann, those overflow
checks are expensive, and it is possible to see a performance drop in
the decoding/encoding of bytea or LOs the longer they are.  Simple SQLs
working on large bytea values show a clear difference in perf profile.
- ccf4e27, a cleanup made possible by aef8948.

The reverts of all those commits bring back the performance of hex
decoding and encoding back to what it was in ~13.  Fow now and
post-beta3, this is the simplest option.

Reported-by: Hans Buschmann
Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net
Backpatch-through: 14
2021-08-19 09:20:13 +09:00
Tom Lane 6b71c925cb Prevent ALTER TYPE/DOMAIN/OPERATOR from changing extension membership.
If recordDependencyOnCurrentExtension is invoked on a pre-existing,
free-standing object during an extension update script, that object
will become owned by the extension.  In our current code this is
possible in three cases:

* Replacing a "shell" type or operator.
* CREATE OR REPLACE overwriting an existing object.
* ALTER TYPE SET, ALTER DOMAIN SET, and ALTER OPERATOR SET.

The first of these cases is intentional behavior, as noted by the
existing comments for GenerateTypeDependencies.  It seems like
appropriate behavior for CREATE OR REPLACE too; at least, the obvious
alternatives are not better.  However, the fact that it happens during
ALTER is an artifact of trying to share code (GenerateTypeDependencies
and makeOperatorDependencies) between the CREATE and ALTER cases.
Since an extension script would be unlikely to ALTER an object that
didn't already belong to the extension, this behavior is not very
troubling for the direct target object ... but ALTER TYPE SET will
recurse to dependent domains, and it is very uncool for those to
become owned by the extension if they were not already.

Let's fix this by redefining the ALTER cases to never change extension
membership, full stop.  We could minimize the behavioral change by
only changing the behavior when ALTER TYPE SET is recursing to a
domain, but that would complicate the code and it does not seem like
a better definition.

Per bug #17144 from Alex Kozhemyakin.  Back-patch to v13 where ALTER
TYPE SET was added.  (The other cases are older, but since they only
affect the directly-named object, there's not enough of a problem to
justify changing the behavior further back.)

Discussion: https://postgr.es/m/17144-e67d7a8f049de9af@postgresql.org
2021-08-17 14:29:22 -04:00
Alvaro Herrera 6f8127b739
Revert analyze support for partitioned tables
This reverts the following commits:
1b5617eb84 Describe (auto-)analyze behavior for partitioned tables
0e69f705cc Set pg_class.reltuples for partitioned tables
41badeaba8 Document ANALYZE storage parameters for partitioned tables
0827e8af70 autovacuum: handle analyze for partitioned tables

There are efficiency issues in this code when handling databases with
large numbers of partitions, and it doesn't look like there isn't any
trivial way to handle those.  There are some other issues as well.  It's
now too late in the cycle for nontrivial fixes, so we'll have to let
Postgres 14 users continue to manually deal with ANALYZE their
partitioned tables, and hopefully we can fix the issues for Postgres 15.

I kept [most of] be280cdad2 ("Don't reset relhasindex for partitioned
tables on ANALYZE") because while we added it due to 0827e8af70, it is
a good bugfix in its own right, since it affects manual analyze as well
as autovacuum-induced analyze, and there's no reason to revert it.

I retained the addition of relkind 'p' to tables included by
pg_stat_user_tables, because reverting that would require a catversion
bump.
Also, in pg14 only, I keep a struct member that was added to
PgStat_TabStatEntry to avoid breaking compatibility with existing stat
files.

Backpatch to 14.

Discussion: https://postgr.es/m/20210722205458.f2bug3z6qzxzpx2s@alap3.anarazel.de
2021-08-16 17:27:52 -04:00
John Naylor 4864c8e8f1 Use direct function calls for pg_popcount{32,64} on non-x86 platforms
Previously, all pg_popcount{32,64} calls were indirected through
a function pointer, even though we had no fast implementation for
non-x86 platforms. Instead, for those platforms use wrappers around
the pg_popcount{32,64}_slow functions.

Review and additional hacking by David Rowley
Reviewed by Álvaro Herrera

Discussion: https://www.postgresql.org/message-id/flat/CAFBsxsE7otwnfA36Ly44zZO%2Bb7AEWHRFANxR1h1kxveEV%3DghLQ%40mail.gmail.com
2021-08-16 11:51:15 -04:00
Tom Lane c32fcac56a Add RISC-V spinlock support in s_lock.h.
Like the ARM case, just use gcc's __sync_lock_test_and_set();
that will compile into AMOSWAP.W.AQ which does what we need.

At some point it might be worth doing some work on atomic ops
for RISC-V, but this should be enough for a creditable port.

Back-patch to all supported branches, just in case somebody
wants to try them on RISC-V.

Marek Szuba

Discussion: https://postgr.es/m/dea97b6d-f55f-1f6d-9109-504aa7dfa421@gentoo.org
2021-08-13 13:59:43 -04:00
Andres Freund 80a8f95b3b Remove support for background workers without BGWORKER_SHMEM_ACCESS.
Background workers without shared memory access have been broken on
EXEC_BACKEND / windows builds since shortly after background workers have been
introduced, without that being reported. Clearly they are not commonly used.

The problem is that bgworker startup requires to be attached to shared memory
in EXEC_BACKEND child processes. StartBackgroundWorker() detaches from shared
memory for unconnected workers, but at that point we already have initialized
subsystems referencing shared memory.

Fixing this problem is not entirely trivial, so removing the option to not be
connected to shared memory seems the best way forward. In most use cases the
advantages of being connected to shared memory far outweigh the disadvantages.

As there have been no reports about this issue so far, we have decided that it
is not worth trying to address the problem in the back branches.

Per discussion with Alvaro Herrera, Robert Haas and Tom Lane.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210802065116.j763tz3vz4egqy3w@alap3.anarazel.de
2021-08-13 05:49:26 -07:00
David Rowley 37450f2ca9 Fix incorrect hash table resizing code in simplehash.h
This fixes a bug in simplehash.h which caused an incorrect size mask to be
used when the hash table grew to SH_MAX_SIZE (2^32).  The code was
incorrectly setting the size mask to 0 when the hash tables reached the
maximum possible number of buckets.  This would result always trying to
use the 0th bucket causing an  infinite loop of trying to grow the hash
table due to there being too many collisions.

Seemingly it's not that common for simplehash tables to ever grow this big
as this bug dates back to v10 and nobody seems to have noticed it before.
However, probably the most likely place that people would notice it would
be doing a large in-memory Hash Aggregate with something close to at least
2^31 groups.

After this fix, the code now works correctly with up to within 98% of 2^32
groups and will fail with the following error when trying to insert any
more items into the hash table:

ERROR:  hash table size exceeded

However, the work_mem (or hash_mem_multiplier in newer versions) settings
will generally cause Hash Aggregates to spill to disk long before reaching
that many groups.  The minimal test case I did took a work_mem setting of
over 192GB to hit the bug.

simplehash hash tables are used in a few other places such as Bitmap Index
Scans, however, again the size that the hash table can become there is
also limited to work_mem and it would take a relation of around 16TB
(2^31) pages and a very large work_mem setting to hit this.  With smaller
work_mem values the table would become lossy and never grow large enough
to hit the problem.

Author: Yura Sokolov
Reviewed-by: David Rowley, Ranier Vilela
Discussion: https://postgr.es/m/b1f7f32737c3438136f64b26f4852b96@postgrespro.ru
Backpatch-through: 10, where simplehash.h was added
2021-08-13 16:41:26 +12:00
Tom Lane 18bac60ede Let regexp_replace() make use of REG_NOSUB when feasible.
If the replacement string doesn't contain \1...\9, then we don't
need sub-match locations, so we can use the REG_NOSUB optimization
here too.  There's already a pre-scan of the replacement string
to look for backslashes, so extend that to check for digits, and
refactor to allow that to happen before we compile the regexp.

While at it, try to speed up the pre-scan by using memchr() instead
of a handwritten loop.  It's likely that this is lost in the noise
compared to the regexp processing proper, but maybe not.  In any
case, this coding is shorter.

Also, add some test cases to improve the poor coverage of
appendStringInfoRegexpSubstr().

Discussion: https://postgr.es/m/3534632.1628536485@sss.pgh.pa.us
2021-08-09 20:53:25 -04:00
Tom Lane 0e6aa8747d Avoid determining regexp subexpression matches, when possible.
Identifying the precise match locations for parenthesized subexpressions
is a fairly expensive task given the way our regexp engine works, both
at regexp compile time (where we must create an optimized NFA for each
parenthesized subexpression) and at runtime (where determining exact
match locations requires laborious search).

Up to now we've made little attempt to optimize this situation.  This
patch identifies cases where we know at compile time that we won't
need to know subexpression match locations, and teaches the regexp
compiler to not bother creating per-subexpression regexps for
parenthesis pairs that are not referenced by backrefs elsewhere in
the regexp.  (To preserve semantics, we obviously still have to
pin down the match locations of backref references.)  Users could
have obtained the same results before this by being careful to
write "non capturing" parentheses wherever possible, but few people
bother with that.

Discussion: https://postgr.es/m/2219936.1628115334@sss.pgh.pa.us
2021-08-09 11:26:34 -04:00
Peter Eisentraut 18fea737b5 Change NestPath node to contain JoinPath node
This makes the structure of all JoinPath-derived nodes the same,
independent of whether they have additional fields.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-08-08 18:46:34 +02:00
Peter Eisentraut 2226b4189b Change SeqScan node to contain Scan node
This makes the structure of all Scan-derived nodes the same,
independent of whether they have additional fields.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-08-08 18:46:34 +02:00
David Rowley 75a2d132ea Remove unused function declaration
It appears that check_track_commit_timestamp was declared but has never
been defined in our code base.  Likely this is just leftover cruft from
a development version of the original patch to add commit timestamps.

Let's just remove the useless declaration.  The inclusion of guc.h also
seems surplus to requirements.

Author: Andrey Lepikhov
Discussion: https://postgr.es/m/f49aefb5-edbb-633a-af07-3e777023a94d@postgrespro.ru
2021-08-08 23:27:57 +12:00
Andres Freund 675c945394 Move temporary file cleanup to before_shmem_exit().
As reported by a few OSX buildfarm animals there exist at least one path where
temporary files exist during AtProcExit_Files() processing. As temporary file
cleanup causes pgstat reporting, the assertions added in ee3f8d3d3a caused
failures.

This is not an OSX specific issue, we were just lucky that timing on OSX
reliably triggered the problem.  The known way to cause this is a FATAL error
during perform_base_backup() with a MANIFEST used - adding an elog(FATAL)
after InitializeBackupManifest() reliably reproduces the problem in isolation.

The problem is that the temporary file created in InitializeBackupManifest()
is not cleaned up via resource owner cleanup as WalSndResourceCleanup()
currently is only used for non-FATAL errors. That then allows to reach
AtProcExit_Files() with existing temporary files, causing the assertion
failure.

To fix this problem, move temporary file cleanup to a before_shmem_exit() hook
and add assertions ensuring that no temporary files are created before / after
temporary file management has been initialized / shut down. The cleanest way
to do so seems to be to split fd.c initialization into two, one for plain file
access and one for temporary file access.

Right now there's no need to perform further fd.c cleanup during process exit,
so I just renamed AtProcExit_Files() to BeforeShmemExit_Files(). Alternatively
we could perform another pass through the files to check that no temporary
files exist, but the added assertions seem to provide enough protection
against that.

It might turn out that the assertions added in ee3f8d3d3a will cause too much
noise - in that case we'll have to downgrade them to a WARNING, at least
temporarily.

This commit is not necessarily the best approach to address this issue, but it
should resolve the buildfarm failures. We can revise later.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210807190131.2bm24acbebl4wl6i@alap3.anarazel.de
2021-08-07 19:20:47 -07:00
Peter Eisentraut 256909c6c1 Remove T_MemoryContext
This is an abstract node that shouldn't have a node tag defined.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-08-07 23:21:24 +02:00
Andres Freund b406478b87 process startup: Always call Init[Auxiliary]Process() before BaseInit().
For EXEC_BACKEND InitProcess()/InitAuxiliaryProcess() needs to have been
called well before we call BaseInit(), as SubPostmasterMain() needs LWLocks to
work. Having the order of initialization differ between platforms makes it
unnecessarily hard to understand the system and to add initialization points
for new subsystems without a lot of duplication.

To be able to change the order, BaseInit() cannot trigger
CreateSharedMemoryAndSemaphores() anymore - obviously that needs to have
happened before we can call InitProcess(). It seems cleaner to create shared
memory explicitly in single user/bootstrap mode anyway.

After this change the separation of bufmgr initialization into
InitBufferPoolAccess() / InitBufferPoolBackend() is not meaningful anymore so
the latter is removed.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
2021-08-05 15:36:59 -07:00
Andres Freund f8dd4ecb0b process startup: Remove bootstrap / checker modes from AuxProcType.
Neither is actually initialized as an auxiliary process, so it does not really
make sense to reserve a PGPROC etc for them.

This keeps checker mode implemented by exiting partway through bootstrap
mode. That might be worth changing at some point, perhaps if we ever extend
checker mode to be a more general tool.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
2021-08-05 12:18:15 -07:00
Andres Freund 0a692109dc process startup: Move AuxiliaryProcessMain into its own file.
After the preceding commits the auxprocess code is independent from
bootstrap.c - so a dedicated file seems less confusing.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
2021-08-05 12:12:11 -07:00
Andres Freund 5aa4a9d207 process startup: Separate out BootstrapModeMain from AuxiliaryProcessMain.
There practically was no shared code between the two, once all the ifs are
removed. And it was quite confusing that aux processes weren't actually
started by the call to AuxiliaryProcessMain() in main().

There's more to do, AuxiliaryProcessMain() should move out of bootstrap.c, and
BootstrapModeMain() shouldn't use/be part of AuxProcType.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
2021-08-05 12:03:30 -07:00
Andres Freund 1bc8e7b099 pgstat: split reporting/fetching of bgwriter and checkpointer stats.
These have been unrelated since bgwriter and checkpointer were split into two
processes in 806a2aee37. As there several pending patches (shared memory
stats, extending the set of tracked IO / buffer statistics) that are made a
bit more awkward by the grouping, split them. Done separately to make
reviewing easier.

This does *not* change the contents of pg_stat_bgwriter or move fields out of
bgwriter/checkpointer stats that arguably do not belong in either. However
pgstat_fetch_global() was renamed and split into
pgstat_fetch_stat_checkpointer() and pgstat_fetch_stat_bgwriter().

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de
2021-08-04 19:16:04 -07:00
Amit Kapila 63cf61cdeb Add prepare API support for streaming transactions in logical replication.
Commit a8fd13cab0 added support for prepared transactions to built-in
logical replication via a new option "two_phase" for a subscription. The
"two_phase" option was not allowed with the existing streaming option.

This commit permits the combination of "streaming" and "two_phase"
subscription options. It extends the pgoutput plugin and the subscriber
side code to add the prepare API for streaming transactions which will
apply the changes accumulated in the spool-file at prepare time.

Author: Peter Smith and Ajin Cherian
Reviewed-by: Vignesh C, Amit Kapila, Greg Nancarrow
Tested-By: Haiying Tang
Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru
Discussion: https://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
2021-08-04 07:47:06 +05:30
Tom Lane 6424337073 Add assorted new regexp_xxx SQL functions.
This patch adds new functions regexp_count(), regexp_instr(),
regexp_like(), and regexp_substr(), and extends regexp_replace()
with some new optional arguments.  All these functions follow
the definitions used in Oracle, although there are small differences
in the regexp language due to using our own regexp engine -- most
notably, that the default newline-matching behavior is different.
Similar functions appear in DB2 and elsewhere, too.  Aside from
easing portability, these functions are easier to use for certain
tasks than our existing regexp_match[es] functions.

Gilles Darold, heavily revised by me

Discussion: https://postgr.es/m/fc160ee0-c843-b024-29bb-97b5da61971f@darold.net
2021-08-03 13:08:49 -04:00
David Rowley db632fbca3 Allow ordered partition scans in more cases
959d00e9d added the ability to make use of an Append node instead of a
MergeAppend when we wanted to perform a scan of a partitioned table and
the required sort order was the same as the partitioned keys and the
partitioned table was defined in such a way that earlier partitions were
guaranteed to only contain lower-order values than later partitions.
However, previously we didn't allow these ordered partition scans for
LIST partitioned table when there were any partitions that allowed
multiple Datums.  This was a very cheap check to make and we could likely
have done a little better by checking if there were interleaved
partitions, but at the time we didn't have visibility about which
partitions were pruned, so we still may have disallowed cases where all
interleaved partitions were pruned.

Since 475dbd0b7, we now have knowledge of pruned partitions, we can do a
much better job inside partitions_are_ordered().

Here we pass which partitions survived partition pruning into
partitions_are_ordered() and, for LIST partitioning, have it check to see
if any live partitions exist that are also in the new "interleaved_parts"
field defined in PartitionBoundInfo.

For RANGE partitioning we can relax the code which caused the partitions
to be unordered if a DEFAULT partition existed.  Since we now know which
partitions were pruned, partitions_are_ordered() now returns true when the
DEFAULT partition was pruned.

Reviewed-by: Amit Langote, Zhihong Yu
Discussion: https://postgr.es/m/CAApHDvrdoN_sXU52i=QDXe2k3WAo=EVry29r2+Tq2WYcn2xhEA@mail.gmail.com
2021-08-03 12:25:52 +12:00
David Rowley 475dbd0b71 Track a Bitmapset of non-pruned partitions in RelOptInfo
For partitioned tables with large numbers of partitions where queries are
able to prune all but a very small number of partitions, the time spent in
the planner looping over RelOptInfo.part_rels checking for non-NULL
RelOptInfos could become a large portion of the overall planning time.

Here we add a Bitmapset that records the non-pruned partitions.  This
allows us to more efficiently skip the pruned partitions by looping over
the Bitmapset.

This will cause a very slight slow down in cases where no or not many
partitions could be pruned, however, those cases are already slow to plan
anyway and the overhead of looping over the Bitmapset would be
unmeasurable when compared with the other tasks such as path creation for
a large number of partitions.

Reviewed-by: Amit Langote, Zhihong Yu
Discussion: https://postgr.es/m/CAApHDvqnPx6JnUuPwaf5ao38zczrAb9mxt9gj4U1EKFfd4AqLA@mail.gmail.com
2021-08-03 11:47:24 +12:00
Thomas Munro 7ff23c6d27 Run checkpointer and bgwriter in crash recovery.
Start up the checkpointer and bgwriter during crash recovery (except in
--single mode), as we do for replication.  This wasn't done back in
commit cdd46c76 out of caution.  Now it seems like a better idea to make
the environment as similar as possible in both cases.  There may also be
some performance advantages.

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Discussion: https://postgr.es/m/CA%2BhUKGJ8NRsqgkZEnsnRc2MFROBV-jCnacbYvtpptK2A9YYp9Q%40mail.gmail.com
2021-08-02 17:32:44 +12:00
Heikki Linnakangas 317632f307 Move InRecovery and standbyState global vars to xlogutils.c.
They are used in code that runs both during normal operation and during
WAL replay, and needs to behave differently during replay. Move them to
xlogutils.c, because that's where we have other helper functions used by
redo routines.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/b3b71061-4919-e882-4857-27e370ab134a%40iki.fi
2021-07-31 09:50:26 +03:00
Michael Paquier b0483263dd Add support for SET ACCESS METHOD in ALTER TABLE
The logic used to support a change of access method for a table is
similar to changes for tablespace or relation persistence, requiring a
table rewrite with an exclusive lock of the relation changed.  Table
rewrites done in ALTER TABLE already go through the table AM layer when
scanning tuples from the old relation and inserting them into the new
one, making this implementation straight-forward.

Note that partitioned tables are not supported as these have no access
methods defined.

Author: Justin Pryzby, Jeff Davis
Reviewed-by: Michael Paquier, Vignesh C
Discussion: https://postgr.es/m/20210228222530.GD20769@telsasoft.com
2021-07-28 10:10:44 +09:00
Dean Rasheed 085f931f52 Allow numeric scale to be negative or greater than precision.
Formerly, when specifying NUMERIC(precision, scale), the scale had to
be in the range [0, precision], which was per SQL spec. This commit
extends the range of allowed scales to [-1000, 1000], independent of
the precision (whose valid range remains [1, 1000]).

A negative scale implies rounding before the decimal point. For
example, a column might be declared with a scale of -3 to round values
to the nearest thousand. Note that the display scale remains
non-negative, so in this case the display scale will be zero, and all
digits before the decimal point will be displayed.

A scale greater than the precision supports fractional values with
zeros immediately after the decimal point.

Take the opportunity to tidy up the code that packs, unpacks and
validates the contents of a typmod integer, encapsulating it in a
small set of new inline functions.

Bump the catversion because the allowed contents of atttypmod have
changed for numeric columns. This isn't a change that requires a
re-initdb, but negative scale values in the typmod would confuse old
backends.

Dean Rasheed, with additional improvements by Tom Lane. Reviewed by
Tom Lane.

Discussion: https://postgr.es/m/CAEZATCWdNLgpKihmURF8nfofP0RFtAKJ7ktY6GcZOPnMfUoRqA@mail.gmail.com
2021-07-26 14:13:47 +01:00
Tom Lane 28d936031a Get rid of artificial restriction on hash table sizes on Windows.
The point of introducing the hash_mem_multiplier GUC was to let users
reproduce the old behavior of hash aggregation, i.e. that it could use
more than work_mem at need.  However, the implementation failed to get
the job done on Win64, where work_mem is clamped to 2GB to protect
various places that calculate memory sizes using "long int".  As
written, the same clamp was applied to hash_mem.  This resulted in
severe performance regressions for queries requiring a bit more than
2GB for hash aggregation, as they now spill to disk and there's no
way to stop that.

Getting rid of the work_mem restriction seems like a good idea, but
it's a big job and could not conceivably be back-patched.  However,
there's only a fairly small number of places that are concerned with
the hash_mem value, and it turns out to be possible to remove the
restriction there without too much code churn or any ABI breaks.
So, let's do that for now to fix the regression, and leave the
larger task for another day.

This patch does introduce a bit more infrastructure that should help
with the larger task, namely pg_bitutils.h support for working with
size_t values.

Per gripe from Laurent Hasson.  Back-patch to v13 where the
behavior change came in.

Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us
Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com
2021-07-25 14:02:27 -04:00
Tom Lane 678f5448c2 Fix failure of some headers to compile "standalone".
Recently-added references to ParseState weren't covered by #include
references, creating unwanted ordering dependencies for users of
these headers.

Oversight in commit 2bfb50b3d.  Per headerscheck/cpluspluscheck.
2021-07-24 11:34:33 -04:00
Michael Paquier 6f164e6d17 Unify parsing logic for command-line integer options
Most of the integer options for command-line binaries now make use of a
single routine able to do the job, fixing issues with the detection of
sloppy values caused for example by the use of atoi(), that fails on
strings beginning with numerical characters with junk trailing
characters.

This commit cuts down the number of strings requiring translation by 26
per my count, switching the code to have two error types for invalid and
out-of-range values instead.

Much more could be done here, with float or even int64 options, but
int32 was the most appealing case as it is possible to rely on strtol()
to do the job reliably.  Note that there are some exceptions for now,
like pg_ctl or pg_upgrade that use their own logging logic.  A couple of
negative TAP tests required some adjustments for the new errors
generated.

pg_dump and pg_restore tracked the maximum number of parallel jobs
within the option parsing.  The code is refactored a bit to track that
in the code dedicated to parallelism instead.

Author: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: David Rowley, Álvaro Herrera
Discussion: https://postgr.es/m/CALj2ACXqdG9WhqVoJ9zYf-iZt7sgK7Szv5USs=he6NnWQ2ofTA@mail.gmail.com
2021-07-24 18:35:03 +09:00
David Rowley 91e9e89dcc Make nodeSort.c use Datum sorts for single column sorts
Datum sorts can be significantly faster than tuple sorts, especially when
the data type being sorted is a pass-by-value type.  Something in the
region of 50-70% performance improvements appear to be possible.

Just in case there's any confusion; the Datum sort is only used when the
targetlist of the Sort node contains a single column, not when there's a
single column in the sort key and multiple items in the target list.

Author: Ronan Dunklau
Reviewed-by: James Coleman, David Rowley, Ranier Vilela, Hou Zhijie
Tested-by: John Naylor
Discussion: https://postgr.es/m/3177670.itZtoPt7T5@aivenronan
2021-07-22 14:03:19 +12:00
Peter Eisentraut 983bdc4fac Add missing enum tags in enums used in nodes
Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-07-21 11:03:25 +02:00
Peter Eisentraut d9a38c52ce Rename NodeTag of ExprState
Rename from tag to type, for consistency with all other node structs.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-07-21 08:48:33 +02:00
Thomas Munro 2dbe890571 Support direct I/O on macOS.
Macs don't understand O_DIRECT, but they can disable caching with a
separate fcntl() call.  Extend the file opening functions in fd.c to
handle this for us if the caller passes in PG_O_DIRECT.

For now, this affects only WAL data and even then only if you set:

  max_wal_senders=0
  wal_level=minimal

This is not expected to be very useful on its own, but later proposed
patches will make greater use of direct I/O, and it'll be useful for
testing if developers on Macs can see the effects.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BADiyyHe0cun2wfT%2BSVnFVqNYPxoO6J9zcZkVO7%2BNGig%40mail.gmail.com
2021-07-19 11:01:01 +12:00
Alexander Korotkov f157db8622 Forgotten catversion bump for 9e3c217bd9 2021-07-18 21:08:44 +03:00