Commit Graph

21940 Commits

Author SHA1 Message Date
Tomas Vondra 73b96bad4a Fix alignment in BRIN minmax-multi deserialization
The deserialization failed to ensure correct alignment, as it assumed it
can simply point into the serialized value. The serialization however
ignores alignment and copies just the significant bytes in order to make
the result as small as possible. This caused failures on systems that
are sensitive to mialigned addresses, like sparc, or with address
sanitizer enabled.

Fixed by copying the serialized data to ensure proper alignment. While
at it, fix an issue with serialization on big endian machines, using the
same store_att_byval/fetch_att trick as extended statistics.

Discussion: https://postgr.es/0c8c3304-d3dd-5e29-d5ac-b50589a23c8c%40enterprisedb.com
2021-03-26 16:48:36 +01:00
Tomas Vondra ab596105b5 BRIN minmax-multi indexes
Adds BRIN opclasses similar to the existing minmax, except that instead
of summarizing the page range into a single [min,max] range, the summary
consists of multiple ranges and/or points, allowing gaps. This allows
more efficient handling of data with poor correlation to physical
location within the table and/or outlier values, for which the regular
minmax opclassed tend to work poorly.

It's possible to specify the number of values kept for each page range,
either as a single point or an interval boundary.

  CREATE TABLE t (a int);
  CREATE INDEX ON t
   USING brin (a int4_minmax_multi_ops(values_per_range=16));

When building the summary, the values are combined into intervals with
the goal to minimize the "covering" (sum of interval lengths), using a
support procedure computing distance between two values.

Bump catversion, due to various catalog changes.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Sokolov Yura <y.sokolov@postgrespro.ru>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
Discussion: https://postgr.es/m/5d78b774-7e9c-c94e-12cf-fef51cc89b1a%402ndquadrant.com
2021-03-26 13:54:30 +01:00
Tomas Vondra 77b88cd1bb BRIN bloom indexes
Adds a BRIN opclass using a Bloom filter to summarize the range. Indexes
using the new opclasses allow only equality queries (similar to hash
indexes), but that works fine for data like UUID, MAC addresses etc. for
which range queries are not very common. This also means the indexes
work for data that is not well correlated to physical location within
the table, or perhaps even entirely random (which is a common issue with
existing BRIN minmax opclasses).

It's possible to specify opclass parameters with the usual Bloom filter
parameters, i.e. the desired false-positive rate and the expected number
of distinct values per page range.

  CREATE TABLE t (a int);
  CREATE INDEX ON t
   USING brin (a int4_bloom_ops(false_positive_rate = 0.05,
                                n_distinct_per_range = 100));

The opclasses do not operate on the indexed values directly, but compute
a 32-bit hash first, and the Bloom filter is built on the hash value.
Collisions should not be a huge issue though, as the number of distinct
values in a page ranges is usually fairly small.

Bump catversion, due to various catalog changes.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Sokolov Yura <y.sokolov@postgrespro.ru>
Reviewed-by: Nico Williams <nico@cryptonector.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
Discussion: https://postgr.es/m/5d78b774-7e9c-c94e-12cf-fef51cc89b1a%402ndquadrant.com
2021-03-26 13:35:32 +01:00
Tomas Vondra a681e3c107 Support the old signature of BRIN consistent function
Commit a1c649d889 changed the signature of the BRIN consistent function
by adding a new required parameter.  Treating the parameter as optional,
which would make the change backwards incompatibile, was rejected with
the justification that there are few out-of-core extensions, so it's not
worth adding making the code more complex, and it's better to deal with
that in the extension.

But after further thought, that would be rather problematic, because
pg_upgrade simply dumps catalog contents and the same version of an
extension needs to work on both PostgreSQL versions. Supporting both
variants of the consistent function (with 3 or 4 arguments) makes that
possible.

The signature is not the only thing that changed, as commit 72ccf55cb9
moved handling of IS [NOT] NULL keys from the support procedures. But
this change is backward compatible - handling the keys in exension is
unnecessary, but harmless. The consistent function will do a bit of
unnecessary work, but it should be very cheap.

This also undoes most of the changes to the existing opclasses (minmax
and inclusion), making them use the old signature again. This should
make backpatching simpler.

Catversion bump, because of changes in pg_amproc.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
2021-03-26 13:17:58 +01:00
Robert Haas 5db1fd7823 Fix interaction of TOAST compression with expression indexes.
Before, trying to compress a value for insertion into an expression
index would crash.

Dilip Kumar, with some editing by me. Report by Jaime Casanova.

Discussion: http://postgr.es/m/CAJKUy5gcs0zGOp6JXU2mMVdthYhuQpFk=S3V8DOKT=LZC1L36Q@mail.gmail.com
2021-03-25 19:55:32 -04:00
Alvaro Herrera 71f4c8c6f7
ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY
Allow a partition be detached from its partitioned table without
blocking concurrent queries, by running in two transactions and only
requiring ShareUpdateExclusive in the partitioned table.

Because it runs in two transactions, it cannot be used in a transaction
block.  This is the main reason to use dedicated syntax: so that users
can choose to use the original mode if they need it.  But also, it
doesn't work when a default partition exists (because an exclusive lock
would still need to be obtained on it, in order to change its partition
constraint.)

In case the second transaction is cancelled or a crash occurs, there's
ALTER TABLE .. DETACH PARTITION .. FINALIZE, which executes the final
steps.

The main trick to make this work is the addition of column
pg_inherits.inhdetachpending, initially false; can only be set true in
the first part of this command.  Once that is committed, concurrent
transactions that use a PartitionDirectory will include or ignore
partitions so marked: in optimizer they are ignored if the row is marked
committed for the snapshot; in executor they are always included.  As a
result, and because of the way PartitionDirectory caches partition
descriptors, queries that were planned before the detach will see the
rows in the detached partition and queries that are planned after the
detach, won't.

A CHECK constraint is created that duplicates the partition constraint.
This is probably not strictly necessary, and some users will prefer to
remove it afterwards, but if the partition is re-attached to a
partitioned table, the constraint needn't be rechecked.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
2021-03-25 18:00:28 -03:00
Alvaro Herrera cc121d5596
Add comments for AlteredTableInfo->rel
The prior commit which introduced it was pretty squalid in terms of
code documentation, so add some comments.
2021-03-25 16:07:15 -03:00
Alvaro Herrera cd03c6e94b
Let ALTER TABLE Phase 2 routines manage the relation pointer
Struct AlteredRelationInfo gains a new Relation member, to be used only
by Phase 2 (ATRewriteCatalogs); this allows ATExecCmd() subroutines open
and close the relation internally.

A future commit will use this facility to implement an ALTER TABLE
subcommand that closes and reopens the relation across transaction
boundaries.

(It is possible to keep the relation open past phase 2 to be used by
phase 3 instead of having to reopen it that point, but there are some
minor complications with that; it's not clear that there is much to be
won from doing that, though.)

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
2021-03-25 15:56:11 -03:00
Alvaro Herrera a24ae3d7b9
Remove StoreSingleInheritance reimplementation
I introduced this duplicate code in commit 8b08f7d482 for no good
reason.  Remove it, and backpatch to 11 where it was introduced.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
2021-03-25 10:47:38 -03:00
Peter Eisentraut f2c7ce64ae Trim some extra whitespace in parser file 2021-03-25 10:17:52 +01:00
Peter Eisentraut 91d1f2d302 Rename a parse node to be more general
A WHERE clause will be used for row filtering in logical replication.
We already have a similar node: 'WHERE (condition here)'.  Let's
rename the node to a generic name and use it for row filtering too.

Author: Euler Taveira <euler.taveira@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X+mK_DitLXF25B=jE6xyNCH4YOwM860JR7HarGQ@mail.gmail.com
2021-03-25 10:06:32 +01:00
Michael Paquier a1999a01bb Sanitize the term "combo CID" in code comments
Combo CIDs were referred in the code comments using different terms
across various places of the code, so unify a bit the term used with
what is currently in use in some of the READMEs.

Author: "Hou, Zhijie"
Discussion: https://postgr.es/m/1d42865c91404f46af4562532fdbea31@G08CNEXMBPEKD05.g08.fujitsu.local
2021-03-25 16:08:03 +09:00
Fujii Masao 438fc4a39c Fix bug in WAL replay of COMMIT_TS_SETTS record.
Previously the WAL replay of COMMIT_TS_SETTS record called
TransactionTreeSetCommitTsData() with the argument write_xlog=true,
which generated and wrote new COMMIT_TS_SETTS record.
This should not be acceptable because it's during recovery.

This commit fixes the WAL replay of COMMIT_TS_SETTS record
so that it calls TransactionTreeSetCommitTsData() with write_xlog=false
and doesn't generate new WAL during recovery.

Back-patch to all supported branches.

Reported-by: lx zou <zoulx1982@163.com>
Author: Fujii Masao
Reviewed-by: Alvaro Herrera
Discussion: https://postgr.es/m/16931-620d0f2fdc6108f1@postgresql.org
2021-03-25 11:23:30 +09:00
Fujii Masao df9384492b Improve connection denied error message during recovery.
Previously when an archive recovery or a standby was starting and
reached the consistent recovery state but hot_standby was configured
to off, the error message when a client connectted was "the database
system is starting up", which was needless confusing and not really
all that accurate either.

This commit improves the connection denied error message during
recovery, as follows, so that the users immediately know that their
servers are configured to deny those connections.

- If hot_standby is disabled, the error message "the database system
  is not accepting connections" and the detail message "Hot standby
  mode is disabled." are output when clients connect while an archive
  recovery or a standby is running.

- If hot_standby is enabled, the error message "the database system
  is not yet accepting connections" and the detail message
  "Consistent recovery state has not been yet reached." are output
  when clients connect until the consistent recovery state is reached
  and postmaster starts accepting read only connections.

This commit doesn't change the connection denied error message of
"the database system is starting up" during normal server startup and
crash recovery. Because it's still suitable for those situations.

Author: James Coleman
Reviewed-by: Alvaro Herrera, Andres Freund, David Zhang, Tom Lane, Fujii Masao
Discussion: https://postgr.es/m/CAAaqYe8h5ES_B=F_zDT+Nj9XU7YEwNhKhHA2RE4CFhAQ93hfig@mail.gmail.com
2021-03-25 10:41:28 +09:00
Peter Eisentraut 37c99d304d Fix stray double semicolons
Reported-by: John Naylor <john.naylor@enterprisedb.com>
2021-03-24 20:42:51 +01:00
Stephen Frost bbcc4eb2e0 Change checkpoint_completion_target default to 0.9
Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long.  This
change updates the default to 0.9 (from 0.5) to match that
recommendation.

There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of time of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.

Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane, David Steele,
Nathan Bossart
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
2021-03-24 13:07:51 -04:00
Robert Haas e5595de03e Tidy up more loose ends related to configurable TOAST compression.
Change the default_toast_compression GUC to be an enum rather than
a string. Earlier, uncommitted versions of the patch supported using
CREATE ACCESS METHOD to add new compression methods to a running
system, but that idea was dropped before commit. So, we can simplify
the GUC handling as well, which has the nice side effect of improving
the error messages.

While updating the documentation to reflect the new GUC type, also
move it back to the right place in the list. I moved this while
revising what became commit 24f0e395ac,
but apparently the intended ordering is "alphabetical" rather than
"whatever Robert thinks looks nice."

Rejigger things to avoid having access/toast_compression.h depend on
utils/guc.h, so that we don't end up with every file that includes
it also depending on something largely unrelated. Move a few
inline functions back into the C source file partly to help reduce
dependencies and partly just to avoid clutter. A few very minor
cosmetic fixes.

Original patch by Justin Pryzby, but very heavily edited by me,
and reverse reviewed by him and also reviewed by by Tom Lane.

Discussion: http://postgr.es/m/CA+TgmoYp=GT_ztUCeZg2i4hkHAQv8o=-nVJ1-TKWTG1zQOmOpg@mail.gmail.com
2021-03-24 12:36:08 -04:00
Peter Eisentraut 49ab61f0bd Add date_bin function
Similar to date_trunc, but allows binning by an arbitrary interval
rather than just full units.

Author: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: David Fetter <david@fetter.org>
Reviewed-by: Isaac Morland <isaac.morland@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Artur Zakirov <zaartur@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CACPNZCt4buQFRgy6DyjuZS-2aPDpccRkrJBmgUfwYc1KiaXYxg@mail.gmail.com
2021-03-24 16:18:24 +01:00
Peter Eisentraut 1509c6fc29 Improve an error message
Make it the same as another nearby message.
2021-03-24 08:02:06 +01:00
Amit Kapila 26acb54a13 Revert "Enable parallel SELECT for "INSERT INTO ... SELECT ..."."
To allow inserts in parallel-mode this feature has to ensure that all the
constraints, triggers, etc. are parallel-safe for the partition hierarchy
which is costly and we need to find a better way to do that. Additionally,
we could have used existing cached information in some cases like indexes,
domains, etc. to determine the parallel-safety.

List of commits reverted, in reverse chronological order:

ed62d3737c Doc: Update description for parallel insert reloption.
c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
e2cda3c20a Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.
e4e87a32cc Fix valgrind issue in commit 05c8482f7f.
05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".

Discussion: https://postgr.es/m/E1lMiB9-0001c3-SY@gemulon.postgresql.org
2021-03-24 11:29:15 +05:30
Fujii Masao 84007043fc Rename wait event WalrcvExit to WalReceiverExit.
Commit de829ddf23 added wait event WalrcvExit. But its name is not
consistent with other wait events like WalReceiverMain or
WalReceiverWaitStart, etc. So this commit renames WalrcvExit to
WalReceiverExit.

Author: Fujii Masao
Reviewed-by: Thomas Munro
Discussion: https://postgr.es/m/cced9995-8fa2-7b22-9d91-3f22a2b8c23c@oss.nttdata.com
2021-03-24 10:37:54 +09:00
Fujii Masao 7fbcee1b2d Log when GetNewOidWithIndex() fails to find unused OID many times.
GetNewOidWithIndex() generates a new OID one by one until it finds
one not in the relation. If there are very long runs of consecutive
existing OIDs, GetNewOidWithIndex() needs to iterate many times
in the loop to find unused OID. Since TOAST table can have a large
number of entries and there can be such long runs of OIDs, there is
the case where it takes so many iterations to find new OID not in
TOAST table. Furthermore if all (i.e., 2^32) OIDs are already used,
GetNewOidWithIndex() enters something like busy loop and repeats
the iterations until at least one OID is marked as unused.

There are some reported troubles caused by a large number of
iterations in GetNewOidWithIndex(). For example, when inserting
a billion of records into the table, all the backends doing that
insertion operation got hang with 100% CPU usage at some point.

Previously there was no easy way to detect that GetNewOidWithIndex()
failed to find unused OID many times. So, for example, gdb full
backtrace of hanged backends needed to be taken, in order to
investigate that trouble. This is inconvenient and may not be
available in some production environments.

To provide easy way for that, this commit makes GetNewOidWithIndex()
log that it iterates more than GETNEWOID_LOG_THRESHOLD but have
not yet found OID unused in the relation. Also this commit makes
it repeat logging with exponentially increasing intervals until
it iterates more than GETNEWOID_LOG_MAX_INTERVAL, and makes it
finally repeat logging every GETNEWOID_LOG_MAX_INTERVAL unless
an unused OID is found. Those macro variables are used not to
fill up the server log with the similar messages.

In the discusion at pgsql-hackers, there was another idea to report
the lots of iterations in GetNewOidWithIndex() via wait event.
But since GetNewOidWithIndex() traverses indexes to find unused
OID and which will do I/O, acquire locks, etc, which will overwrite
the wait event and reset it to nothing once done. So that idea
doesn't work well, and we didn't adopt it.

Author: Tomohiro Hiramitsu
Reviewed-by: Tatsuhito Kasahara, Kyotaro Horiguchi, Tom Lane, Fujii Masao
Discussion: https://postgr.es/m/16722-93043fb459a41073@postgresql.org
2021-03-24 10:36:56 +09:00
Michael Paquier 99dd75fb99 Reword slightly logs generated for index stats in autovacuum
Using "remain" is confusing, as it implies that the index file can
shrink.  Instead, use "in total".

Per discussion with Peter Geoghegan.

Discussion: https://postgr.es/m/CAH2-WzkYgHZzpGOwR14CScJsjaQpvJrEkEfkh_=wGhzLb=yVdQ@mail.gmail.com
2021-03-24 09:36:03 +09:00
Tomas Vondra 79f6a942bd Allow composite types in catalog bootstrap
When resolving types during catalog bootstrap, try to reload the pg_type
contents if a type is not found. That allows catalogs to contain
composite types, e.g. row types for other catalogs.

Author: Justin Pryzby
Reviewed-by: Dean Rasheed, Tomas Vondra
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
2021-03-24 00:47:52 +01:00
Tomas Vondra e1a5e65703 Convert Typ from array to list in bootstrap
It's a bit easier and more convenient to free and reload a List,
compared to a plain array. This will be helpful when allowing catalogs
to contain composite types.

Author: Justin Pryzby
Reviewed-by: Dean Rasheed, Tomas Vondra
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
2021-03-24 00:47:40 +01:00
Peter Geoghegan 5b861baa55 nbtree VACUUM: Cope with buggy opclasses.
Teach nbtree VACUUM to press on with vacuuming in the event of a page
deletion attempt that fails to "re-find" a downlink for its child/target
page.

There is no good reason to treat this as an irrecoverable error.  But
there is a good reason not to: pressing on at this point removes any
question of VACUUM not making progress solely due to misbehavior from
user-defined operator class code.

Discussion: https://postgr.es/m/CAH2-Wzma5G9CTtMjbrXTwOym+U=aWg-R7=-htySuztgoJLvZXg@mail.gmail.com
2021-03-23 16:09:51 -07:00
Tom Lane 9d523119fd Avoid possible crash while finishing up a heap rewrite.
end_heap_rewrite was not careful to ensure that the target relation
is open at the smgr level before performing its final smgrimmedsync.
In ordinary cases this is no problem, because it would have been
opened earlier during the rewrite.  However a crash can be reproduced
by re-clustering an empty table with CLOBBER_CACHE_ALWAYS enabled.

Although that exact scenario does not crash in v13, I think that's
a chance result of unrelated planner changes, and the problem is
likely still reachable with other test cases.  The true proximate
cause of this failure is commit c6b92041d, which replaced a call to
heap_sync (which was careful about opening smgr) with a direct call
to smgrimmedsync.  Hence, back-patch to v13.

Amul Sul, per report from Neha Sharma; cosmetic changes
and test case by me.

Discussion: https://postgr.es/m/CANiYTQsU7yMFpQYnv=BrcRVqK_3U3mtAzAsJCaqtzsDHfsUbdQ@mail.gmail.com
2021-03-23 11:24:16 -04:00
Peter Eisentraut a6715af1e7 Add bit_count SQL function
This function for bit and bytea counts the set bits in the bit or byte
string.  Internally, we use the existing popcount functionality.

For the name, after some discussion, we settled on bit_count, which
also exists with this meaning in MySQL, Java, and Python.

Author: David Fetter <david@fetter.org>
Discussion: https://www.postgresql.org/message-id/flat/20201230105535.GJ13234@fetter.org
2021-03-23 10:13:58 +01:00
Michael Paquier 5aed6a1fc2 Add per-index stats information in verbose logs of autovacuum
Once a relation's autovacuum is completed, the logs include more
information about this relation state if the threshold of
log_autovacuum_min_duration (or its relation option) is reached, with
for example contents about the statistics of the VACUUM operation for
the relation, WAL and system usage.

This commit adds more information about the statistics of the relation's
indexes, with one line of logs generated for each index.  The index
stats were already calculated, but not printed in the context of
autovacuum yet.  While on it, some refactoring is done to keep track of
the index statistics directly within LVRelStats, simplifying some
routines related to parallel VACUUMs.

Author: Masahiko Sawada
Reviewed-by: Michael Paquier, Euler Taveira
Discussion: https://postgr.es/m/CAD21AoAy6SxHiTivh5yAPJSUE4S=QRPpSZUdafOSz0R+fRcM6Q@mail.gmail.com
2021-03-23 13:25:14 +09:00
Amit Kapila 4b82ed6eca Fix dangling pointer reference in stream_cleanup_files.
We can't access the entry after it is removed from dynahash.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Ps-pL++f6CJwPx2+vUqXuew=Xt-9Bi-6kCyxn+Fwi2M7w@mail.gmail.com
2021-03-23 09:43:33 +05:30
Tomas Vondra a5f002ad9a Use correct spelling of statistics kind
A couple error messages and comments used 'statistic kind', not the
correct 'statistics kind'. Fix and backpatch all the way back to 10,
where extended statistics were introduced.

Backpatch-through: 10
2021-03-23 05:01:35 +01:00
Fujii Masao 1e3e8b51bd Change the type of WalReceiverWaitStart wait event from Client to IPC.
Previously the type of this wait event was Client. But while this
wait event is being reported, walreceiver process is waiting for
the startup process to set initial data for streaming replication.
It's not waiting for any activity on a socket connected to a user
application or walsender. So this commit changes the type for
WalReceiverWaitStart wait event to IPC.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/cdacc27c-37ff-f1a4-20e2-ce19933abfcc@oss.nttdata.com
2021-03-23 10:09:42 +09:00
Bruce Momjian 95d77149c5 Add macro RelationIsPermanent() to report relation permanence
Previously, to check relation permanence, the Relation's Form_pg_class
structure member relpersistence was compared to the value
RELPERSISTENCE_PERMANENT ("p"). This commit adds the macro
RelationIsPermanent() and is used in appropirate places to simplify the
code.  This matches other RelationIs* macros.

This macro will be used in more places in future cluster file encryption
patches.

Discussion: https://postgr.es/m/20210318153134.GH20766@tamriel.snowman.net
2021-03-22 20:23:52 -04:00
Tomas Vondra 8e4b332e88 Optimize allocations in bringetbitmap
The bringetbitmap function allocates memory for various purposes, which
may be quite expensive, depending on the number of scan keys. Instead of
allocating them separately, allocate one bit chunk of memory an carve it
into smaller pieces as needed - all the pieces have the same lifespan,
and it saves quite a bit of CPU and memory overhead.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
2021-03-23 00:47:09 +01:00
Tomas Vondra 72ccf55cb9 Move IS [NOT] NULL handling from BRIN support functions
The handling of IS [NOT] NULL clauses is independent of an opclass, and
most of the code was exactly the same in both minmax and inclusion. So
instead move the code from support procedures to the AM.

This simplifies the code - especially the support procedures - quite a
bit, as they don't need to care about NULL values and flags at all. It
also means the IS [NOT] NULL clauses can be evaluated without invoking
the support procedure.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
2021-03-23 00:45:42 +01:00
Tomas Vondra a1c649d889 Pass all scan keys to BRIN consistent function at once
This commit changes how we pass scan keys to BRIN consistent function.
Instead of passing them one by one, we now pass all scan keys for a
given attribute at once. That makes the consistent function a bit more
complex, as it has to loop through the keys, but it does allow more
elaborate opclasses that can use multiple keys to eliminate ranges much
more effectively.

The existing BRIN opclasses (minmax, inclusion) don't really benefit
from this change. The primary purpose is to allow future opclases to
benefit from seeing all keys at once.

This does change the BRIN API, because the signature of the consistent
function changes (a new parameter with number of scan keys). So this
breaks existing opclasses, and will require supporting two variants of
the code for different PostgreSQL versions. We've considered supporting
two variants of the consistent, but we've decided not to do that.
Firstly, there's another patch that moves handling of NULL values from
the opclass, which means the opclasses need to be updated anyway.
Secondly, we're not aware of any out-of-core BRIN opclasses, so it does
not seem worth the extra complexity.

Bump catversion, because of pg_proc changes.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
2021-03-23 00:45:03 +01:00
Tomas Vondra bfa2cee784 Move bsearch_arg to src/port
Until now the bsearch_arg function was used only in extended statistics
code, so it was defined in that code.  But we already have qsort_arg in
src/port, so let's move it next to it.
2021-03-23 00:11:22 +01:00
Tom Lane 063dd37ebc Short-circuit slice requests that are for more than the object's size.
substring(), and perhaps other callers, isn't careful to pass a
slice length that is no more than the datum's true size.  Since
toast_decompress_datum_slice's children will palloc the requested
slice length, this can waste memory.  Also, close study of the liblz4
documentation suggests that it is dependent on the caller to not ask
for more than the correct amount of decompressed data; this squares
with observed misbehavior with liblz4 1.8.3.  Avoid these problems
by switching to the normal full-decompression code path if the
slice request is >= datum's decompressed size.

Tom Lane and Dilip Kumar

Discussion: https://postgr.es/m/507597.1616370729@sss.pgh.pa.us
2021-03-22 14:01:20 -04:00
Tom Lane aeb1631ed2 Mostly-cosmetic adjustments of TOAST-related macros.
The authors of bbe0a81db hadn't quite got the idea that macros named
like SOMETHING_4B_C were only meant for internal endianness-related
details in postgres.h.  Choose more legible names for macros that are
intended to be used elsewhere.  Rearrange postgres.h a bit to clarify
the separation between those internal macros and ones intended for
wider use.

Also, avoid using the term "rawsize" for true decompressed size;
we've used "extsize" for that, because "rawsize" generally denotes
total Datum size including header.  This choice seemed particularly
unfortunate in tests that were comparing one of these meanings to
the other.

This patch includes a couple of not-purely-cosmetic changes: be
sure that the shifts aligning compression methods are unsigned
(not critical today, but will be when compression method 2 exists),
and fix broken definition of VARATT_EXTERNAL_GET_COMPRESSION (now
VARATT_EXTERNAL_GET_COMPRESS_METHOD), whose callers worked only
accidentally.

Discussion: https://postgr.es/m/574197.1616428079@sss.pgh.pa.us
2021-03-22 13:43:10 -04:00
Robert Haas a4d5284a10 Error on invalid TOAST compression in CREATE or ALTER TABLE.
The previous coding treated an invalid compression method name as
equivalent to the default, which is certainly not right.

Justin Pryzby

Discussion: http://postgr.es/m/20210321235544.GD4203@telsasoft.com
2021-03-22 10:57:08 -04:00
Robert Haas 226e2be387 More code cleanup for configurable TOAST compression.
Remove unused macro. Fix confusion about whether a TOAST compression
method is identified by an OID or a char.

Justin Pryzby

Discussion: http://postgr.es/m/20210321235544.GD4203@telsasoft.com
2021-03-22 09:21:37 -04:00
Michael Paquier 909b449e00 Fix concurrency issues with WAL segment recycling on Windows
This commit is mostly a revert of aaa3aed, that switched the routine
doing the internal renaming of recycled WAL segments to use on Windows a
combination of CreateHardLinkA() plus unlink() instead of rename().  As
reported by several users of Postgres 13, this is causing concurrency
issues when manipulating WAL segments, mostly in the shape of the
following error:
LOG:  could not rename file "pg_wal/000000XX000000YY000000ZZ":
Permission denied

This moves back to a logic where a single rename() (well, pgrename() for
Windows) is used.  This issue has proved to be hard to hit when I tested
it, facing it only once with an archive_command that was not able to do
its work, so it is environment-sensitive.  The reporters of this issue
have been able to confirm that the situation improved once we switched
back to a single rename().  In order to check things, I have provided to
the reporters a patched build based on 13.2 with aaa3aed reverted, to
test if the error goes away, and an unpatched build of 13.2 to test if
the error still showed up (just to make sure that I did not mess up my
build process).

Extra thanks to Fujii Masao for pointing out what looked like the
culprit commit, and to all the reporters for taking the time to test
what I have sent them.

Reported-by: Andrus, Guy Burgess, Yaroslav Pashinsky, Thomas Trenz
Reviewed-by: Tom Lane, Andres Freund
Discussion: https://postgr.es/m/3861ff1e-0923-7838-e826-094cc9bef737@hot.ee
Discussion: https://postgr.es/m/16874-c3eecd319e36a2bf@postgresql.org
Discussion: https://postgr.es/m/095ccf8d-7f58-d928-427c-b17ace23cae6@burgess.co.nz
Discussion: https://postgr.es/m/16927-67c570d968c99567%40postgresql.org
Discussion: https://postgr.es/m/YFBcRbnBiPdGZvfW@paquier.xyz
Backpatch-through: 13
2021-03-22 14:02:26 +09:00
Michael Paquier 595b9cba2a Fix timeline assignment in checkpoints with 2PC transactions
Any transactions found as still prepared by a checkpoint have their
state data read from the WAL records generated by PREPARE TRANSACTION
before being moved into their new location within pg_twophase/.  While
reading such records, the WAL reader uses the callback
read_local_xlog_page() to read a page, that is shared across various
parts of the system.  This callback, since 1148e22a, has introduced an
update of ThisTimeLineID when reading a record while in recovery, which
is potentially helpful in the context of cascading WAL senders.

This update of ThisTimeLineID interacts badly with the checkpointer if a
promotion happens while some 2PC data is read from its record, as, by
changing ThisTimeLineID, any follow-up WAL records would be written to
an timeline older than the promoted one.  This results in consistency
issues.  For instance, a subsequent server restart would cause a failure
in finding a valid checkpoint record, resulting in a PANIC, for
instance.

This commit changes the code reading the 2PC data to reset the timeline
once the 2PC record has been read, to prevent messing up with the static
state of the checkpointer.  It would be tempting to do the same thing
directly in read_local_xlog_page().  However, based on the discussion
that has led to 1148e22a, users may rely on the updates of
ThisTimeLineID when a WAL record page is read in recovery, so changing
this callback could break some cases that are working currently.

A TAP test reproducing the issue is added, relying on a PITR to
precisely trigger a promotion with a prepared transaction still
tracked.

Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao
and myself.

Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap
Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com
Backpatch-through: 10
2021-03-22 08:30:53 +09:00
Tom Lane ac897c4834 Fix assorted silliness in ATExecSetCompression().
It's not okay to scribble directly on a syscache entry.
Nor to continue accessing said entry after releasing it.

Also get rid of not-used local variables.

Per valgrind testing.
2021-03-21 18:43:07 -04:00
Peter Geoghegan 9dd963ae25 Recycle nbtree pages deleted during same VACUUM.
Maintain a simple array of metadata about pages that were deleted during
nbtree VACUUM's current btvacuumscan() call.  Use this metadata at the
end of btvacuumscan() to attempt to place newly deleted pages in the FSM
without further delay.  It might not yet be safe to place any of the
pages in the FSM by then (they may not be deemed recyclable), but we
have little to lose and plenty to gain by trying.  In practice there is
a very good chance that this will work out when vacuuming larger
indexes, where scanning the index naturally takes quite a while.

This commit doesn't change the page recycling invariants; it merely
improves the efficiency of page recycling within the confines of the
existing design.  Recycle safety is a part of nbtree's implementation of
what Lanin & Shasha call "the drain technique".  The design happens to
use transaction IDs (they're stored in deleted pages), but that in
itself doesn't align the cutoff for recycle safety to any of the
XID-based cutoffs used by VACUUM (e.g., OldestXmin).  All that matters
is whether or not _other_ backends might be able to observe various
inconsistencies in the tree structure (that they cannot just detect and
recover from by moving right).  Recycle safety is purely a question of
maintaining the consistency (or the apparent consistency) of a physical
data structure.

Note that running a simple serial test case involving a large range
DELETE followed by a VACUUM VERBOSE will probably show that any newly
deleted nbtree pages are not yet reusable/recyclable.  This is expected
in the absence of even one concurrent XID assignment.  It is an old
implementation restriction.  In practice it's unlikely to be the thing
that makes recycling remain unsafe, at least with larger indexes, where
recycling newly deleted pages during the same VACUUM actually matters.

An important high-level goal of this commit (as well as related recent
commits e5d8a999 and 9f3665fb) is to make expensive deferred cleanup
operations in index AMs rare in general.  If index vacuuming frequently
depends on the next VACUUM operation finishing off work that the current
operation started, then the general behavior of index vacuuming is hard
to predict.  This is relevant to ongoing work that adds a vacuumlazy.c
mechanism to skip index vacuuming in certain cases.  Anything that makes
the real world behavior of index vacuuming simpler and more linear will
also make top-down modeling in vacuumlazy.c more robust.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com
2021-03-21 15:25:39 -07:00
Tom Lane 9fb9691a88 Suppress various new compiler warnings.
Compilers that don't understand that elog(ERROR) doesn't return
issued warnings here.  In the cases in libpq_pipeline.c, we were
not exactly helping things by failing to mark pg_fatal() as noreturn.

Per buildfarm.
2021-03-21 11:50:43 -04:00
Peter Eisentraut 96ae658e62 Move lwlock-release probe back where it belongs
The documentation specifically states that lwlock-release fires before
any released waiters have been awakened.  It worked that way until
ab5194e6f6, where is seems to have been
misplaced accidentally.  Move it back where it belongs.

Author: Craig Ringer <craig.ringer@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/CAGRY4nwxKUS_RvXFW-ugrZBYxPFFM5kjwKT5O+0+Stuga5b4+Q@mail.gmail.com
2021-03-21 08:02:30 +01:00
Tomas Vondra 882b2cdc08 Use valid compression method in brin_form_tuple
When compressing the BRIN summary, we can't simply use the compression
method from the indexed attribute.  The summary may use a different data
type, e.g. fixed-length attribute may have varlena summary, leading to
compression failures.  For the built-in BRIN opclasses this happens to
work, because the summary uses the same data type as the attribute.

When the data types match, we can inherit use the compression method
specified for the attribute (it's copied into the index descriptor).
Otherwise we don't have much choice and have to use the default one.

Author: Tomas Vondra
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/e0367f27-392c-321a-7411-a58e1a7e4817%40enterprisedb.com
2021-03-21 00:28:34 +01:00
Tom Lane e835e89a0f Fix memory leak when rejecting bogus DH parameters.
While back-patching e0e569e1d, I noted that there were some other
places where we ought to be applying DH_free(); namely, where we
load some DH parameters from a file and then reject them as not
being sufficiently secure.  While it seems really unlikely that
anybody would hit these code paths in production, let alone do
so repeatedly, let's fix it for consistency.

Back-patch to v10 where this code was introduced.

Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org
2021-03-20 12:47:21 -04:00
Tom Lane f0c2a5bba6 Avoid leaking memory in RestoreGUCState(), and improve comments.
RestoreGUCState applied InitializeOneGUCOption to already-live
GUC entries, causing any malloc'd subsidiary data to be forgotten.
We do want the effect of resetting the GUC to its compiled-in
default, and InitializeOneGUCOption seems like the best way to do
that, so add code to free any existing subsidiary data beforehand.

The interaction between can_skip_gucvar, SerializeGUCState, and
RestoreGUCState is way more subtle than their opaque comments
would suggest to an unwary reader.  Rewrite and enlarge the
comments to try to make it clearer what's happening.

Remove a long-obsolete assertion in read_nondefault_variables: the
behavior of set_config_option hasn't depended on IsInitProcessingMode
since f5d9698a8 installed a better way of controlling it.

Although this is fixing a clear memory leak, the leak is quite unlikely
to involve any large amount of data, and it can only happen once in the
lifetime of a worker process.  So it seems unnecessary to take any
risk of back-patching.

Discussion: https://postgr.es/m/4105247.1616174862@sss.pgh.pa.us
2021-03-19 23:03:17 -04:00
Thomas Munro 61752afb26 Provide recovery_init_sync_method=syncfs.
Since commit 2ce439f3 we have opened every file in the data directory
and called fsync() at the start of crash recovery.  This can be very
slow if there are many files, leading to field complaints of systems
taking minutes or even hours to begin crash recovery.

Provide an alternative method, for Linux only, where we call syncfs() on
every possibly different filesystem under the data directory.  This is
equivalent, but avoids faulting in potentially many inodes from
potentially slow storage.

The new mode comes with some caveats, described in the documentation, so
the default value for the new setting is "fsync", preserving the older
behavior.

Reported-by: Michael Brown <michael.brown@discourse.org>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Paul Guo <guopa@vmware.com>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: David Steele <david@pgmasters.net>
Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org
Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com
2021-03-20 12:07:28 +13:00
Tomas Vondra b822ae13ea Use lfirst_int in cmp_list_len_contents_asc
The function added in be45be9c33 is comparing integer lists (IntList) by
length and contents, but there were two bugs.  Firstly, it used intVal()
to extract the value, but that's for Value nodes, not for extracting int
values from IntList.  Secondly, it called it directly on the ListCell,
without doing lfirst().  So just do lfirst_int() instead.

Interestingly enough, this did not cause any crashes on the buildfarm,
but valgrind rightfully complained about it.

Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
2021-03-20 00:04:25 +01:00
Robert Haas d00fbdc431 Fix use-after-ReleaseSysCache problem in ATExecAlterColumnType.
Introduced by commit bbe0a81db6.

Per buildfarm member prion.
2021-03-19 17:17:48 -04:00
Robert Haas bbe0a81db6 Allow configurable LZ4 TOAST compression.
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option in up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.

In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.

Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compessed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compresison types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.

It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT.  It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data.  These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project.  However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.

Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those were patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allow a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored.  More recently, Justin Przyby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.

Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
2021-03-19 15:10:38 -04:00
Fujii Masao fd31214075 Fix comments in postmaster.c.
Commit 86c23a6eb2 changed the option to specify that postgres will
stop all other server processes by sending the signal SIGSTOP,
from -s to -T. But previously there were comments incorrectly
explaining that SIGSTOP behavior is set by -s option. This commit
fixes them.

Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/20210316.165141.1400441966284654043.horikyota.ntt@gmail.com
2021-03-19 11:28:54 +09:00
Tom Lane 9bacdf9f53 Don't leak malloc'd error string in libpqrcv_check_conninfo().
We leaked the error report from PQconninfoParse, when there was
one.  It seems unlikely that real usage patterns would repeat
the failure often enough to create serious bloat, but let's
back-patch anyway to keep the code similar in all branches.

Found via valgrind testing.
Back-patch to v10 where this code was added.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 377b7a8300 Don't leak malloc'd strings when a GUC setting is rejected.
Because guc.c prefers to keep all its string values in malloc'd
not palloc'd storage, it has to be more careful than usual to
avoid leaks.  Error exits out of string GUC hook checks failed
to clear the proposed value string, and error exits out of
ProcessGUCArray() failed to clear the malloc'd results of
ParseLongOption().

Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane d303849b05 Don't leak compiled regex(es) when an ispell cache entry is dropped.
The text search cache mechanisms assume that we can clean up
an invalidated dictionary cache entry simply by resetting the
associated long-lived memory context.  However, that does not work
for ispell affixes that make use of regular expressions, because
the regex library deals in plain old malloc.  Hence, we leaked
compiled regex(es) any time we dropped such a cache entry.  That
could quickly add up, since even a fairly trivial regex can use up
tens of kB, and a large one can eat megabytes.  Add a memory context
callback to ensure that a regex gets freed when its owning cache
entry is cleared.

Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 415ffdc220 Don't run RelationInitTableAccessMethod in a long-lived context.
Some code paths in this function perform syscache lookups, which
can lead to table accesses and possibly leakage of cruft into
the caller's context.  If said context is CacheMemoryContext,
we eventually will have visible bloat.  But fixing this is no
harder than moving one memory context switch step.  (The other
callers don't have a problem.)

Andres Freund and I independently found this via valgrind testing.
Back-patch to v12 where this code was added.

Discussion: https://postgr.es/m/20210317023101.anvejcfotwka6gaa@alap3.anarazel.de
Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 28644fac10 Don't leak rd_statlist when a relcache entry is dropped.
Although these lists are usually NIL, and even when not empty
are unlikely to be large, constant relcache update traffic could
eventually result in visible bloat of CacheMemoryContext.

Found via valgrind testing.
Back-patch to v10 where this field was added.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 1d581ce712 Fix misuse of foreach_delete_current().
Our coding convention requires this macro's result to be assigned
back to the original List variable.  In this usage, since the
List could not become empty, there was no actual bug --- but
some compilers warned about it.  Oversight in be45be9c3.

Discussion: https://postgr.es/m/35077b31-2d62-1e31-0e2e-ddb52d590b73@enterprisedb.com
2021-03-18 19:24:22 -04:00
Tomas Vondra be45be9c33 Implement GROUP BY DISTINCT
With grouping sets, it's possible that some of the grouping sets are
duplicate.  This is especially common with CUBE and ROLLUP clauses. For
example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to

  GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b, c),
    (a, b, c),
    (a, b),
    (a, b),
    (a, b),
    (a),
    (a),
    (a),
    (c, a),
    (c, a),
    (c, a),
    (c),
    (b, c),
    (b),
    ()
  )

Some of the grouping sets are calculated multiple times, which is mostly
unnecessary.  This commit implements a new GROUP BY DISTINCT feature, as
defined in the SQL standard, which eliminates the duplicate sets.

Author: Vik Fearing
Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra
Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
2021-03-18 18:22:18 +01:00
Tomas Vondra cd91de0d17 Remove temporary files after backend crash
After a crash of a backend using temporary files, the files used to be
left behind, on the basis that it might be useful for debugging. But we
don't have any reports of anyone actually doing that, and it means the
disk usage may grow over time due to repeated backend failures (possibly
even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing
it required either manual cleanup (deleting files, which is error-prone)
or restart of the instance (i.e. service disruption).

This implements automatic cleanup of temporary files, controled by a new
GUC remove_temp_files_after_crash. By default the files are removed, but
it can be disabled to restore the old behavior if needed.

Author: Euler Taveira
Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro
Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com
2021-03-18 17:38:28 +01:00
Magnus Hagander da18d829c2 Fix function name in error hint
pg_read_file() is the function that's in core, pg_file_read() is in
adminpack. But when using pg_file_read() in adminpack it calls the *C*
level function pg_read_file() in core, which probably threw the original
author off. But the error hint should be about the SQL function.

Reported-By: Sergei Kornilov
Backpatch-through: 11
Discussion: https://postgr.es/m/373021616060475@mail.yandex.ru
2021-03-18 11:22:20 +01:00
Amit Kapila c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
Commit 05c8482f7f added the implementation of parallel SELECT for
"INSERT INTO ... SELECT ..." which may incur non-negligible overhead in
the additional parallel-safety checks that it performs, even when, in the
end, those checks determine that parallelism can't be used. This is
normally only ever a problem in the case of when the target table has a
large number of partitions.

A new GUC option "enable_parallel_insert" is added, to allow insert in
parallel-mode. The default is on.

In addition to the GUC option, the user may want a mechanism to allow
inserts in parallel-mode with finer granularity at table level. The new
table option "parallel_insert_enabled" allows this. The default is true.

Author: "Hou, Zhijie"
Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
2021-03-18 07:25:27 +05:30
Andres Freund 5f79580ad6 Fix memory lifetime issues of replication slot stats.
When accessing replication slot stats, introduced in 9868167500,
pgstat_read_statsfiles() reads the data into newly allocated
memory. Unfortunately the current memory context at that point is the
callers, leading to leaks and use-after-free dangers.

The fix is trivial, explicitly use pgStatLocalContext. There's some
potential for further improvements, but that's outside of the scope of
this bugfix.

No backpatch necessary, feature is only in HEAD.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i@alap3.anarazel.de
2021-03-17 16:21:46 -07:00
Tom Lane 8620a7f6db Code review for server's handling of "tablespace map" files.
While looking at Robert Foggia's report, I noticed a passel of
other issues in the same area:

* The scheme for backslash-quoting newlines in pathnames is just
wrong; it will misbehave if the last ordinary character in a pathname
is a backslash.  I'm not sure why we're bothering to allow newlines
in tablespace paths, but if we're going to do it we should do it
without introducing other problems.  Hence, backslashes themselves
have to be backslashed too.

* The author hadn't read the sscanf man page very carefully, because
this code would drop any leading whitespace from the path.  (I doubt
that a tablespace path with leading whitespace could happen in
practice; but if we're bothering to allow newlines in the path, it
sure seems like leading whitespace is little less implausible.)  Using
sscanf for the task of finding the first space is overkill anyway.

* While I'm not 100% sure what the rationale for escaping both \r and
\n is, if the idea is to allow Windows newlines in the file then this
code failed, because it'd throw an error if it saw \r followed by \n.

* There's no cross-check for an incomplete final line in the map file,
which would be a likely apparent symptom of the improper-escaping
bug.

On the generation end, aside from the escaping issue we have:

* If needtblspcmapfile is true then do_pg_start_backup will pass back
escaped strings in tablespaceinfo->path values, which no caller wants
or is prepared to deal with.  I'm not sure if there's a live bug from
that, but it looks like there might be (given the dubious assumption
that anyone actually has newlines in their tablespace paths).

* It's not being very paranoid about the possibility of random stuff
in the pg_tblspc directory.  IMO we should ignore anything without an
OID-like name.

The escaping rule change doesn't seem back-patchable: it'll require
doubling of backslashes in the tablespace_map file, which is basically
a basebackup format change.  The odds of that causing trouble are
considerably more than the odds of the existing bug causing trouble.
The rest of this seems somewhat unlikely to cause problems too,
so no back-patch.
2021-03-17 16:18:46 -04:00
Tom Lane a50e4fd028 Prevent buffer overrun in read_tablespace_map().
Robert Foggia of Trustwave reported that read_tablespace_map()
fails to prevent an overrun of its on-stack input buffer.
Since the tablespace map file is presumed trustworthy, this does
not seem like an interesting security vulnerability, but still
we should fix it just in the name of robustness.

While here, document that pg_basebackup's --tablespace-mapping option
doesn't work with tar-format output, because it doesn't.  To make it
work, we'd have to modify the tablespace_map file within the tarball
sent by the server, which might be possible but I'm not volunteering.
(Less-painful solutions would require changing the basebackup protocol
so that the source server could adjust the map.  That's not very
appetizing either.)
2021-03-17 16:10:37 -04:00
Thomas Munro 7f7f25f15e Revert "Fix race in Parallel Hash Join batch cleanup."
This reverts commit 378802e371.
This reverts commit 3b8981b6e1.

Discussion: https://postgr.es/m/CA%2BhUKGJmcqAE3MZeDCLLXa62cWM0AJbKmp2JrJYaJ86bz36LFA%40mail.gmail.com
2021-03-18 01:10:55 +13:00
Michael Paquier 9fd2952cf4 Fix comment in indexing.c
578b229, that removed support for WITH OIDS, has changed
CatalogTupleInsert() to not return an Oid, but one comment was still
mentioning that.

Author: Vik Fearing
Discussion: https://postgr.es/m/fef01975-ed10-3601-7b9e-80ecef72d00b@postgresfriends.org
2021-03-17 18:07:00 +09:00
Peter Eisentraut e1ae40f381 Small error message improvement 2021-03-17 08:17:33 +01:00
Thomas Munro 378802e371 Update the names of Parallel Hash Join phases.
Commit 3048898e dropped -ING from some wait event names that correspond
to barrier phases.  Update the phases' names to match.

While we're here making cosmetic changes, also rename "DONE" to "FREE".
That pairs better with "ALLOCATE", and describes the activity that
actually happens in that phase (as we do for the other phases) rather
than describing a state.  The distinction is clearer after bugfix commit
3b8981b6 split the phase into two.  As for the growth barriers, rename
their "ALLOCATE" phase to "REALLOCATE", which is probably a better
description of what happens then.  Also improve the comments about
the phases a bit.

Discussion: https://postgr.es/m/CA%2BhUKG%2BMDpwF2Eo2LAvzd%3DpOh81wUTsrwU1uAwR-v6OGBB6%2B7g%40mail.gmail.com
2021-03-17 18:43:04 +13:00
Thomas Munro 3b8981b6e1 Fix race in Parallel Hash Join batch cleanup.
With very unlucky timing and parallel_leader_participation off, PHJ
could attempt to access per-batch state just as it was being freed.
There was code intended to prevent that by checking for a cleared
pointer, but it was buggy.

Fix, by introducing an extra barrier phase.  The new phase
PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to
find a batch to help with, and PHJ_BUILD_DONE means that it is too late.
The last to detach will free the array of per-batch state as before, but
now it will also atomically advance the phase at the same time, so that
late attachers can avoid the hazard, without the data race.  This
mirrors the way per-batch hash tables are freed (see phases
PHJ_BATCH_PROBING and PHJ_BATCH_DONE).

Revealed by a one-off build farm failure, where BarrierAttach() failed a
sanity check assertion, because the memory had been clobbered by
dsa_free().

Back-patch to 11, where the code arrived.

Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz
2021-03-17 18:05:39 +13:00
Amit Kapila 6b67d72b60 Fix race condition in drop subscription's handling of tablesync slots.
Commit ce0fdbfe97 made tablesync slots permanent and allow Drop
Subscription to drop such slots. However, it is possible that before
tablesync worker could get the acknowledgment of slot creation, drop
subscription stops it and that can lead to a dangling slot on the
publisher. Prevent cancel/die interrupts while creating a slot in the
tablesync worker.

Reported-by: Thomas Munro as per buildfarm
Author: Amit Kapila
Reviewed-by: Vignesh C, Takamichi Osumi
Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com
2021-03-17 08:15:12 +05:30
Thomas Munro 9e7ccd9ef6 Enable parallelism in REFRESH MATERIALIZED VIEW.
Pass CURSOR_OPT_PARALLEL_OK to pg_plan_query() so that parallel plans
are considered when running the underlying SELECT query.  This wasn't
done in commit e9baa5e9, which did this for CREATE MATERIALIZED VIEW,
because it wasn't yet known to be safe.

Since REFRESH always inserts into a freshly created table before later
merging or swapping the data into place with separate operations, we can
enable such plans here too.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
Reviewed-by: Luc Vlaming <luc@swarm64.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CALj2ACXg-4hNKJC6nFnepRHYT4t5jJVstYvri%2BtKQHy7ydcr8A%40mail.gmail.com
2021-03-17 15:04:17 +13:00
Peter Geoghegan fbe4cb3bd4 Fix comment about promising tuples.
Oversight in commit d168b66682, which added bottom-up index deletion.
2021-03-16 13:38:52 -07:00
Tom Lane 4b12ab18c9 Avoid corner-case memory leak in SSL parameter processing.
After reading the root cert list from the ssl_ca_file, immediately
install it as client CA list of the new SSL context.  That gives the
SSL context ownership of the list, so that SSL_CTX_free will free it.
This avoids a permanent memory leak if we fail further down in
be_tls_init(), which could happen if bogus CRL data is offered.

The leak could only amount to something if the CRL parameters get
broken after server start (else we'd just quit) and then the server
is SIGHUP'd many times without fixing the CRL data.  That's rather
unlikely perhaps, but it seems worth fixing, if only because the
code is clearer this way.

While we're here, add some comments about the memory management
aspects of this logic.

Noted by Jelte Fennema and independently by Andres Freund.
Back-patch to v10; before commit de41869b6 it doesn't matter,
since we'd not re-execute this code during SIGHUP.

Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org
2021-03-16 16:03:06 -04:00
Stephen Frost c6fc50cb40 Use pre-fetching for ANALYZE
When we have posix_fadvise() available, we can improve the performance
of an ANALYZE by quite a bit by using it to inform the kernel of the
blocks that we're going to be asking for.  Similar to bitmap index
scans, the number of buffers pre-fetched is based off of the
maintenance_io_concurrency setting (for the particular tablespace or,
if not set, globally, via get_tablespace_maintenance_io_concurrency()).

Reviewed-By: Heikki Linnakangas, Tomas Vondra
Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
2021-03-16 14:46:48 -04:00
Stephen Frost 94d13d474d Improve logging of auto-vacuum and auto-analyze
When logging auto-vacuum and auto-analyze activity, include the I/O
timing if track_io_timing is enabled.  Also, for auto-analyze, add the
read rate and the dirty rate, similar to how that information has
historically been logged for auto-vacuum.

Stephen Frost and Jakub Wartak

Reviewed-By: Heikki Linnakangas, Tomas Vondra
Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
2021-03-16 14:46:48 -04:00
Tom Lane 1ea396362b Improve logging of bad parameter values in BIND messages.
Since commit ba79cb5dc, values of bind parameters have been logged
during errors in extended query mode.  However, we only did that after
we'd collected and converted all the parameter values, thus failing to
offer any useful localization of invalid-parameter problems.  Add a
separate callback that's used during parameter collection, and have it
print the parameter number, along with the input string if text input
format is used.

Justin Pryzby and Tom Lane

Discussion: https://postgr.es/m/20210104170939.GH9712@telsasoft.com
Discussion: https://postgr.es/m/CANfkH5k-6nNt-4cSv1vPB80nq2BZCzhFVR5O4VznYbsX0wZmow@mail.gmail.com
2021-03-16 11:16:41 -04:00
Alvaro Herrera acb7e4eb6b
Implement pipeline mode in libpq
Pipeline mode in libpq lets an application avoid the Sync messages in
the FE/BE protocol that are implicit in the old libpq API after each
query.  The application can then insert Sync at its leisure with a new
libpq function PQpipelineSync.  This can lead to substantial reductions
in query latency.

Co-authored-by: Craig Ringer <craig.ringer@enterprisedb.com>
Co-authored-by: Matthieu Garrigues <matthieu.garrigues@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aya Iwata <iwata.aya@jp.fujitsu.com>
Reviewed-by: Daniel Vérité <daniel@manitou-mail.org>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Kirk Jamison <k.jamison@fujitsu.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Nikhil Sontakke <nikhils@2ndquadrant.com>
Reviewed-by: Vaishnavi Prabakaran <VaishnaviP@fast.au.fujitsu.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>

Discussion: https://postgr.es/m/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com
Discussion: https://postgr.es/m/CAJkzx4T5E-2cQe3dtv2R78dYFvz+in8PY7A8MArvLhs_pg75gg@mail.gmail.com
2021-03-15 18:13:42 -03:00
Fujii Masao d75288fb27 Make archiver process an auxiliary process.
This commit changes WAL archiver process so that it's treated as
an auxiliary process and can use shared memory. This is an infrastructure
patch required for upcoming shared-memory based stats collector patch
series. These patch series basically need any processes including archiver
that can report the statistics to access to shared memory. Since this patch
itself is useful to simplify the code and when users monitor the status of
archiver, it's committed separately in advance.

This commit simplifies the code for WAL archiving. For example, previously
backends need to signal to archiver via postmaster when they notify
archiver that there are some WAL files to archive. On the other hand,
this commit removes that signal to postmaster and enables backends to
notify archier directly using shared latch.

Also, as the side of this change, the information about archiver process
becomes viewable at pg_stat_activity view.

Author: Kyotaro Horiguchi
Reviewed-by: Andres Freund, Álvaro Herrera, Julien Rouhaud, Tomas Vondra, Arthur Zakirov, Fujii Masao
Discussion: https://postgr.es/m/20180629.173418.190173462.horiguchi.kyotaro@lab.ntt.co.jp
2021-03-15 13:13:14 +09:00
Peter Geoghegan 0ea71c93a0 Notice that heap page has dead items during VACUUM.
Consistently set a flag variable that tracks whether the current heap
page has a dead item during lazy vacuum's heap scan.  We missed the
common case where there is an preexisting (or even a new) LP_DEAD heap
line pointer.

Also make it clear that the variable might be affected by an existing
line pointer, say from an earlier opportunistic pruning operation.  This
distinction is important because it's the main reason why we can't just
use the nearby tups_vacuumed variable instead.

No backpatch.  In theory failing to set the page level flag variable had
no consequences.  Currently it is only used to defensively check if a
page marked all visible has dead items, which should never happen anyway
(if it does then the table must be corrupt).

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Diagnosed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoAtZb4+HJT_8RoOXvu4HM-Zd4HKS3YSMCH6+-W=bDyh-w@mail.gmail.com
2021-03-14 18:05:57 -07:00
Amit Kapila c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
Commit 05c8482f7f added special logic related to parallel-safety of FK
triggers. This is a bit of a hack and should have instead been done by
simply setting appropriate proparallel values on those trigger functions
themselves.

Suggested-by: Tom Lane
Author: Greg Nancarrow
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us
2021-03-13 09:20:52 +05:30
Peter Geoghegan 02b5940dbe Consolidate nbtree VACUUM metapage routines.
Simplify _bt_vacuum_needs_cleanup() functions's signature (it only needs
a single 'rel' argument now), and move it next to its sibling function
in nbtpage.c.

I believe that _bt_vacuum_needs_cleanup() was originally located in
nbtree.c due to an include dependency issue.  That's no longer an issue.

Follow-up to commit 9f3665fb.
2021-03-12 13:11:47 -08:00
Tom Lane f52c5d6749 Forbid marking an identity column as nullable.
GENERATED ALWAYS AS IDENTITY implies NOT NULL, but the code failed
to complain if you overrode that with "GENERATED ALWAYS AS IDENTITY
NULL".  One might think the old behavior was a feature, but it was
inconsistent because the outcome varied depending on the order of
the clauses, so it seems to have been just an oversight.

Per bug #16913 from Pavel Boev.  Back-patch to v10 where identity
columns were introduced.

Vik Fearing (minor tweaks by me)

Discussion: https://postgr.es/m/16913-3b5198410f67d8c6@postgresql.org
2021-03-12 11:08:42 -05:00
Thomas Munro 1b88b8908e Specialize checkpointer sort functions.
When sorting a potentially large number of dirty buffers, the
checkpointer can benefit from a faster sort routine.  One reported
improvement on a large buffer pool system was 1.4s -> 0.6s.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com
2021-03-12 23:56:02 +13:00
Amit Kapila 519e4c9ee2 Fix size overflow in calculation introduced by commits d6ad34f3 and bea449c6.
Reported-by: Thomas Munro
Author: Takayuki Tsunakawa
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CA+hUKG+oPoFizjABt=GXZWTEHx3oev5rAe2scjW2r6F1rguo5w@mail.gmail.com
2021-03-12 15:42:08 +05:30
Amit Kapila e2cda3c20a Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.
The commit added code which used a relcache TriggerDesc field across
another cache access, which it shouldn't because the relcache doesn't
guarantee it won't get moved.

Diagnosed-by: Tom Lane
Author: Greg Nancarrow
Reviewed-by: Hou Zhijie, Amit Kapila
Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us
2021-03-12 15:14:41 +05:30
Thomas Munro 57dcc2ef33 Poll postmaster less frequently in recovery.
Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe at all during crash recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record.  Do it
less frequently on operating systems where system calls are required, at
the cost of delaying exit a bit after postmaster death.  This avoids
expensive system calls reported to slow down CPU-bound recovery by as
much as 10-30%.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
2021-03-12 19:45:42 +13:00
Thomas Munro de829ddf23 Add condition variable for walreceiver shutdown.
Use this new CV to wait for walreceiver shutdown without a sleep/poll
loop, while also benefiting from standard postmaster death handling.

Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
2021-03-12 19:45:42 +13:00
Thomas Munro 600f2f50b7 Add condition variable for recovery resume.
Replace a sleep loop with a CV, to get a fast reaction time when
recovery is resumed or the postmaster exits via standard infrastructure.
Unfortunately we still need to wake up every second to perform extra
polling during the recovery pause loop.

Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
2021-03-12 19:45:42 +13:00
Fujii Masao b82640df00 Send statistics collected during shutdown checkpoint to the stats collector.
When shutdown is requested, checkpointer performs checkpoint or
restartpoint, and updates the statistics, before it exits. But previously
checkpointer didn't send those statistics to the stats collector.

Shutdown checkpoint and restartpoint are treated as requested ones
instead of scheduled ones, so the number of them are counted in
pg_stat_bgwriter.checkpoints_req column.

Author: Masahiro Ikeda
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-12 14:23:00 +09:00
Fujii Masao 33394ee6f2 Force to send remaining WAL stats to the stats collector at walwriter exit.
In walwriter's main loop, WAL stats message is only sent if enough time
has passed since last one was sent to reach PGSTAT_STAT_INTERVAL msecs.
This is necessary to avoid overloading to the stats collector. But this
can cause recent WAL stats to be unsent when walwriter exits.

To ensure that all the WAL stats are sent, this commit makes walwriter
force to send remaining WAL stats to the collector when it exits because
of shutdown request. Note that those remaining WAL stats can still be
unsent when walwriter exits with non-zero exit code (e.g., FATAL error).
This is OK because that walwriter exit leads to server crash and
subsequent recovery discards all the stats. So there is no need to send
remaining stats in that case.

Author: Masahiro Ikeda
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-12 13:29:59 +09:00
Thomas Munro 43c6662496 Minor modernization for README.barrier.
Itanium is very uncommon and being discontinued.  ARM is everywhere.
Prefer ARM as an example of an architecture with weak memory ordering.
2021-03-12 15:36:16 +13:00
Peter Geoghegan 7bb97211a5 Save a few cycles during nbtree VACUUM.
Avoid calling RelationGetNumberOfBlocks() unnecessarily in the common
case where there are no deleted but not yet recycled pages to recycle
during a cleanup-only nbtree VACUUM operation.

Follow-up to commit e5d8a999, which (among other things) taught the
"skip full scan" nbtree VACUUM mechanism to only trigger a full index
scan when the absolute number of deleted pages in the index is
considered excessive.
2021-03-11 14:18:23 -08:00
Peter Geoghegan effdd3f3b6 Add back vacuum_cleanup_index_scale_factor parameter.
Commit 9f3665fb removed the vacuum_cleanup_index_scale_factor storage
parameter.  However, that creates dump/reload hazards when moving across
major versions.

Add back the vacuum_cleanup_index_scale_factor parameter (though not the
GUC of the same name) purely to avoid problems when using tools like
pg_upgrade.  The parameter remains disabled and undocumented.

No backpatch to Postgres 13, since vacuum_cleanup_index_scale_factor was
only disabled by REL_13_STABLE's version of master branch commit
9f3665fb in the first place -- the parameter already looks like this on
REL_13_STABLE.

Discussion: https://postgr.es/m/YEm/a3Ko3nKnBuVq@paquier.xyz
2021-03-11 12:42:46 -08:00
Robert Haas 32fd2b57d7 Be clear about whether a recovery pause has taken effect.
Previously, the code and documentation seem to have essentially
assumed than a call to pg_wal_replay_pause() would take place
immediately, but that's not the case, because we only check for a
pause in certain places. This means that a tool that uses this
function and then wants to do something else afterward that is
dependent on the pause having taken effect doesn't know how long it
needs to wait to be sure that no more WAL is going to be replayed.

To avoid that, add a new function pg_get_wal_replay_pause_state()
which returns either 'not paused', 'paused requested', or 'paused'.
After calling pg_wal_replay_pause() the status will immediate change
from 'not paused' to 'pause requested'; when the startup process
has noticed this, the status will change to 'pause'.  For backward
compatibility, pg_is_wal_replay_paused() still exists and returns
the same thing as before: true if a pause has been requested,
whether or not it has taken effect yet; and false if not.
The documentation is updated to clarify.

To improve the changes that a pause request is quickly confirmed
effective, adjust things so that WaitForWALToBecomeAvailable will
swiftly reach a call to recoveryPausesHere() when a pause request
is made.

Dilip Kumar, reviewed by Simon Riggs, Kyotaro Horiguchi, Yugo Nagata,
Masahiko Sawada, and Bharath Rupireddy.

Discussion: http://postgr.es/m/CAFiTN-vcLLWEm8Zr%3DYK83rgYrT9pbC8VJCfa1kY9vL3AUPfu6g%40mail.gmail.com
2021-03-11 15:07:03 -05:00
Peter Geoghegan 5f8727f5a6 VACUUM ANALYZE: Always update pg_class.reltuples.
vacuumlazy.c sometimes fails to update pg_class entries for each index
(to ensure that pg_class.reltuples is current), even though analyze.c
assumed that that must have happened during VACUUM ANALYZE.  There are
at least a couple of reasons for this.  For example, vacuumlazy.c could
fail to update pg_class when the index AM indicated that its statistics
are merely an estimate, per the contract for amvacuumcleanup() routines
established by commit e57345975c back in 2006.

Stop assuming that pg_class must have been updated with accurate
statistics within VACUUM ANALYZE -- update pg_class for indexes at the
same time as the table relation in all cases.  That way VACUUM ANALYZE
will never fail to keep pg_class.reltuples reasonably accurate.

The only downside of this approach (compared to the old approach) is
that it might inaccurately set pg_class.reltuples for indexes whose heap
relation ends up with the same inaccurate value anyway.  This doesn't
seem too bad.  We already consistently called vac_update_relstats() (to
update pg_class) for the heap/table relation twice during any VACUUM
ANALYZE -- once in vacuumlazy.c, and once in analyze.c.  We now make
sure that we call vac_update_relstats() at least once (though often
twice) for each index.

This is follow up work to commit 9f3665fb, which dealt with issues in
btvacuumcleanup().  Technically this fixes an unrelated issue, though.
btvacuumcleanup() no longer provides an accurate num_index_tuples value
following commit 9f3665fb (when there was no btbulkdelete() call during
the VACUUM operation in question), but hashvacuumcleanup() has worked in
the same way for many years now.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzknxdComjhqo4SUxVFk_Q1171GJO2ZgHZ1Y6pion6u8rA@mail.gmail.com
Backpatch: 13-, just like commit 9f3665fb.
2021-03-10 17:07:57 -08:00
Peter Geoghegan 9f3665fbfc Don't consider newly inserted tuples in nbtree VACUUM.
Remove the entire idea of "stale stats" within nbtree VACUUM (stop
caring about stats involving the number of inserted tuples).  Also
remove the vacuum_cleanup_index_scale_factor GUC/param on the master
branch (though just disable them on postgres 13).

The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM
partially responsible for deciding when pg_class.reltuples stats needed
to be updated.  This seems contrary to the spirit of the index AM API,
though -- it is not actually necessary for an index AM's bulk delete and
cleanup callbacks to provide accurate stats when it happens to be
inconvenient.  The core code owns that.  (Index AMs have the authority
to perform or not perform certain kinds of deferred cleanup based on
their own considerations, such as page deletion and recycling, but that
has little to do with pg_class.reltuples/num_index_tuples.)

This issue was fairly harmless until the introduction of the
autovacuum_vacuum_insert_threshold feature by commit b07642db, which had
an undesirable interaction with the vacuum_cleanup_index_scale_factor
mechanism: it made insert-driven autovacuums perform full index scans,
even though there is no real benefit to doing so.  This has been tied to
a regression with an append-only insert benchmark [1].

Also have remaining cases that perform a full scan of an index during a
cleanup-only nbtree VACUUM indicate that the final tuple count is only
an estimate.  This prevents vacuumlazy.c from setting the index's
pg_class.reltuples in those cases (it will now only update pg_class when
vacuumlazy.c had TIDs for nbtree to bulk delete).  This arguably fixes
an oversight in deduplication-related bugfix commit 48e12913.

[1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com
Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.
2021-03-10 16:27:01 -08:00
Bruce Momjian 845ac7f847 C comments: improve description of GiST NSN and GistBuildLSN
GiST indexes are complex, so adding more details in the code might help
someone.

Discussion: https://postgr.es/m/20210302164021.GA364@momjian.us
2021-03-10 17:03:10 -05:00
Thomas Munro d87251048a Replace buffer I/O locks with condition variables.
1.  Backends waiting for buffer I/O are now interruptible.

2.  If something goes wrong in a backend that is currently performing
I/O, waiting backends no longer wake up until that backend reaches
AbortBufferIO() and broadcasts on the CV.  Previously, any waiters would
wake up (because the I/O lock was automatically released) and then
busy-loop until AbortBufferIO() cleared BM_IO_IN_PROGRESS.

3.  LWLockMinimallyPadded is removed, as it would now be unused.

Author: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier version, 2016)
Discussion: https://postgr.es/m/CA%2BhUKGJ8nBFrjLuCTuqKN0pd2PQOwj9b_jnsiGFFMDvUxahj_A%40mail.gmail.com
Discussion: https://postgr.es/m/CA+Tgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr=C56Xng@mail.gmail.com
2021-03-11 10:36:17 +13:00
Tom Lane c3ffe34863 Avoid creating duplicate cached plans for inherited FK constraints.
When a foreign key constraint is applied to a partitioned table, each
leaf partition inherits a similar FK constraint.  We were processing all
of those constraints independently, meaning that in large partitioning
trees we'd build up large collections of cached FK-checking query plans.
However, in all cases but one, the generated queries are actually
identical for all members of the inheritance tree (because, in most
cases, the query only mentions the topmost table of the other side of
the FK relationship).  So we can share a single cached plan among all
the partitions, saving memory, not to mention time to build and maintain
the cached plans.

Keisuke Kuroda and Amit Langote

Discussion: https://postgr.es/m/cab4b85d-9292-967d-adf2-be0d803c3e23@nttcom.co.jp_1
2021-03-10 14:22:31 -05:00
Peter Eisentraut bbaf315309 Add bound check before bsearch() for performance
In the current lazy vacuum implementation, some index AMs such as
btree indexes call lazy_tid_reaped() for each index tuple during
ambulkdelete to check if the index tuple points to the (collected)
garbage tuple.  In that function, we simply call bsearch(), but we
should be able to know the result without bsearch() if the index tuple
points to the heap tuple that is out of range of the collected garbage
tuples.  Therefore, add a simple bound check before resorting to
bsearch().  Testing has shown that this can give significant
performance benefits.

Author: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+fd4k76j8jKzJzcx8UqEugvayaMSnQz0iLUt_XgBp-_-bd22A@mail.gmail.com
2021-03-10 15:19:37 +01:00
Peter Eisentraut 1657b37d7c Small debug message tweak
This makes the wording of the delete case match the update case.
2021-03-10 08:16:38 +01:00
Amit Kapila e4e87a32cc Fix valgrind issue in commit 05c8482f7f.
Initialize other newly added variables in max_parallel_hazard_context via
is_parallel_safe() because we don't check the parallel-safety of target
relations in that function.

Reported-by: Tom Lane as per buildfarm
Author: Amit Kapila
Discussion: https://postgr.es/m/2060179.1615347455@sss.pgh.pa.us
2021-03-10 10:06:39 +05:30
Amit Kapila 05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".
Parallel SELECT can't be utilized for INSERT in the following cases:
- INSERT statement uses the ON CONFLICT DO UPDATE clause
- Target table has a parallel-unsafe: trigger, index expression or
  predicate, column default expression or check constraint
- Target table has a parallel-unsafe domain constraint on any column
- Target table is a partitioned table with a parallel-unsafe partition key
  expression or support function

The planner is updated to perform additional parallel-safety checks for
the cases listed above, for determining whether it is safe to run INSERT
in parallel-mode with an underlying parallel SELECT. The planner will
consider using parallel SELECT for "INSERT INTO ... SELECT ...", provided
nothing unsafe is found from the additional parallel-safety checks, or
from the existing parallel-safety checks for SELECT.

While checking parallel-safety, we need to check it for all the partitions
on the table which can be costly especially when we decide not to use a
parallel plan. So, in a separate patch, we will introduce a GUC and or a
reloption to enable/disable parallelism for Insert statements.

Prior to entering parallel-mode for the execution of INSERT with parallel
SELECT, a TransactionId is acquired and assigned to the current
transaction state. This is necessary to prevent the INSERT from attempting
to assign the TransactionId whilst in parallel-mode, which is not allowed.
This approach has a disadvantage in that if the underlying SELECT does not
return any rows, then the TransactionId is not used, however that
shouldn't happen in practice in many cases.

Author: Greg Nancarrow, Amit Langote, Amit Kapila
Reviewed-by: Amit Langote, Hou Zhijie, Takayuki Tsunakawa, Antonin Houska, Bharath Rupireddy, Dilip Kumar, Vignesh C, Zhihong Yu, Amit Kapila
Tested-by: Tang, Haiying
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
2021-03-10 07:38:58 +05:30
Fujii Masao ff99918c62 Track total amounts of times spent writing and syncing WAL data to disk.
This commit adds new GUC track_wal_io_timing. When this is enabled,
the total amounts of time XLogWrite writes and issue_xlog_fsync syncs
WAL data to disk are counted in pg_stat_wal. This information would be
useful to check how much WAL write and sync affect the performance.

Enabling track_wal_io_timing will make the server query the operating
system for the current time every time WAL is written or synced,
which may cause significant overhead on some platforms. To avoid such
additional overhead in the server with track_io_timing enabled,
this commit introduces track_wal_io_timing as a separate parameter from
track_io_timing.

Note that WAL write and sync activity by walreceiver has not been tracked yet.

This commit makes the server also track the numbers of times XLogWrite
writes and issue_xlog_fsync syncs WAL data to disk, in pg_stat_wal,
regardless of the setting of track_wal_io_timing. This counters can be
used to calculate the WAL write and sync time per request, for example.

Bump PGSTAT_FILE_FORMAT_ID.

Bump catalog version.

Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston, Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-09 16:52:06 +09:00
Michael Paquier 9d2d457009 Add support for more progress reporting in COPY
The command (TO or FROM), its type (file, pipe, program or callback),
and the number of tuples excluded by a WHERE clause in COPY FROM are
added to the progress reporting already available.

The column "lines_processed" is renamed to "tuples_processed" to
disambiguate the meaning of this column in the cases of CSV and BINARY
COPY and to be more consistent with the other catalog progress views.

Bump catalog version, again.

Author: Matthias van de Meent
Reviewed-by: Michael Paquier, Justin Pryzby, Bharath Rupireddy, Josef
Šimánek, Tomas Vondra
Discussion: https://postgr.es/m/CAEze2WiOcgdH4aQA8NtZq-4dgvnJzp8PohdeKchPkhMY-jWZXA@mail.gmail.com
2021-03-09 14:21:03 +09:00
Michael Paquier f9264d1524 Remove support for SSL compression
PostgreSQL disabled compression as of e3bdb2d and the documentation
recommends against using it since.  Additionally, SSL compression has
been disabled in OpenSSL since version 1.1.0, and was disabled in many
distributions long before that.  The most recent TLS version, TLSv1.3,
disallows compression at the protocol level.

This commit removes the feature itself, removing support for the libpq
parameter sslcompression (parameter still listed for compatibility
reasons with existing connection strings, just ignored), and removes
the equivalent field in pg_stat_ssl and de facto PgBackendSSLStatus.

Note that, on top of removing the ability to activate compression by
configuration, compression is actively disabled in both frontend and
backend to avoid overrides from local configurations.

A TAP test is added for deprecated SSL parameters to check after
backwards compatibility.

Bump catalog version.

Author: Daniel Gustafsson
Reviewed-by: Peter Eisentraut, Magnus Hagander, Michael Paquier
Discussion:  https://postgr.es/m/7E384D48-11C5-441B-9EC3-F7DB1F8518F6@yesql.se
2021-03-09 11:16:47 +09:00
Tom Lane d4545dc19b Complain if a function-in-FROM returns a set when it shouldn't.
Throw a "function protocol violation" error if a function in FROM
tries to return a set though it wasn't marked proretset.  Although
such cases work at the moment, it doesn't seem like something we
want to guarantee will keep working.  Besides, there are other
negative consequences of not setting the proretset flag, such as
potentially bad plans.

No back-patch, since if there is any third-party code violating
this expectation, people wouldn't appreciate us breaking it in
a minor release.

Discussion: https://postgr.es/m/1636062.1615141782@sss.pgh.pa.us
2021-03-08 18:54:55 -05:00
Tom Lane 5c06abb9b9 Validate the OID argument of pg_import_system_collations().
"SELECT pg_import_system_collations(0)" caused an assertion failure.
With a random nonzero argument --- or indeed with zero, in non-assert
builds --- it would happily make pg_collation entries with garbage
values of collnamespace.  These are harmless as far as I can tell
(unless maybe the OID happens to become used for a schema, later on?).
In any case this isn't a security issue, since the function is
superuser-only.  But it seems like a gotcha for unwary DBAs, so let's
add a check that the given OID belongs to some schema.

Back-patch to v10 where this function was introduced.
2021-03-08 18:21:51 -05:00
Tom Lane 6c20bdb2a2 Further tweak memory management for regex DFAs.
Coverity is still unhappy after commit 190c79884, and after looking
closer I think it might be onto something.  The callers of newdfa()
typically drop out if v->err has been set nonzero, which newdfa()
is faithfully doing if it fails.  However, what if v->err was already
nonzero before we entered newdfa()?  Then newdfa() could succeed and
the caller would promptly leak its result.

I don't think this scenario can actually happen, but the predicate
"v->err is always zero when newdfa() is called" seems difficult to be
entirely sure of; there's a good deal of code that potentially could
get that wrong.

It seems better to adjust the callers to directly check for a null
result instead of relying on ISERR() tests.  This is slightly cheaper
than the previous coding anyway.

Lacking evidence that there's any real bug, no back-patch.
2021-03-08 16:32:29 -05:00
Amit Kapila 8a812e5106 Track replication origin progress for rollbacks.
Commit 1eb6d6527a allowed to track replica origin replay progress for 2PC
but it was not complete. It misses to properly track the progress for
rollback prepared especially it missed updating the code for recovery.
Additionally, we need to allow tracking it on subscriber nodes where
wal_level might not be logical.

It is required to track decoding of 2PC which is committed in PG14
(a271a1b50e) and also nobody complained about this till now so not
backpatching it.

Author: Amit Kapila
Reviewed-by: Michael Paquier and Ajin Cherian
Discussion: https://postgr.es/m/CAA4eK1L-kHmMnSdrRW6UhRbCjR7cgh04c+6psY15qzT6ktcd+g@mail.gmail.com
2021-03-08 07:54:03 +05:30
Heikki Linnakangas 3174d69fb9 Remove server and libpq support for old FE/BE protocol version 2.
Protocol version 3 was introduced in PostgreSQL 7.4. There shouldn't be
many clients or servers left out there without version 3 support. But as
a courtesy, I kept just enough of the old protocol support that we can
still send the "unsupported protocol version" error in v2 format, so that
old clients can display the message properly. Likewise, libpq still
understands v2 ErrorResponse messages when establishing a connection.

The impetus to do this now is that I'm working on a patch to COPY
FROM, to always prefetch some data. We cannot do that safely with the
old protocol, because it requires parsing the input one byte at a time
to detect the end-of-copy marker.

Reviewed-by: Tom Lane, Alvaro Herrera, John Naylor
Discussion: https://www.postgresql.org/message-id/9ec25819-0a8a-d51a-17dc-4150bb3cca3b%40iki.fi
2021-03-04 10:45:55 +02:00
Tom Lane 0a687c8f10 Add trim_array() function.
This has been in the SQL spec since 2008.  It's a pretty thin
wrapper around the array slice functionality, but the spec
says we should have it, so here it is.

Vik Fearing, reviewed by Dian Fay

Discussion: https://postgr.es/m/fc92ce17-9655-8ff1-c62a-4dc4c8ccd815@postgresfriends.org
2021-03-03 16:39:57 -05:00
Peter Eisentraut e527a99055 Some copy-editing of GUC descriptions 2021-03-03 07:14:35 +01:00
Thomas Munro 8eda3eba30 Use sort_template.h for qsort_tuple() and qsort_ssup().
Replace the Perl code previously used to generate specialized sort
functions with sort_template.h.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com
2021-03-03 17:02:32 +13:00
Amit Kapila 19890a064e Add option to enable two_phase commits via pg_create_logical_replication_slot.
Commit 0aa8a01d04 extends the output plugin API to allow decoding of
prepared xacts and allowed the user to enable/disable the two-phase option
via pg_logical_slot_get_changes(). This can lead to a problem such that
the first time when it gets changes via pg_logical_slot_get_changes()
without two_phase option enabled it will not get the prepared even though
prepare is after consistent snapshot. Now next time during getting changes,
if the two_phase option is enabled it can skip prepare because by that
time start decoding point has been moved. So the user will only get commit
prepared.

Allow to enable/disable this option at the create slot time and default
will be false. It will break the existing slots which is fine in a major
release.

Author: Ajin Cherian
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
2021-03-03 07:34:11 +05:30
Peter Geoghegan 5b2f2af3d9 nbtree page deletion: Add leaftopparent assertion.
Add documenting assertion.  This makes it easier to follow how we
maintain the top parent link in target subtree's half-dead/leaf level
page.
2021-03-02 14:06:07 -08:00
Peter Geoghegan 3d8d5787a3 Fix nbtree page deletion error messages.
Adjust some "can't happen" error messages that assumed that the page
deletion target page must be a half-dead page.  This assumption was
wrong in the case of an internal target page.  Simply refer to these
pages as the target page instead.

Internal pages are never marked half-dead.  There is exactly one
half-dead page for each subtree undergoing deletion.  The half-dead page
is also the target subtree's leaf-level page.  This has been the case
since commit efada2b8, which totally overhauled nbtree page deletion.
2021-03-02 13:02:24 -08:00
Tom Lane d16f8c8e41 Mark default_transaction_read_only as GUC_REPORT.
This allows clients to find out the setting at connection time without
having to expend a query round trip to do so; which is helpful when
trying to identify read/write servers.  (One must also look at
in_hot_standby, but that's already GUC_REPORT, cf bf8a662c9.)
Modifying libpq to make use of this will come soon, but I felt it
cleaner to push the server change separately.

Haribabu Kommi, Greg Nancarrow, Vignesh C; reviewed at various times
by Laurenz Albe, Takayuki Tsunakawa, Peter Smith.

Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
2021-03-02 13:53:54 -05:00
Tom Lane 4604f83fdf Suppress unnecessary regex subre nodes in a couple more cases.
This extends the changes made in commit cebc1d34e, teaching
parseqatom() to generate fewer or cheaper subre nodes in some edge
cases.  The case of interest here is a quantified atom that is "messy"
only because it has greediness opposite to what preceded it (whereas
captures and backrefs are intrinsically messy).  In this case we don't
need an iteration node, since we don't care where the sub-matches of
the quantifier are; and we might also not need a second concatenation
node.  This seems of only marginal real-world use according to my
testing, but I wanted to get it in before wrapping up this series of
regex performance fixes.

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-03-02 12:14:14 -05:00
Tom Lane 0c3405cf11 Improve performance of regular expression back-references.
In some cases, at the time that we're doing an NFA-based precheck
of whether a backref subexpression can match at a particular place
in the string, we already know which substring the referenced
subexpression matched.  If so, we might as well forget about the NFA
and just compare the substring; this is faster and it gives an exact
rather than approximate answer.

In general, this optimization can help while we are prechecking within
the second child expression of a concat node, while the capture was
within the first child expression; then the substring was saved during
cdissect() of the first child and will be available to NFA checks done
while cdissect() recurses into the second child.  It can help quite a
lot if the tree looks like

              concat
             /      \
      capture        concat
                    /      \
     expensive stuff        backref

as we will be able to avoid recursively dissecting the "expensive
stuff" before discovering that the backref isn't satisfied with a
particular midpoint that the lower concat node is testing.  This
doesn't help if the concat tree is left-deep, as the capture node
won't get set soon enough (and it's hard to fix that without changing
the engine's match behavior).  Fortunately, right-deep concat trees
are the common case.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us
2021-03-02 11:55:12 -05:00
Tom Lane 4aea704a5b Fix semantics of regular expression back-references.
POSIX defines the behavior of back-references thus:

    The back-reference expression '\n' shall match the same (possibly
    empty) string of characters as was matched by a subexpression
    enclosed between "\(" and "\)" preceding the '\n'.

As far as I can see, the back-reference is supposed to consider only
the data characters matched by the referenced subexpression.  However,
because our engine copies the NFA constructed from the referenced
subexpression, it effectively enforces any constraints therein, too.
As an example, '(^.)\1' ought to match 'xx', or any other string
starting with two occurrences of the same character; but in our code
it does not, and indeed can't match anything, because the '^' anchor
constraint is included in the backref's copied NFA.  If POSIX intended
that, you'd think they'd mention it.  Perl for one doesn't act that
way, so it's hard to conclude that this isn't a bug.

Fix by modifying the backref's NFA immediately after it's copied from
the reference, replacing all constraint arcs by EMPTY arcs so that the
constraints are treated as automatically satisfied.  This still allows
us to enforce matching rules that depend only on the data characters;
for example, in '(^\d+).*\1' the NFA matching step will still know
that the backref can only match strings of digits.

Perhaps surprisingly, this change does not affect the results of any
of a rather large corpus of real-world regexes.  Nonetheless, I would
not consider back-patching it, since it's a clear compatibility break.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us
2021-03-02 11:34:53 -05:00
Michael Paquier fabde52fab Simplify code to switch pg_class.relrowsecurity in tablecmds.c
The same code pattern was repeated twice to enable or disable ROW LEVEL
SECURITY with an ALTER TABLE command.  This makes the code slightly
cleaner.

Author: Justin Pryzby
Reviewed-by: Zhihong Yu
Discussion: https://postgr.es/m/20210228211854.GC20769@telsasoft.com
2021-03-02 12:30:21 +09:00
Tom Lane ffd3944ab9 Improve reporting for syntax errors in multi-line JSON data.
Point to the specific line where the error was detected; the
previous code tended to include several preceding lines as well.
Avoid re-scanning the entire input to recompute which line that
was.  Simplify the logic a bit.  Add test cases.

Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself

Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com
2021-03-01 16:44:17 -05:00
Thomas Munro bd69ddfcdb Remove obsolete comment for WaitForProcSignalBarrier().
Commit 814f1d8b removed the behavior described.

Reported-by: Amit Kapila <amit.kapila16@gmail.com>
2021-03-02 09:30:57 +13:00
Thomas Munro f5a5773a9d Allow condition variables to be used in interrupt code.
Adjust the condition variable sleep loop to work correctly when code
reached by its internal CHECK_FOR_INTERRUPTS() call interacts with
another condition variable.

There are no such cases currently, but a proposed patch would do this.

Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com
2021-03-01 17:24:47 +13:00
Thomas Munro 814f1d8bc3 Use condition variables for ProcSignalBarriers.
Instead of a poll/sleep loop, use a condition variable for precise
wake-up whenever a backend's pss_barrierGeneration advances.

Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com
2021-03-01 17:23:43 +13:00
Amit Kapila 8bdb1332eb Avoid repeated decoding of prepared transactions after a restart.
In commit a271a1b50e, we allowed decoding at prepare time and the prepare
was decoded again if there is a restart after decoding it. It was done
that way because we can't distinguish between the cases where we have not
decoded the prepare because it was prior to consistent snapshot or we have
decoded it earlier but restarted. To distinguish between these two cases,
we have introduced an initial_consistent_point at the slot level which is
an LSN at which we found a consistent point at the time of slot creation.
This is also the point where we have exported a snapshot for the initial
copy. So, prepare transaction prior to this point are sent along with
commit prepared.

This commit bumps SNAPBUILD_VERSION because of change in SnapBuild. It
will break existing slots which is fine in a major release.

Author: Ajin Cherian, based on idea by Andres Freund
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
2021-03-01 09:11:18 +05:30
Thomas Munro 6230912f23 Use FeBeWaitSet for walsender.c.
This avoids the need to set up and tear down a fresh WaitEventSet every
time we need need to wait.  We have to add an explicit exit on
postmaster exit (FeBeWaitSet isn't set up to do that automatically), so
move the code to do that into a new function to avoid repetition.

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
2021-03-01 16:19:38 +13:00
Thomas Munro a042ba2ba7 Introduce symbolic names for FeBeWaitSet positions.
Previously we used 0 and 1 to refer to the socket and latch in far flung
parts of the tree, without any explanation.  Also use PGINVALID_SOCKET
rather than -1 in a couple of places that didn't already do that.

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
2021-03-01 16:10:16 +13:00
Amit Kapila b4e3dc7fd4 Update the docs and comments for decoding of prepared xacts.
Commit a271a1b50e introduced decoding at prepare time in ReorderBuffer.
This can lead to deadlock for out-of-core logical replication solutions
that uses this feature to build distributed 2PC in case such transactions
lock [user] catalog tables exclusively. They need to inform users to not
have locks on catalog tables (via explicit LOCK command) in such
transactions.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210222222847.tpnb6eg3yiykzpky@alap3.anarazel.de
2021-03-01 08:14:33 +05:30
Thomas Munro 6148656a0b Use EVFILT_SIGNAL for kqueue latches.
Cut down on system calls and other overheads by waiting for SIGURG
explicitly with kqueue instead of using a signal handler and self-pipe.
Affects *BSD and macOS systems.

This leaves only the poll implementation with a signal handler and the
traditional self-pipe trick.

Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 14:20:04 +13:00
Thomas Munro 6a2a70a020 Use signalfd(2) for epoll latches.
Cut down on system calls and other overheads by reading from a signalfd
instead of using a signal handler and self-pipe.  Affects Linux sytems,
and possibly others including illumos that implement the Linux epoll and
signalfd interfaces.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 14:12:02 +13:00
Thomas Munro 83709a0d5a Use SIGURG rather than SIGUSR1 for latches.
Traditionally, SIGUSR1 has been overloaded for ad-hoc signals,
procsignal.c signals and latch.c wakeups.  Move that last use over to a
new dedicated signal.  SIGURG is normally used to report out-of-band
socket data, but PostgreSQL doesn't use that facility.

The signal handler is now installed in all postmaster children by
InitializeLatchSupport().  Those wishing to disconnect from it should
call ShutdownLatchSupport().

Future patches will use this separation of signals to avoid the need for
a signal handler on some operating systems.

Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 12:44:12 +13:00
Thomas Munro c8f3bc2401 Optimize latches to send fewer signals.
Don't send signals to processes that aren't sleeping.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 12:44:12 +13:00
Thomas Munro d1b90995e8 Remove latch.c workaround for Linux < 2.6.27.
Commit 82ebbeb0 added a workaround for systems with no epoll_create1()
and EPOLL_CLOEXEC.  Linux < 2.6.27 and glibc < 2.9 are long gone.  Now
seems like a good time to drop the extra code, because otherwise we'd
have to add similar already-dead workaround code to new patches using
XXX_CLOEXEC flags that arrived in the same kernel release.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGKL_%3DaO%3Dr30N%3Ds9VoDgTqHpRSzePRbA9dkYO7snc7HsxA%40mail.gmail.com
2021-03-01 11:24:28 +13:00
Alvaro Herrera 25936fd46c
Fix use-after-free bug with AfterTriggersTableData.storeslot
AfterTriggerSaveEvent() wrongly allocates the slot in execution-span
memory context, whereas the correct thing is to allocate it in
a transaction-span context, because that's where the enclosing
AfterTriggersTableData instance belongs into.

Backpatch to 12 (the test back to 11, where it works well with no code
changes, and it's good to have to confirm that the case was previously
well supported); this bug seems introduced by commit ff11e7f4b9.

Reported-by: Bertrand Drouvot <bdrouvot@amazon.com>
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/39a71864-b120-5a5c-8cc5-c632b6f16761@amazon.com
2021-02-27 18:09:15 -03:00
David Rowley 977b2c0853 Add missing TidRangeScan readfunc
Mistakenly forgotten in bb437f995
2021-02-27 23:21:21 +13:00
David Rowley bb437f995d Add TID Range Scans to support efficient scanning ranges of TIDs
This adds a new executor node named TID Range Scan.  The query planner
will generate paths for TID Range scans when quals are discovered on base
relations which search for ranges on the table's ctid column.  These
ranges may be open at either end. For example, WHERE ctid >= '(10,0)';
will return all tuples on page 10 and over.

To support this, two new optional callback functions have been added to
table AM.  scan_set_tidrange is used to set the scan range to just the
given range of TIDs.  scan_getnextslot_tidrange fetches the next tuple
in the given range.

For AMs were scanning ranges of TIDs would not make sense, these functions
can be set to NULL in the TableAmRoutine.  The query planner won't
generate TID Range Scan Paths in that case.

Author: Edmund Horner, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Tom Lane, Andres Freund, Zhihong Yu
Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
2021-02-27 22:59:36 +13:00
Peter Eisentraut f4adc41c4f Enhanced cycle mark values
Per SQL:202x draft, in the CYCLE clause of a recursive query, the
cycle mark values can be of type boolean and can be omitted, in which
case they default to TRUE and FALSE.

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
2021-02-27 08:13:24 +01:00
Tom Lane 0fc1af174c Improve memory management in regex compiler.
The previous logic here created a separate pool of arcs for each
state, so that the out-arcs of each state were physically stored
within it.  Perhaps this choice was driven by trying to not include
a "from" pointer within each arc; but Spencer gave up on that idea
long ago, and it's hard to see what the value is now.  The approach
turns out to be fairly disastrous in terms of memory consumption,
though.  In the first place, NFAs built by this engine seem to have
about 4 arcs per state on average, with a majority having only one
or two out-arcs.  So pre-allocating 10 out-arcs for each state is
already cause for a factor of two or more bloat.  Worse, the NFA
optimization phase moves arcs around with abandon.  In a large NFA,
some of the states will have hundreds of out-arcs, so towards the
end of the optimization phase we have a significant number of states
whose arc pools have room for hundreds of arcs each, even though only
a few of those arcs are in use.  We have seen real-world regexes in
which this effect bloats the memory requirement by 25X or even more.

Hence, get rid of the per-state arc pools in favor of a single arc
pool for the whole NFA, with variable-sized allocation batches
instead of always asking for 10 at a time.  While we're at it,
let's batch the allocations of state structs too, to further reduce
the malloc traffic.

This incidentally allows moveouts() to be optimized in a similar
way to moveins(): when moving an arc to another state, it's now
valid to just re-link the same arc struct into a different outchain,
where before the code invariants required us to make a physically
new arc and then free the old one.

These changes reduce the regex compiler's typical space consumption
for average-size regexes by about a factor of two, and much more for
large or complicated regexes.  In a large test set of real-world
regexes, we formerly had half a dozen cases that failed with "regular
expression too complex" due to exceeding the REG_MAX_COMPILE_SPACE
limit (about 150MB); we would have had to raise that limit to
something close to 400MB to make them work with the old code.  Now,
none of those cases need more than 13MB to compile.  Furthermore,
the test set is about 10% faster overall due to less malloc traffic.

Discussion: https://postgr.es/m/168861.1614298592@sss.pgh.pa.us
2021-02-26 13:52:10 -05:00
Thomas Munro 8556267b2b Revert "pg_collation_actual_version() -> pg_collation_current_version()."
This reverts commit 9cf184cc05.  Name
change less well received than anticipated.

Discussion: https://postgr.es/m/afcfb97e-88a1-a540-db95-6c573b93bc2b%40eisentraut.org
2021-02-26 15:29:27 +13:00
Tom Lane 80ca8464fe Fix list-manipulation bug in WITH RECURSIVE processing.
makeDependencyGraphWalker and checkWellFormedRecursionWalker
thought they could hold onto a pointer to a list's first
cons cell while the list was modified by recursive calls.
That was okay when the cons cell was actually separately
palloc'd ... but since commit 1cff1b95a, it's quite unsafe,
leading to core dumps or incorrect complaints of faulty
WITH nesting.

In the field this'd require at least a seven-deep WITH nest
to cause an issue, but enabling DEBUG_LIST_MEMORY_USAGE
allows the bug to be seen with lesser nesting depths.

Per bug #16801 from Alexander Lakhin.  Back-patch to v13.

Michael Paquier and Tom Lane

Discussion: https://postgr.es/m/16801-393c7922143eaa4d@postgresql.org
2021-02-25 20:47:32 -05:00
Peter Geoghegan 2376361839 VACUUM VERBOSE: Count "newly deleted" index pages.
Teach VACUUM VERBOSE to report on pages deleted by the _current_ VACUUM
operation -- these are newly deleted pages.  VACUUM VERBOSE continues to
report on the total number of deleted pages in the entire index (no
change there).  The former is a subset of the latter.

The distinction between each category of deleted index page only arises
with index AMs where page deletion is supported and is decoupled from
page recycling for performance reasons.

This is follow-up work to commit e5d8a999, which made nbtree store
64-bit XIDs (not 32-bit XIDs) in pages at the point at which they're
deleted.  Note that the btm_last_cleanup_num_delpages metapage field
added by that commit usually gets set to pages_newly_deleted.  The
exceptions (the scenarios in which they're not equal) all seem to be
tricky cases for the implementation (of page deletion and recycling) in
general.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX%3Dd5u-tRYhi-F4wnV2uN2zHpMUXw%40mail.gmail.com
2021-02-25 14:32:18 -08:00
Tom Lane 301ed8812e Doc: remove src/backend/regex/re_syntax.n.
We aren't publishing this file as documentation, and it's been
much more haphazardly maintained than the real docs in func.sgml,
so let's just drop it.  I think the only reason I included it in
commit 7bcc6d98f was that the Berkeley-era sources had had a man
page in this directory.

Discussion: https://postgr.es/m/4099447.1614186542@sss.pgh.pa.us
2021-02-25 13:33:27 -05:00
Tom Lane 7dc13a0f08 Change regex \D and \W shorthands to always match newlines.
Newline is certainly not a digit, nor a word character, so it is
sensible that it should match these complemented character classes.
Previously, \D and \W acted that way by default, but in
newline-sensitive mode ('n' or 'p' flag) they did not match newlines.

This behavior was previously forced because explicit complemented
character classes don't match newlines in newline-sensitive mode;
but as of the previous commit that implementation constraint no
longer exists.  It seems useful to change this because the primary
real-world use for newline-sensitive mode seems to be to match the
default behavior of other regex engines such as Perl and Javascript
... and their default behavior is that these match newlines.

The old behavior can be kept by writing an explicit complemented
character class, i.e. [^[:digit:]] or [^[:word:]].  (This means
that \D and \W are not exactly equivalent to those strings, but
they weren't anyway.)

Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
2021-02-25 13:29:06 -05:00
Tom Lane 2a0af7fe46 Allow complemented character class escapes within regex brackets.
The complement-class escapes \D, \S, \W are now allowed within
bracket expressions.  There is no semantic difficulty with doing
that, but the rather hokey macro-expansion-based implementation
previously used here couldn't cope.

Also, invent "word" as an allowed character class name, thus "\w"
is now equivalent to "[[:word:]]" outside brackets, or "[:word:]"
within brackets.  POSIX allows such implementation-specific
extensions, and the same name is used in e.g. bash.

One surprising compatibility issue this raises is that constructs
such as "[\w-_]" are now disallowed, as our documentation has always
said they should be: character classes can't be endpoints of a range.
Previously, because \w was just a macro for "[:alnum:]_", such a
construct was read as "[[:alnum:]_-_]", so it was accepted so long as
the character after "-" was numerically greater than or equal to "_".

Some implementation cleanup along the way:

* Remove the lexnest() hack, and in consequence clean up wordchrs()
to not interact with the lexer.

* Fix colorcomplement() to not be O(N^2) in the number of colors
involved.

* Get rid of useless-as-far-as-I-can-see calls of element()
on single-character character element names in brackpart().
element() always maps these to the character itself, and things
would be quite broken if it didn't --- should "[a]" match something
different than "a" does?  Besides, the shortcut path in brackpart()
wasn't doing this anyway, making it even more inconsistent.

Discussion: https://postgr.es/m/2845172.1613674385@sss.pgh.pa.us
Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
2021-02-25 13:00:40 -05:00
Peter Geoghegan e5d8a99903 Use full 64-bit XIDs in deleted nbtree pages.
Otherwise we risk "leaking" deleted pages by making them non-recyclable
indefinitely.  Commit 6655a729 did the same thing for deleted pages in
GiST indexes.  That work was used as a starting point here.

Stop storing an XID indicating the oldest bpto.xact across all deleted
though unrecycled pages in nbtree metapages.  There is no longer any
reason to care about that condition/the oldest XID.  It only ever made
sense when wraparound was something _bt_vacuum_needs_cleanup() had to
consider.

The btm_oldest_btpo_xact metapage field has been repurposed and renamed.
It is now btm_last_cleanup_num_delpages, which is used to remember how
many non-recycled deleted pages remain from the last VACUUM (in practice
its value is usually the precise number of pages that were _newly
deleted_ during the specific VACUUM operation that last set the field).

The general idea behind storing btm_last_cleanup_num_delpages is to use
it to give _some_ consideration to non-recycled deleted pages inside
_bt_vacuum_needs_cleanup() -- though never too much.  We only really
need to avoid leaving a truly excessive number of deleted pages in an
unrecycled state forever.  We only do this to cover certain narrow cases
where no other factor makes VACUUM do a full scan, and yet the index
continues to grow (and so actually misses out on recycling existing
deleted pages).

These metapage changes result in a clear user-visible benefit: We no
longer trigger full index scans during VACUUM operations solely due to
the presence of only 1 or 2 known deleted (though unrecycled) blocks
from a very large index.  All that matters now is keeping the costs and
benefits in balance over time.

Fix an issue that has been around since commit 857f9c36, which added the
"skip full scan of index" mechanism (i.e. the _bt_vacuum_needs_cleanup()
logic).  The accuracy of btm_last_cleanup_num_heap_tuples accidentally
hinged upon _when_ the source value gets stored.  We now always store
btm_last_cleanup_num_heap_tuples in btvacuumcleanup().  This fixes the
issue because IndexVacuumInfo.num_heap_tuples (the source field) is
expected to accurately indicate the state of the table _after_ the
VACUUM completes inside btvacuumcleanup().

A backpatchable fix cannot easily be extracted from this commit.  A
targeted fix for the issue will follow in a later commit, though that
won't happen today.

I (pgeoghegan) have chosen to remove any mention of deleted pages in the
documentation of the vacuum_cleanup_index_scale_factor GUC/param, since
the presence of deleted (though unrecycled) pages is no longer of much
concern to users.  The vacuum_cleanup_index_scale_factor description in
the docs now seems rather unclear in any case, and it should probably be
rewritten in the near future.  Perhaps some passing mention of page
deletion will be added back at the same time.

Bump XLOG_PAGE_MAGIC due to nbtree WAL records using full XIDs now.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX=d5u-tRYhi-F4wnV2uN2zHpMUXw@mail.gmail.com
2021-02-24 18:41:34 -08:00
Amit Kapila 8a4f9522d0 Fix relcache reference leak introduced by ce0fdbfe97.
Author: Sawada Masahiko
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAD21AoA7ZEfsOXQ9HQqMv3QYGsEm2H5Wk5ic5S=mvzDf-3a3SA@mail.gmail.com
2021-02-25 07:48:24 +05:30
Michael Paquier bcf2667bf6 Fix some typos, grammar and style in docs and comments
The portions fixing the documentation are backpatched where needed.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210210235557.GQ20012@telsasoft.com
backpatch-through: 9.6
2021-02-24 16:13:17 +09:00
Peter Eisentraut 8ec8fe0f31 Message style fix
Don't quote type name placeholders.
2021-02-24 07:00:49 +01:00
Alvaro Herrera 5a65eacfdc
Fix confusion in comments about generate_gather_paths
d2d8a229bc introduced a new function generate_useful_gather_paths to
be used as a replacement for generate_gather_paths, but forgot to update
a couple of places that referenced the older function.

This is possibly not 100% complete (ref. create_ordered_paths), but it's
better than not changing anything.

Author: "Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/4ce1d5116fe746a699a6d29858c6a39a@G08CNEXMBPEKD05.g08.fujitsu.local
2021-02-23 20:05:15 -03:00
Alvaro Herrera 8deb6b38dc
Reinstate HEAP_XMAX_LOCK_ONLY|HEAP_KEYS_UPDATED as allowed
Commit 866e24d47d added an assert that HEAP_XMAX_LOCK_ONLY and
HEAP_KEYS_UPDATED cannot appear together, on the faulty assumption that
the latter necessarily referred to an update and not a tuple lock; but
that's wrong, because SELECT FOR UPDATE can use precisely that
combination, as evidenced by the amcheck test case added here.

Remove the Assert(), and also patch amcheck's verify_heapam.c to not
complain if the combination is found.  Also, out of overabundance of
caution, update (across all branches) README.tuplock to be more explicit
about this.

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Mahendra Singh Thalor <mahi6run@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/20210124061758.GA11756@nol
2021-02-23 17:30:21 -03:00
Tom Lane 3db05e76f9 Suppress compiler warning in new regex match-all detection code.
gcc 10 is smart enough to notice that control could reach this
"hasmatch[depth]" assignment with depth < 0, but not smart enough
to know that that would require a badly broken NFA graph.  Change
the assert() to a plain runtime test to shut it up.

Per report from Andres Freund.

Discussion: https://postgr.es/m/20210223173437.b3ywijygsy6q42gq@alap3.anarazel.de
2021-02-23 13:55:34 -05:00
Alvaro Herrera d9d076222f
VACUUM: ignore indexing operations with CONCURRENTLY
As envisioned in commit c98763bf51, it is possible for VACUUM to
ignore certain transactions that are executing CREATE INDEX CONCURRENTLY
and REINDEX CONCURRENTLY for the purposes of computing Xmin; that's
because we know those transactions are not going to examine any other
tables, and are not going to execute anything else in the same
transaction.  (Only operations on "safe" indexes can be ignored: those
on indexes that are neither partial nor expressional).

This is extremely useful in cases where CIC/RC can run for a very long
time, because that used to be a significant headache for concurrent
vacuuming of other tables.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql
2021-02-23 12:15:09 -03:00
Peter Eisentraut 6f6f284c7e Simplify printing of LSNs
Add a macro LSN_FORMAT_ARGS for use in printf-style printing of LSNs.
Convert all applicable code to use it.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ub5NaTELZ3hJUCE6amuvqAtsSxc7O+uK7y4t9Rrk23cw@mail.gmail.com
2021-02-23 10:27:02 +01:00
Amit Kapila ade89ba5f4 Fix an oversight in ReorderBufferFinishPrepared.
We don't have anything to decode in a transaction if ReorderBufferTXN
doesn't exist by the time we decode the commit prepared. So don't create a
new ReorderBufferTXN here. This is an oversight in commit a271a1b5.

Reported-by: Markus Wanner
Discussion: https://postgr.es/m/dbec82e2-dbd7-95a2-c6b6-e488cbbdf853@bluegap.ch
2021-02-23 09:47:41 +05:30
Amit Kapila bc617a7b1c Change the error message for logical replication authentication failure.
The authentication failure error message wasn't distinguishing whether
it is a physical replication or logical replication connection failure and
was giving incomplete information on what led to failure in case of logical
replication connection.

Author: Paul Martinez and Amit Kapila
Reviewed-by: Euler Taveira and Amit Kapila
Discussion: https://postgr.es/m/CACqFVBYahrAi2OPdJfUA3YCvn3QMzzxZdw0ibSJ8wouWeDtiyQ@mail.gmail.com
2021-02-23 09:11:22 +05:30
Alvaro Herrera 0f5505a881
Remove pointless HeapTupleHeaderIndicatesMovedPartitions calls
Pavan Deolasee recently noted that a few of the
HeapTupleHeaderIndicatesMovedPartitions calls added by commit
5db6df0c01 are useless, since they are done after comparing t_self
with t_ctid.  But because t_self can never be set to the magical values
that indicate that the tuple moved partition, this can never succeed: if
the first test fails (so we know t_self equals t_ctid), necessarily the
second test will also fail.

So these checks can be removed and no harm is done.  There's no bug
here, just a code legibility issue.

Reported-by: Pavan Deolasee <pavan.deolasee@gmail.com>
Discussion: https://postgr.es/m/20200929164411.GA15497@alvherre.pgsql
2021-02-22 16:51:44 -03:00
Alvaro Herrera 6a03369a71
Fix typo 2021-02-22 11:34:05 -03:00
Thomas Munro beb4480c85 Refactor get_collation_current_version().
The code paths for three different OSes finished up with three different
ways of excluding C[.xxx] and POSIX from consideration.  Merge them.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:32:16 +13:00
Thomas Munro 9cf184cc05 pg_collation_actual_version() -> pg_collation_current_version().
The new name seems a bit more natural.

Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:32:16 +13:00
Thomas Munro 0fb0a0503b Hide internal error for pg_collation_actual_version(<bad OID>).
Instead of an unsightly internal "cache lookup failed" message, just
return NULL for bad OIDs, as is the convention for other similar things.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:01:20 +13:00
Fujii Masao f05ed5a5cf Initialize atomic variable waitStart in PGPROC, at postmaster startup.
Commit 46d6e5f567 added the atomic variable "waitStart" into PGPROC struct,
to store the time at which wait for lock acquisition started. Previously
this variable was initialized every time each backend started. Instead,
this commit makes postmaster initialize it at the startup, to ensure that
the variable should be initialized before any use of it.

This commit also moves the code to initialize "waitStart" variable for
prepare transaction, from TwoPhaseGetDummyProc() to MarkAsPreparingGuts().
Because MarkAsPreparingGuts() is more proper place to do that since
it initializes other PGPROC variables.

Author: Fujii Masao
Reviewed-by: Atsushi Torikoshi
Discussion: https://postgr.es/m/1df88660-6f08-cc6e-b7e2-f85296a2bdab@oss.nttdata.com
2021-02-22 18:25:00 +09:00
Peter Eisentraut efbfb64241 Improve new hash partition bound check error messages
For the error message "every hash partition modulus must be a factor
of the next larger modulus", add a detail message that shows the
particular numbers and existing partition involved.  Also comment the
code more.

Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/bb9d60b4-aadb-607a-1a9d-fdc3434dddcd%40enterprisedb.com
2021-02-22 08:06:45 +01:00
Michael Paquier 9294264278 Use pgstat_progress_update_multi_param() where possible
This commit changes one code path in REINDEX INDEX and one code path
in CREATE INDEX CONCURRENTLY to report the progress of each operation
using pgstat_progress_update_multi_param() rather than
multiple calls to pgstat_progress_update_param().  This has the
advantage to make the progress report more consistent to the end-user
without impacting the amount of information provided.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACV5zW7GxD8D_tyO==bcj6ZktQchEKWKPBOAGKiLhAQo=w@mail.gmail.com
2021-02-22 14:21:40 +09:00
Thomas Munro db8374d804 Remove outdated reference to RAID spindles.
Commit b09ff536 left behind some outdated advice in the long_desc field
of the GUC "effective_io_concurrency".  Remove it.

Back-patch to 13.

Reported-by: Andrew Gierth <andrew@tao11.riddles.org.uk>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJyyWqFBxL9gEj-qtjBThGjhAOBE8GBnF8MUJOJ3vrfag%40mail.gmail.com
2021-02-22 14:42:15 +13:00
Tom Lane 190c79884a Simplify memory management for regex DFAs a little.
Coverity complained that functions in regexec.c might leak DFA
storage.  It's wrong, but this logic is confusing enough that it's
not so surprising Coverity couldn't make sense of it.  Rewrite
in hopes of making it more legible to humans as well as machines.
2021-02-21 20:29:11 -05:00
Tom Lane ea1268f630 Avoid generating extra subre tree nodes for capturing parentheses.
Previously, each pair of capturing parentheses gave rise to a separate
subre tree node, whose only function was to identify that we ought to
capture the match details for this particular sub-expression.  In
most cases we don't really need that, since we can perfectly well
put a "capture this" annotation on the child node that does the real
matching work.  As with the two preceding commits, the main value
of this is to avoid generating and optimizing an NFA for a tree node
that's not really pulling its weight.

The chosen data representation only allows one capture annotation
per subre node.  In the legal-per-spec, but seemingly not very useful,
case where there are multiple capturing parens around the exact same
bit of the regex (i.e. "((xyz))"), wrap the child node in N-1 capture
nodes that act the same as before.  We could work harder at that but
I'll refrain, pending some evidence that such cases are worth troubling
over.

In passing, improve the comments in regex.h to say what all the
different re_info bits mean.  Some of them were pretty obvious
but others not so much, so reverse-engineer some documentation.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 19:26:41 -05:00
Tom Lane 5810430894 Convert regex engine's subre tree from binary to N-ary style.
Instead of having left and right child links in subre structs,
have a single child link plus a sibling link.  Multiple children
of a tree node are now reached by chasing the sibling chain.

The beneficiary of this is alternation tree nodes.  A regular
expression with N (>1) branches is now represented by one alternation
node with N children, rather than a tree that includes N alternation
nodes as well as N children.  While the old representation didn't
really cost anything extra at execution time, it was pretty horrid
for compilation purposes, because each of the alternation nodes had
its own NFA, which we were too stupid not to separately optimize.
(To make matters worse, all of those NFAs described the entire
alternation pattern, not just the portion of it that one might
expect from the tree structure.)

We continue to require concatenation nodes to have exactly two
children.  This data structure is now prepared to support more,
but the executor's logic would need some careful redesign, and
it's not clear that a lot of benefit could be had.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 19:07:45 -05:00
Tom Lane cebc1d34e5 Fix regex engine to suppress useless concatenation sub-REs.
The comment for parsebranch() claims that it avoids generating
unnecessary concatenation nodes in the "subre" tree, but it missed
some significant cases.  Once we've decided that a given atom is
"messy" and can't be bundled with the preceding atom(s) of the
current regex branch, parseqatom() always generated two new concat
nodes, one to concat the messy atom to what follows it in the branch,
and an upper node to concatenate the preceding part of the branch
to that one.  But one or both of these could be unnecessary, if the
messy atom is the first, last, or only one in the branch.  Improve
the code to suppress such useless concat nodes, along with the
no-op child nodes representing empty chunks of a branch.

Reducing the number of subre tree nodes offers significant savings
not only at execution but during compilation, because each subre node
has its own NFA that has to be separately optimized.  (Maybe someday
we'll figure out how to share the optimization work across multiple
tree nodes, but it doesn't look easy.)  Eliminating upper tree nodes
is especially useful because they tend to have larger NFAs.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 18:45:29 -05:00
Tom Lane 824bf71902 Recognize "match-all" NFAs within the regex engine.
This builds on the previous "rainbow" patch to detect NFAs that will
match any string, though possibly with constraints on the string length.
This definition is chosen to match constructs such as ".*", ".+", and
".{1,100}".  Recognizing such an NFA after the optimization pass is
fairly cheap, since we basically just have to verify that all arcs
are RAINBOW arcs and count the number of steps to the end state.
(Well, there's a bit of complication with pseudo-color arcs for string
boundary conditions, but not much.)

Once we have these markings, the regex executor functions longest(),
shortest(), and matchuntil() don't have to expend per-character work
to determine whether a given substring satisfies such an NFA; they
just need to check its length against the bounds.  Since some matching
problems require O(N) invocations of these functions, we've reduced
the runtime for an N-character string from O(N^2) to O(N).  Of course,
this is no help for non-matchall sub-patterns, but those usually have
constraints that allow us to avoid needing O(N) substring checks in the
first place.  It's precisely the unconstrained "match-all" cases that
cause the most headaches.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 18:31:19 -05:00
Tom Lane 08c0d6ad65 Invent "rainbow" arcs within the regex engine.
Some regular expression constructs, most notably the "." match-anything
metacharacter, produce a sheaf of parallel NFA arcs covering all
possible colors (that is, character equivalence classes).  We can make
a noticeable improvement in the space and time needed to process large
regexes by replacing such cases with a single arc bearing the special
color code "RAINBOW".  This requires only minor additional complication
in places such as pull() and push().

Callers of pg_reg_getoutarcs() must now be prepared for the possibility
of seeing a RAINBOW arc.  For the one known user, contrib/pg_trgm,
that's a net benefit since it cuts the number of arcs to be dealt with,
and the handling isn't any different than for other colors that contain
too many characters to be dealt with individually.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 18:11:56 -05:00
Fujii Masao 8a55cb5ba9 Fix bug in COMMIT AND CHAIN command.
This commit fixes COMMIT AND CHAIN command so that it starts new transaction
immediately even if savepoints are defined within the transaction to commit.
Previously COMMIT AND CHAIN command did not in that case because
commit 280a408b48 forgot to make CommitTransactionCommand() handle
a transaction chaining when the transaction state was TBLOCK_SUBCOMMIT.

Also this commit adds the regression test for COMMIT AND CHAIN command
when savepoints are defined.

Back-patch to v12 where transaction chaining was added.

Reported-by: Arthur Nascimento
Author: Fujii Masao
Reviewed-by: Arthur Nascimento, Vik Fearing
Discussion: https://postgr.es/m/16867-3475744069228158@postgresql.org
2021-02-19 21:57:52 +09:00
Peter Eisentraut 678d0e239b Update snowball
Update to snowball tag v2.1.0.  Major changes are new stemmers for
Armenian, Serbian, and Yiddish.
2021-02-19 08:10:15 +01:00
Peter Geoghegan b071a31149 Add nbtree README section on page recycling.
Consolidate discussion of how VACUUM places pages in the FSM for
recycling by adding a new section that comes after discussion of page
deletion.  This structure reflects the fact that page recycling is
explicitly decoupled from page deletion in Lanin & Shasha's paper.  Page
recycling in nbtree is an implementation of what the paper calls "the
drain technique".

This decoupling is an important concept for nbtree VACUUM.  Searchers
have to detect and recover from concurrent page deletions, but they will
never have to reason about concurrent page recycling.  Recycling can
almost always be thought of as a low level garbage collection operation
that asynchronously frees the physical space that backs a logical tree
node.  Almost all code need only concern itself with logical tree nodes.
(Note that "logical tree node" is not currently a term of art in the
nbtree code -- this all works implicitly.)

This is preparation for an upcoming patch that teaches nbtree VACUUM to
remember the details of pages that it deletes on the fly, in local
memory.  This enables the same VACUUM operation to consider placing its
own deleted pages in the FSM later on, when it reaches the end of
btvacuumscan().
2021-02-18 21:16:33 -08:00
Tom Lane b5a66e7353 Fix another ancient bug in parsing of BRE-mode regular expressions.
While poking at the regex code, I happened to notice that the bug
squashed in commit afcc8772e had a sibling: next() failed to return
a specific value associated with the '}' token for a "\{m,n\}"
quantifier when parsing in basic RE mode.  Again, this could result
in treating the quantifier as non-greedy, which it never should be in
basic mode.  For that to happen, the last character before "\}" that
sets "nextvalue" would have to set it to zero, or it'd have to have
accidentally been zero from the start.  The failure can be provoked
repeatably with, for example, a bound ending in digit "0".

Like the previous patch, back-patch all the way.
2021-02-18 22:38:55 -05:00
Fujii Masao 614b7f18b3 Fix "invalid spinlock number: 0" error in pg_stat_wal_receiver.
Commit 2c8dd05d6c added the atomic variable writtenUpto into
walreceiver's shared memory information. It's initialized only
when walreceiver started up but could be read via pg_stat_wal_receiver
view anytime, i.e., even before it's initialized. In the server built
with --disable-atomics and --disable-spinlocks, this uninitialized
atomic variable read could cause "invalid spinlock number: 0" error.

This commit changed writtenUpto so that it's initialized at
the postmaster startup, to avoid the uninitialized variable read
via pg_stat_wal_receiver and fix the error.

Also this commit moved the read of writtenUpto after the release
of spinlock protecting walreceiver's shared variables. This is
necessary to prevent new spinlock from being taken by atomic
variable read while holding another spinlock, and to shorten
the spinlock duration. This change leads writtenUpto not to be
consistent with the other walreceiver's shared variables protected
by a spinlock. But this is OK because writtenUpto should not be
used for data integrity checks.

Back-patch to v13 where commit 2c8dd05d6c introduced the bug.

Author: Fujii Masao
Reviewed-by: Michael Paquier, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/7ef8708c-5b6b-edd3-2cf2-7783f1c7c175@oss.nttdata.com
2021-02-18 23:28:15 +09:00
Peter Eisentraut f5465fade9 Allow specifying CRL directory
Add another method to specify CRLs, hashed directory method, for both
server and client side.  This offers a means for server or libpq to
load only CRLs that are required to verify a certificate.  The CRL
directory is specifed by separate GUC variables or connection options
ssl_crl_dir and sslcrldir, alongside the existing ssl_crl_file and
sslcrl, so both methods can be used at the same time.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/20200731.173911.904649928639357911.horikyota.ntt@gmail.com
2021-02-18 07:59:10 +01:00
Peter Geoghegan 128dd901a5 nbtree README: move VACUUM linear scan section.
Discuss VACUUM's linear scan after discussion of tuple deletion by
VACUUM, but before discussion of page deletion by VACUUM.  This
progression is a lot more natural.

Also tweak the wording a little.  It seems unnecessary to talk about how
it worked prior to PostgreSQL 8.2.
2021-02-17 21:13:15 -08:00
Tomas Vondra 927f453a94 Fix tuple routing to initialize batching only for inserts
A cross-partition update on a partitioned table is implemented as a
delete followed by an insert. With foreign partitions, this was however
causing issues, because the FDW and core may disagree on when to enable
batching.  postgres_fdw was only allowing batching for plain inserts
(CMD_INSERT) while core was trying to batch the insert component of the
cross-partition update.  Fix by restricting core to apply batching only
to plain CMD_INSERT queries.

It's possible to allow batching for cross-partition updates, but that
will require more extensive changes, so better to leave that for a
separate patch.

Author: Amit Langote
Reviewed-by: Tomas Vondra, Takayuki Tsunakawa
Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-02-18 00:03:45 +01:00
Tom Lane 4e703d6719 Make some minor improvements in the regex code.
Push some hopefully-uncontroversial bits extracted from an upcoming
patch series, to remove non-relevant clutter from the main patches.

In compact(), return immediately after setting REG_ASSERT error;
continuing the loop would just lead to assertion failure below.
(Ask me how I know.)

In parseqatom(), remove assertion that moresubs() did its job.
When moresubs actually did its job, this is redundant with that
function's final assert; but when it failed on OOM, this is an
assertion crash.  We could avoid the crash by adding a NOERR()
check before the assertion, but it seems better to subtract code
than add it.  (Note that there's a NOERR exit a few lines further
down, and nothing else between here and there requires moresubs
to have succeeded.  So we don't really need an extra error exit.)
This is a live bug in assert-enabled builds, but given the very
low likelihood of OOM in moresub's tiny allocation, I don't think
it's worth back-patching.

On the other hand, it seems worthwhile to add an assertion that
our intended v->subs[subno] target is still null by the time we
are ready to insert into it, since there's a recursion in between.

In pg_regexec, ensure we fflush any debug output on the way out,
and try to make MDEBUG messages more uniform and helpful.  (In
particular, ensure that all of them are prefixed with the subre's
id number, so one can match up entry and exit reports.)

Add some test cases in test_regex to improve coverage of lookahead
and lookbehind constraints.  Adding these now is mainly to establish
that this is indeed the existing behavior.

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-17 12:24:23 -05:00
Peter Eisentraut f40c6969d0 Routine usage information schema tables
Several information schema views track dependencies between
functions/procedures and objects used by them.  These had not been
implemented so far because PostgreSQL doesn't track objects used in a
function body.  However, formally, these also show dependencies used
in parameter default expressions, which PostgreSQL does support and
track.  So for the sake of completeness, we might as well add these.
If dependency tracking for function bodies is ever implemented, these
views will automatically work correctly.

Reviewed-by: Erik Rijkers <er@xs4all.nl>
Discussion: https://www.postgresql.org/message-id/flat/ac80fc74-e387-8950-9a31-2560778fc1e3%40enterprisedb.com
2021-02-17 18:16:06 +01:00
Peter Eisentraut 0e392fcc0d Use errmsg_internal for debug messages
An inconsistent set of debug-level messages was not using
errmsg_internal(), thus uselessly exposing the messages to translation
work.  Fix those.
2021-02-17 11:33:25 +01:00
Tom Lane 38bb3aef35 Convert tsginidx.c's GIN indexing logic to fully ternary operation.
Commit 2f2007fbb did this partially, but there were two remaining
warts.  checkcondition_gin handled some uncertain cases by setting
the out-of-band recheck flag, some by returning TS_MAYBE, and some
by doing both.  Meanwhile, TS_execute arbitrarily converted a
TS_MAYBE result to TS_YES.  Thus, if checkcondition_gin chose to
only return TS_MAYBE, the outcome would be TS_YES with no recheck
flag, potentially resulting in wrong query outputs.

The case where this'd happen is if there were GIN_MAYBE entries
in the indexscan results passed to gin_tsquery_[tri]consistent,
which so far as I can see would only happen if the tidbitmap used
to accumulate indexscan results grew large enough to become lossy.

I initially thought of fixing this by ensuring we always set the
recheck flag as well as returning TS_MAYBE in uncertain cases.
But that errs in the other direction, potentially forcing rechecks
of rows that provably match the query (since the recheck flag
remains set even if TS_execute later finds that the answer must be
TS_YES).  Instead, let's get rid of the out-of-band recheck flag
altogether and rely on returning TS_MAYBE.  This requires exporting
a version of TS_execute that will actually return the full ternary
result of the evaluation ... but we likely should have done that
to start with.

Unfortunately it doesn't seem practical to add a regression test case
that covers this: the amount of data needed to cause the GIN bitmap to
become lossy results in a longer runtime than I think we want to have
in the tests.  (I'm wondering about allowing smaller work_mem settings
to ameliorate that, but it'd be a matter for a separate patch.)

Per bug #16865 from Dimitri Nüscheler.  Back-patch to v13 where
the faulty commit came in.

Discussion: https://postgr.es/m/16865-4ffdc3e682e6d75b@postgresql.org
2021-02-16 12:07:14 -05:00
Amit Kapila f672df5fdd Remove the unnecessary PrepareWrite in pgoutput.
This issue exists from the inception of this code (PG-10) but got exposed
by the recent commit ce0fdbfe97 where we are using origins in tablesync
workers. The problem was that we were sometimes sending the prepare_write
('w') message but then the actual message was not being sent and on the
subscriber side, we always expect a message after prepare_write message
which led to this bug.

I refrained from backpatching this because there is no way in the core
code to hit this prior to commit ce0fdbfe97 and we haven't received any
complaints so far.

Reported-by: Erik Rijkers
Author: Amit Kapila and Vignesh C
Tested-by: Erik Rijkers
Discussion: https://postgr.es/m/1295168140.139428.1613133237154@webmailclassic.xs4all.nl
2021-02-16 07:26:50 +05:30
Andres Freund 8001cb77ee Fix heap_page_prune() parameter order confusion introduced in dc7420c2c9.
Both luckily and unluckily the passed values meant the same for all
types. Luckily because that meant my confusion caused no harm,
unluckily because otherwise the compiler might have warned...

In passing, synchronize parameter names between definition and
declaration.

Reported-By: Peter Geoghegan <pg@bowt.ie>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wz=L=nBoepQdH9b5Qd0nMvepFT2CnT6sjWvvpOXa=K8HVQ@mail.gmail.com
2021-02-15 17:12:12 -08:00
Andres Freund a975ff4980 Remove backwards compat ugliness in snapbuild.c.
In 955a684e04 we fixed a bug in initial snapshot creation. In the
course of which several members of struct SnapBuild were obsoleted. As
SnapBuild is serialized to disk we couldn't change the memory layout.

Unfortunately I subsequently forgot about removing the backward compat
gunk, but luckily Heikki just reminded me.

This commit bumps SNAPBUILD_VERSION, therefore breaking existing
slots (which is fine in a major release).

Author: Andres Freund
Reminded-By: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/c94be044-818f-15e3-1ad3-7a7ae2dfed0a@iki.fi
2021-02-15 16:57:47 -08:00
Tom Lane 0e52903128 Simplify loop logic in nodeIncrementalSort.c.
The inner loop in switchToPresortedPrefixMode() can be implemented
as a conventional integer-counter for() loop, removing a couple of
redundant boolean state variables.  The old logic here was a remnant
of earlier development, but as things now stand there's no reason
for extra complexity.

Also, annotate the test case added by 82e0e2930 to explain why it
manages to hit the corner case fixed in that commit, and add an
EXPLAIN to verify that it's creating an incremental-sort plan.

Back-patch to v13, like the previous patch.

James Coleman and Tom Lane

Discussion: https://postgr.es/m/16846-ae49f51ac379a4cb@postgresql.org
2021-02-15 10:17:58 -05:00
Heikki Linnakangas 54e51dcde0 Make ExecGetInsertedCols() and friends more robust and improve comments.
If ExecGetInsertedCols(), ExecGetUpdatedCols() or ExecGetExtraUpdatedCols()
were called with a ResultRelInfo that's not in the range table and isn't a
partition routing target, the functions would dereference a NULL pointer,
relinfo->ri_RootResultRelInfo. Such ResultRelInfos are created when firing
RI triggers in tables that are not modified directly. None of the current
callers of these functions pass such relations, so this isn't a live bug,
but let's make them more robust.

Also update comment in ResultRelInfo; after commit 6214e2b228,
ri_RangeTableIndex is zero for ResultRelInfos created for partition tuple
routing.

Noted by Coverity. Backpatch down to v11, like commit 6214e2b228.

Reviewed-by: Tom Lane, Amit Langote
2021-02-15 09:28:08 +02:00
Fujii Masao 46d6e5f567 Display the time when the process started waiting for the lock, in pg_locks, take 2
This commit adds new column "waitstart" into pg_locks view. This column
reports the time when the server process started waiting for the lock
if the lock is not held. This information is useful, for example, when
examining the amount of time to wait on a lock by subtracting
"waitstart" in pg_locks from the current time, and identify the lock
that the processes are waiting for very long.

This feature uses the current time obtained for the deadlock timeout
timer as "waitstart" (i.e., the time when this process started waiting
for the lock). Since getting the current time newly can cause overhead,
we reuse the already-obtained time to avoid that overhead.

Note that "waitstart" is updated without holding the lock table's
partition lock, to avoid the overhead by additional lock acquisition.
This can cause "waitstart" in pg_locks to become NULL for a very short
period of time after the wait started even though "granted" is false.
This is OK in practice because we can assume that users are likely to
look at "waitstart" when waiting for the lock for a long time.

The first attempt of this patch (commit 3b733fcd04) caused the buildfarm
member "rorqual" (built with --disable-atomics --disable-spinlocks) to report
the failure of the regression test. It was reverted by commit 890d2182a2.
The cause of this failure was that the atomic variable for "waitstart"
in the dummy process entry created at the end of prepare transaction was
not initialized. This second attempt fixes that issue.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
2021-02-15 15:13:37 +09:00
Peter Geoghegan 7cde6b13a9 Adjust lazy_scan_heap() accounting comments.
Explain which particular LP_DEAD line pointers get accounted for by the
tups_vacuumed variable.
2021-02-14 19:28:37 -08:00
Thomas Munro f900a79ecd Default to wal_sync_method=fdatasync on FreeBSD.
FreeBSD 13 gained O_DSYNC, which would normally cause wal_sync_method to
choose open_datasync as its default value.  That may not be a good
choice for all systems, and performs worse than fdatasync in some
scenarios.  Let's preserve the existing default behavior for now.

Like commit 576477e73c, which did the same for Linux, back-patch to all
supported releases.

Discussion: https://postgr.es/m/CA%2BhUKGLsAMXBQrCxCXoW-JsUYmdOL8ALYvaX%3DCrHqWxm-nWbGA%40mail.gmail.com
2021-02-15 16:04:59 +13:00
Amit Kapila d9b0767bec Fix the warnings introduced in commit ce0fdbfe97.
Author: Amit Kapila
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/1610789.1613170207@sss.pgh.pa.us
2021-02-15 07:28:02 +05:30
Thomas Munro 637668fb1d Hold interrupts while running dsm_detach() callbacks.
While cleaning up after a parallel query or parallel index creation that
created temporary files, we could be interrupted by a statement timeout.
The error handling path would then fail to clean up the files when it
ran dsm_detach() again, because the callback was already popped off the
list.  Prevent this hazard by holding interrupts while the cleanup code
runs.

Thanks to Heikki Linnakangas for this suggestion, and also to Kyotaro
Horiguchi, Masahiko Sawada, Justin Pryzby and Tom Lane for discussion of
this and earlier ideas on how to fix the problem.

Back-patch to all supported releases.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20191212180506.GR2082@telsasoft.com
2021-02-15 14:27:33 +13:00
Michael Paquier b83dcf7928 Add result size as argument of pg_cryptohash_final() for overflow checks
With its current design, a careless use of pg_cryptohash_final() could
would result in an out-of-bound write in memory as the size of the
destination buffer to store the result digest is not known to the
cryptohash internals, without the caller knowing about that.  This
commit adds a new argument to pg_cryptohash_final() to allow such sanity
checks, and implements such defenses.

The internals of SCRAM for HMAC could be tightened a bit more, but as
everything is based on SCRAM_KEY_LEN with uses particular to this code
there is no need to complicate its interface more than necessary, and
this comes back to the refactoring of HMAC in core.  Except that, this
minimizes the uses of the existing DIGEST_LENGTH variables, relying
instead on sizeof() for the result sizes.  In ossp-uuid, this also makes
the code more defensive, as it already relied on dce_uuid_t being at
least the size of a MD5 digest.

This is in philosophy similar to cfc40d3 for base64.c and aef8948 for
hex.c.

Reported-by: Ranier Vilela
Author: Michael Paquier, Ranier Vilela
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAEudQAoqEGmcff3J4sTSV-R_16Monuz-UpJFbf_dnVH=APr02Q@mail.gmail.com
2021-02-15 10:18:34 +09:00
Tom Lane 2dd6733108 Minor fixes to improve regex debugging code.
When REG_DEBUG is defined, ensure that an un-filled "struct cnfa"
is all-zeroes, not just that it has nstates == 0.  This is mainly
so that looking at "struct subre" structs in gdb doesn't distract
one with a lot of garbage fields during regex compilation.

Adjust some places that print debug output to have suitable fflush
calls afterwards.

In passing, correct an erroneous ancient comment: the concatenation
subre-s created by parsebranch() have op == '.' not ','.

Noted while fooling around with some regex performance improvements.
2021-02-14 19:53:42 -05:00
Thomas Munro c7ecd6af01 ReadNewTransactionId() -> ReadNextTransactionId().
The new name conveys the effect better, is more consistent with similar
functions ReadNextMultiXactId(), ReadNextFullTransactionId(), and
matches the name of the variable that it reads.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmVR4SakBXQUdhhPpMf1aYvZCnna5%3DHKa7DAgEmBAg%2B8g%40mail.gmail.com
2021-02-15 13:17:02 +13:00
Bruce Momjian 8facf1ea00 README/C-comment: document GiST's NSN value 2021-02-13 13:50:49 -05:00
Tom Lane ae4867ec74 Avoid divide-by-zero in regex_selectivity() with long fixed prefix.
Given a regex pattern with a very long fixed prefix (approaching 500
characters), the result of pow(FIXED_CHAR_SEL, fixed_prefix_len) can
underflow to zero.  Typically the preceding selectivity calculation
would have underflowed as well, so that we compute 0/0 and get NaN.
In released branches this leads to an assertion failure later on.
That doesn't happen in HEAD, for reasons I've not explored yet,
but it's surely still a bug.

To fix, just skip the division when the pow() result is zero, so
that we'll (most likely) return a zero selectivity estimate.  In
the edge cases where "sel" didn't yet underflow, perhaps this
isn't desirable, but I'm not sure that the case is worth spending
a lot of effort on.  The results of regex_selectivity_sub() are
barely worth the electrons they're written on anyway :-(

Per report from Alexander Lakhin.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/6de0a0c3-ada9-cd0c-3e4e-2fa9964b41e3@gmail.com
2021-02-12 16:26:47 -05:00
Amit Kapila ce0fdbfe97 Allow multiple xacts during table sync in logical replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.

There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.

This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFERESH and some of the table syncs are still not
finished.

The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and
ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.

This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we can't do that because
of the requirement of maintaining a single transaction in tablesync
workers.

Bump catalog version due to change of state in the catalog
(pg_subscription_rel).

Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
2021-02-12 07:41:51 +05:30
Peter Geoghegan 3063eb1759 Remove obsolete IndexBulkDeleteResult stats field.
The pages_removed field is no longer used for anything.  It hasn't been
possible for an index to physically shrink since old-style VACUUM FULL
was removed by commit 0a469c87.
2021-02-11 16:49:41 -08:00
Tom Lane 69036aafb9 Simplify jsonfuncs.c code by using strtoint() not strtol().
Explicitly testing for INT_MIN and INT_MAX isn't particularly good
style; it's tedious and may draw useless compiler warnings on
machines where int and long are the same width.  We invented
strtoint() precisely for this usage, so use that instead.

While here, remove gratuitous variations in the way the tests for
did-strtoint-succeed were spelled.  Also, avoid attempting to
negate INT_MIN; that would probably work given that the result
is implicitly cast to uint32, but I think it's nominally undefined
behavior.

Per gripe from Ranier Vilela, though this isn't his proposed patch.

Discussion: https://postgr.es/m/CAEudQAqge3QfzoBRhe59QrB_5g+NmQUj2QpzqZ9Nc7QepXGAEw@mail.gmail.com
2021-02-11 12:49:22 -05:00
Tom Lane d4c746516b Remove no-longer-used RTE argument of markVarForSelectPriv().
In the wake of c028faf2a, this is no longer needed.  I left it
out of that patch since the API change would be undesirable in
a released branch; but there's no reason not to do it in HEAD.
2021-02-11 11:23:25 -05:00
Peter Eisentraut 4ad5611055 Fix lack of message pluralization 2021-02-10 11:35:45 +01:00
Michael Paquier 092b785fad Simplify code related to compilation of SSL and OpenSSL
This commit makes more generic some comments and code related to the
compilation with OpenSSL and SSL in general to ease the addition of more
SSL implementations in the future.  In libpq, some OpenSSL-only code is
moved under USE_OPENSSL and not USE_SSL.

While on it, make a comment more consistent in libpq-fe.h.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/5382CB4A-9CF3-4145-BA46-C802615935E0@yesql.se
2021-02-10 15:28:19 +09:00
Michael Paquier bd12080980 Preserve pg_attribute.attstattarget across REINDEX CONCURRENTLY
For an index, attstattarget can be updated using ALTER INDEX SET
STATISTICS.  This data was lost on the new index after REINDEX
CONCURRENTLY.

The update of this field is done when the old and new indexes are
swapped to make the fix back-patchable.  Another approach we could look
after in the long-term is to change index_create() to pass the wanted
values of attstattarget when creating the new relation, but, as this
would cause an ABI breakage this can be done only on HEAD.

Reported-by: Ronan Dunklau
Author: Michael Paquier
Reviewed-by: Ronan Dunklau, Tomas Vondra
Discussion: https://postgr.es/m/16628084.uLZWGnKmhe@laptop-ronand
Backpatch-through: 12
2021-02-10 13:06:48 +09:00
Amit Kapila cd142e032e Make pg_replication_origin_drop safe against concurrent drops.
Currently, we get the origin id from the name and then drop the origin by
taking ExclusiveLock on ReplicationOriginRelationId. So, two concurrent
sessions can get the id from the name at the same time and then when they
try to drop the origin, one of the sessions will get the either
"tuple concurrently deleted" or "cache lookup failed for replication
origin ..".

To prevent this race condition we do the entire operation under lock. This
obviates the need for replorigin_drop() API and we have removed it so if
any extension authors are using it they need to instead use
replorigin_drop_by_name. See it's usage in pg_replication_origin_drop().

Author: Peter Smith
Reviewed-by: Amit Kapila, Euler Taveira, Petr Jelinek, and Alvaro
Herrera
Discussion: https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com
2021-02-10 07:17:09 +05:30
Peter Geoghegan 31c7fb41e2 Fix obsolete FSM remarks in nbtree README.
The free space map has used a dedicated relation fork rather than shared
memory segments for over a decade.
2021-02-09 11:36:51 -08:00
Fujii Masao 890d2182a2 Revert "Display the time when the process started waiting for the lock, in pg_locks."
This reverts commit 3b733fcd04.

Per buildfarm members prion and rorqual.
2021-02-09 18:30:40 +09:00
Fujii Masao 3b733fcd04 Display the time when the process started waiting for the lock, in pg_locks.
This commit adds new column "waitstart" into pg_locks view. This column
reports the time when the server process started waiting for the lock
if the lock is not held. This information is useful, for example, when
examining the amount of time to wait on a lock by subtracting
"waitstart" in pg_locks from the current time, and identify the lock
that the processes are waiting for very long.

This feature uses the current time obtained for the deadlock timeout
timer as "waitstart" (i.e., the time when this process started waiting
for the lock). Since getting the current time newly can cause overhead,
we reuse the already-obtained time to avoid that overhead.

Note that "waitstart" is updated without holding the lock table's
partition lock, to avoid the overhead by additional lock acquisition.
This can cause "waitstart" in pg_locks to become NULL for a very short
period of time after the wait started even though "granted" is false.
This is OK in practice because we can assume that users are likely to
look at "waitstart" when waiting for the lock for a long time.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
2021-02-09 18:10:19 +09:00
Michael Paquier 7cb3048f38 Add option PROCESS_TOAST to VACUUM
This option controls if toast tables associated with a relation are
vacuumed or not when running a manual VACUUM.  It was already possible
to trigger a manual VACUUM on a toast relation without processing its
main relation, but a manual vacuum on a main relation always forced a
vacuum on its toast table.  This is useful in scenarios where the level
of bloat or transaction age of the main and toast relations differs a
lot.

This option is an extension of the existing VACOPT_SKIPTOAST that was
used by autovacuum to control if toast relations should be skipped or
not.  This internal flag is renamed to VACOPT_PROCESS_TOAST for
consistency with the new option.

A new option switch, called --no-process-toast, is added to vacuumdb.

Author: Nathan Bossart
Reviewed-by: Kirk Jamison, Michael Paquier, Justin Pryzby
Discussion: https://postgr.es/m/BA8951E9-1524-48C5-94AF-73B1F0D7857F@amazon.com
2021-02-09 14:13:57 +09:00
Tom Lane c028faf2a6 Fix mishandling of column-level SELECT privileges for join aliases.
scanNSItemForColumn, expandNSItemAttrs, and ExpandSingleTable would
pass the wrong RTE to markVarForSelectPriv when dealing with a join
ParseNamespaceItem: they'd pass the join RTE, when what we need to
mark is the base table that the join column came from.  The end
result was to not fill the base table's selectedCols bitmap correctly,
resulting in an understatement of the set of columns that are read
by the query.  The executor would still insist on there being at
least one selectable column; but with a correctly crafted query,
a user having SELECT privilege on just one column of a table would
nonetheless be allowed to read all its columns.

To fix, make markRTEForSelectPriv fetch the correct RTE for itself,
ignoring the possibly-mismatched RTE passed by the caller.  Later,
we'll get rid of some now-unused RTE arguments, but that risks
API breaks so we won't do it in released branches.

This problem was introduced by commit 9ce77d75c, so back-patch
to v13 where that came in.  Thanks to Sven Klemm for reporting
the problem.

Security: CVE-2021-20229
2021-02-08 10:14:09 -05:00
Heikki Linnakangas 6214e2b228 Fix permission checks on constraint violation errors on partitions.
If a cross-partition UPDATE violates a constraint on the target partition,
and the columns in the new partition are in different physical order than
in the parent, the error message can reveal columns that the user does not
have SELECT permission on. A similar bug was fixed earlier in commit
804b6b6db4.

The cause of the bug is that the callers of the
ExecBuildSlotValueDescription() function got confused when constructing
the list of modified columns. If the tuple was routed from a parent, we
converted the tuple to the parent's format, but the list of modified
columns was grabbed directly from the child's RTE entry.

ExecUpdateLockMode() had a similar issue. That lead to confusion on which
columns are key columns, leading to wrong tuple lock being taken on tables
referenced by foreign keys, when a row is updated with INSERT ON CONFLICT
UPDATE. A new isolation test is added for that corner case.

With this patch, the ri_RangeTableIndex field is no longer set for
partitions that don't have an entry in the range table. Previously, it was
set to the RTE entry of the parent relation, but that was confusing.

NOTE: This modifies the ResultRelInfo struct, replacing the
ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to
backpatch, because it breaks any extensions accessing the field. The
change that ri_RangeTableIndex is not set for partitions could potentially
break extensions, too. The ResultRelInfos are visible to FDWs at least,
and this patch required small changes to postgres_fdw. Nevertheless, this
seem like the least bad option. I don't think these fields widely used in
extensions; I don't think there are FDWs out there that uses the FDW
"direct update" API, other than postgres_fdw. If there is, you will get a
compilation error, so hopefully it is caught quickly.

Backpatch to 11, where support for both cross-partition UPDATEs, and unique
indexes on partitioned tables, were added.

Reviewed-by: Amit Langote
Security: CVE-2021-3393
2021-02-08 11:01:51 +02:00
Peter Geoghegan 617fffee8a Rename removable xid function for consistency.
GlobalVisIsRemovableFullXid() is now GlobalVisCheckRemovableFullXid().
This is consistent with the general convention for FullTransactionId
equivalents of functions that deal with TransactionId values.  It now
matches the nearby GlobalVisCheckRemovableXid() function, which performs
the same check for callers that use TransactionId values.

Oversight in commit dc7420c2c9.

Discussion: https://postgr.es/m/CAH2-Wzmes12jFNDcVgpU89Vp=r6uLFrE-MT0fjSWGsE70UiNaA@mail.gmail.com
2021-02-07 10:11:14 -08:00
Tom Lane d1d2979852 Revert "Propagate CTE property flags when copying a CTE list into a rule."
This reverts commit ed29089633 and
equivalent back-branch commits.  The issue is subtler than I thought,
and it's far from new, so just before a release deadline is no time
to be fooling with it.  We'll consider what to do at a bit more
leisure.

Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
2021-02-07 12:54:08 -05:00
Tom Lane ed29089633 Propagate CTE property flags when copying a CTE list into a rule.
rewriteRuleAction() neglected this step, although it was careful to
propagate other similar flags such as hasSubLinks or hasRowSecurity.
Omitting to transfer hasRecursive is just cosmetic at the moment,
but omitting hasModifyingCTE is a live bug, since the executor
certainly looks at that.

The proposed test case only fails back to v10, but since the executor
examines hasModifyingCTE in 9.x as well, I suspect that a test case
could be devised that fails in older branches.  Given the nearness
of the release deadline, though, I'm not going to spend time looking
for a better test.

Report and patch by Greg Nancarrow, cosmetic changes by me

Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
2021-02-06 19:28:39 -05:00
Tom Lane dd705a039f Disallow converting an inheritance child table to a view.
Generally, members of inheritance trees must be plain tables (or,
in more recent versions, foreign tables).  ALTER TABLE INHERIT
rejects creating an inheritance relationship that has a view at
either end.  When DefineQueryRewrite attempts to convert a relation
to a view, it already had checks prohibiting doing so for partitioning
parents or children as well as traditional-inheritance parents ...
but it neglected to check that a traditional-inheritance child wasn't
being converted.  Since the planner assumes that any inheritance
child is a table, this led to making plans that tried to do a physical
scan on a view, causing failures (or even crashes, in recent versions).

One could imagine trying to support such a case by expanding the view
normally, but since the rewriter runs before the planner does
inheritance expansion, it would take some very fundamental refactoring
to make that possible.  There are probably a lot of other parts of the
system that don't cope well with such a situation, too.  For now,
just forbid it.

Per bug #16856 from Yang Lin.  Back-patch to all supported branches.
(In versions before v10, this includes back-patching the portion of
commit 501ed02cf that added has_superclass().  Perhaps the lack of
that infrastructure partially explains the missing check.)

Discussion: https://postgr.es/m/16856-0363e05c6e1612fd@postgresql.org
2021-02-06 15:17:01 -05:00
Michael Paquier f7400823c3 Clarify some comments around SharedRecoveryState in xlog.c
SharedRecoveryState has been switched from a boolean to an enum as of
commit 4e87c48, but some comments still referred to it as a boolean.

Author: Amul Sul
Reviewed-by: Dilip Kumar, Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAAJ_b97Hf+1SXnm8jySpO+Fhm+-VKFAAce1T_cupUYtnE3Nxig
2021-02-06 10:27:55 +09:00
Heikki Linnakangas c444472af5 Fix backslash-escaping multibyte chars in COPY FROM.
If a multi-byte character is escaped with a backslash in TEXT mode input,
and the encoding is one of the client-only encodings where the bytes after
the first one can have an ASCII byte "embedded" in the char, we didn't
skip the character correctly. After a backslash, we only skipped the first
byte of the next character, so if it was a multi-byte character, we would
try to process its second byte as if it was a separate character. If it
was one of the characters with special meaning, like '\n', '\r', or
another '\\', that would cause trouble.

One such exmple is the byte sequence '\x5ca45c2e666f6f' in Big5 encoding.
That's supposed to be [backslash][two-byte character][.][f][o][o], but
because the second byte of the two-byte character is 0x5c, we incorrectly
treat it as another backslash. And because the next character is a dot, we
parse it as end-of-copy marker, and throw an "end-of-copy marker corrupt"
error.

Backpatch to all supported versions.

Reviewed-by: John Naylor, Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/a897f84f-8dca-8798-3139-07da5bb38728%40iki.fi
2021-02-05 11:14:56 +02:00
Tom Lane 0ff865fbe5 Fix bug in HashAgg's selective-column-spilling logic.
Commit 230230223 taught nodeAgg.c that, when spilling tuples from
memory in an oversized hash aggregation, it only needed to spill
input columns referenced in the node's tlist and quals.  Unfortunately,
that's wrong: we also have to save the grouping columns.  The error
is masked in common cases because the grouping columns also appear
in the tlist, but that's not necessarily true.  The main category
of plans where it's not true seem to come from semijoins ("WHERE
outercol IN (SELECT innercol FROM innertable)") where the innercol
needs an implicit promotion to make it comparable to the outercol.
The grouping column will be "innercol::promotedtype", but that
expression appears nowhere in the Agg node's own tlist and quals;
only the bare "innercol" is found in the tlist.

I spent quite a bit of time looking for a suitable regression test
case for this, without much success.  If the number of distinct
values of the innercol is large enough to make spilling happen,
the planner tends to prefer a non-HashAgg plan, at least for
problem sizes that are reasonable to use in the regression tests.
So, no new regression test.  However, this patch does demonstrably
fix the originally-reported test case.

Per report from s.p.e (at) gmx-topmail.de.  Backpatch to v13
where the troublesome code came in.

Discussion: https://postgr.es/m/trinity-1c565d44-159f-488b-a518-caf13883134f-1611835701633@3c-app-gmx-bap78
2021-02-04 23:01:37 -05:00
Tom Lane 82e0e29308 Fix YA incremental sort bug.
switchToPresortedPrefixMode() did the wrong thing if it detected
a batch boundary just at the last tuple of a fullsort group.

The initially-reported symptom was a "retrieved too many tuples in a
bounded sort" error, but the test case added here just silently gives
the wrong answer without this patch.

I (tgl) am not really happy about committing this patch without review
from the incremental-sort authors, but they seem AWOL and we are hard
against a release deadline.  This does demonstrably make some cases
better, anyway.

Per bug #16846 from Yoran Heling.  Back-patch to v13 where incremental
sort was introduced.

Neil Chen

Discussion: https://postgr.es/m/16846-ae49f51ac379a4cb@postgresql.org
2021-02-04 19:12:14 -05:00
Peter Geoghegan c34787f910 Harden nbtree page deletion.
Add some additional defensive checks in the second phase of index
deletion to detect and report index corruption during VACUUM, and to
avoid having VACUUM become stuck in more cases.  The code is still not
robust in the presence of a circular chain of sibling links, though it's
not clear whether that really matters.  This is follow-up work to commit
3a01f68e.

The new defensive checks rely on the assumption that there can be no
more than one VACUUM operation running for an index at any given time.
Remove an old comment suggesting that multiple concurrent VACUUMs need
to be considered here.  This concern now seems highly unlikely to have
any real validity, since we clearly rely on the same assumption in
several other places.  For example, there are much more recent comments
that appear in the same function (added by commit efada2b8e9) that make
the same assumption.

Also add a CHECK_FOR_INTERRUPTS() to the relevant code path.  Contrary
to comments added by commit 3a01f68e, it is actually possible to handle
interrupts here, at least in the common case where processing takes
place at the leaf level.  We only hold a pin on leafbuf/target page when
stepping right at the leaf level.

No backpatch due to the lack of complaints following hardening added to
the same area by commit 3a01f68e.
2021-02-04 15:42:36 -08:00
Heikki Linnakangas 2f86ab305e Fix small error in COPY FROM progress reporting.
The # of bytes processed was accumulated slightly incorrectly. After
loading more data to the input buffer, we added the number of bytes in
the buffer to the sum. But in case of multi-byte characters or escapes,
there can be a few unprocessed bytes left over from previous load in the
buffer. Those bytes got counted twice.
2021-02-04 17:40:33 +02:00
Peter Eisentraut 3c78e0569c Refactor Windows error message for easier translation
In the error messages referring to the user right "Lock pages in
memory", this is a term from the Windows OS, so it should be
translated in accordance with the OS localization.  Refactor the error
messages so this is easier and clearer.  Also fix the capitalization
to match the existing capitalization in the OS.
2021-02-04 13:31:13 +01:00
Michael Paquier 5128483d06 Ensure unlinking of old index file with REINDEX (TABLESPACE)
The original versions of the patch included this part, but a mismerge
from my side has made this piece go missing.  Oversight in c5b28604.
2021-02-04 17:16:47 +09:00
Michael Paquier fc749bc704 Clarify comment in tablesync.c
Author: Peter Smith
Reviewed-by: Amit Kapila, Michael Paquier, Euler Taveira
Discussion: https://postgr.es/m/CAHut+Pt9_T6pWar0FLtPsygNmme8HPWPdGUyZ_8mE1Yvjdf0ZA@mail.gmail.com
2021-02-04 16:02:31 +09:00
Michael Paquier c5b286047c Add TABLESPACE option to REINDEX
This patch adds the possibility to move indexes to a new tablespace
while rebuilding them.  Both the concurrent and the non-concurrent cases
are supported, and the following set of restrictions apply:
- When using TABLESPACE with a REINDEX command that targets a
partitioned table or index, all the indexes of the leaf partitions are
moved to the new tablespace.  The tablespace references of the non-leaf,
partitioned tables in pg_class.reltablespace are not changed. This
requires an extra ALTER TABLE SET TABLESPACE.
- Any index on a toast table rebuilt as part of a parent table is kept
in its original tablespace.
- The operation is forbidden on system catalogs, including trying to
directly move a toast relation with REINDEX.  This results in an error
if doing REINDEX on a single object.  REINDEX SCHEMA, DATABASE and
SYSTEM skip system relations when TABLESPACE is used.

Author: Alexey Kondratov, Michael Paquier, Justin Pryzby
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/8a8f5f73-00d3-55f8-7583-1375ca8f6a91@postgrespro.ru
2021-02-04 14:34:20 +09:00
Tom Lane 9624321ec5 Avoid crash when rolling back within a prepared statement.
If a portal is used to run a prepared CALL or DO statement that
contains a ROLLBACK, PortalRunMulti fails because the portal's
statement list gets cleared by the rollback.  (Since the grammar
doesn't allow CALL/DO in PREPARE, the only easy way to get to this is
via extended query protocol, which treats all inputs as prepared
statements.)  It's difficult to avoid resetting the portal early
because of resource-management issues, so work around this by teaching
PortalRunMulti to be wary of portal->stmts having suddenly become NIL.

The crash has only been seen to occur in v13 and HEAD (as a
consequence of commit 1cff1b95a having added an extra touch of
portal->stmts).  But even before that, the code involved touching a
List that the portal no longer has any claim on.  In the test case at
hand, the List will still exist because of another refcount on the
cached plan; but I'm far from convinced that it's impossible for the
cached plan to have been dropped by the time control gets back to
PortalRunMulti.  Hence, backpatch to v11 where nested transactions
were added.

Thomas Munro and Tom Lane, per bug #16811 from James Inform

Discussion: https://postgr.es/m/16811-c1b599b2c6c2d622@postgresql.org
2021-02-03 19:38:43 -05:00
Tom Lane ba0faf81c6 Remove special BKI_LOOKUP magic for namespace and role OIDs.
Now that commit 62f34097c attached BKI_LOOKUP annotation to all the
namespace and role OID columns in the catalogs, there's no real reason
to have the magic PGNSP and PGUID symbols.  Get rid of them in favor
of implementing those lookups according to genbki.pl's normal pattern.

This means that in the catalog headers, BKI_DEFAULT(PGNSP) becomes
BKI_DEFAULT(pg_catalog), which seems a lot more transparent.
BKI_DEFAULT(PGUID) becomes BKI_DEFAULT(POSTGRES), which is perhaps
less so; but you can look into pg_authid.dat to discover that
POSTGRES is the nonce name for the bootstrap superuser.

This change also means that if we ever need cross-references in the
initial catalog data to any of the other built-in roles besides
POSTGRES, or to some other built-in schema besides pg_catalog,
we can just do it.

No catversion bump here, as there's no actual change in the contents
of postgres.bki.

Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-03 12:01:48 -05:00
Tom Lane 62f34097c8 Build in some knowledge about foreign-key relationships in the catalogs.
This follows in the spirit of commit dfb75e478, which created primary
key and uniqueness constraints to improve the visibility of constraints
imposed on the system catalogs.  While our catalogs contain many
foreign-key-like relationships, they don't quite follow SQL semantics,
in that the convention for an omitted reference is to write zero not
NULL.  Plus, we have some cases in which there are arrays each of whose
elements is supposed to be an FK reference; SQL has no way to model that.
So we can't create actual foreign key constraints to describe the
situation.  Nonetheless, we can collect and use knowledge about these
relationships.

This patch therefore adds annotations to the catalog header files to
declare foreign-key relationships.  (The BKI_LOOKUP annotations cover
simple cases, but we weren't previously distinguishing which such
columns are allowed to contain zeroes; we also need new markings for
multi-column FK references.)  Then, Catalog.pm and genbki.pl are
taught to collect this information into a table in a new generated
header "system_fk_info.h".  The only user of that at the moment is
a new SQL function pg_get_catalog_foreign_keys(), which exposes the
table to SQL.  The oidjoins regression test is rewritten to use
pg_get_catalog_foreign_keys() to find out which columns to check.
Aside from removing the need for manual maintenance of that test
script, this allows it to cover numerous relationships that were not
checked by the old implementation based on findoidjoins.  (As of this
commit, 217 relationships are checked by the test, versus 181 before.)

Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-02 17:11:55 -05:00
Peter Eisentraut 1d71f3c83c Improve confusing variable names
The prototype calls the second argument of
pgstat_progress_update_multi_param() "index", and some callers name
their local variable that way.  But when the surrounding code deals
with index relations, this is confusing, and in at least one case
shadowed another variable that is referring to an index relation.
Adjust those call sites to have clearer local variable naming, similar
to existing callers in indexcmds.c.
2021-02-02 09:20:22 +01:00
Michael Paquier 4ad31bb2ef Remove unused column atttypmod from initial tablesync query
The initial tablesync done by logical replication used a query to fetch
the information of a relation's columns that included atttypmod, but it
was left unused.  This was added by 7c4f524.

Author: Euler Taveira
Reviewed-by: Önder Kalacı, Amit Langote, Japin Li
Discussion: https://postgr.es/m/CAHE3wggb715X+mK_DitLXF25B=jE6xyNCH4YOwM860JR7HarGQ@mail.gmail.com
2021-02-02 13:59:23 +09:00
Tom Lane f003a7522b Remove [Merge]AppendPath.partitioned_rels.
It turns out that the calculation of [Merge]AppendPath.partitioned_rels
in allpaths.c is faulty and sometimes omits relevant non-leaf partitions,
allowing an assertion added by commit a929e17e5a to trigger.  Rather
than fix that, it seems better to get rid of those fields altogether.
We don't really need the info until create_plan time, and calculating
it once for the selected plan should be cheaper than calculating it
for each append path we consider.

The preceding two commits did away with all use of the partitioned_rels
values; this commit just mechanically removes the fields and the code
that calculated them.

Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu
Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01 14:43:54 -05:00
Tom Lane 5076f88bc9 Remove incidental dependencies on partitioned_rels lists.
It turns out that the calculation of [Merge]AppendPath.partitioned_rels
in allpaths.c is faulty and sometimes omits relevant non-leaf partitions,
allowing an assertion added by commit a929e17e5a to trigger.  Rather
than fix that, it seems better to get rid of those fields altogether.
We don't really need the info until create_plan time, and calculating
it once for the selected plan should be cheaper than calculating it
for each append path we consider.

This patch undoes a couple of very minor uses of the partitioned_rels
values.

createplan.c was testing for nil-ness to optimize away the preparatory
work for make_partition_pruneinfo().  That is worth doing if the check
is nigh free, but it's not worth going to any great lengths to avoid.

create_append_path() was testing for nil-ness as part of deciding how
to set up ParamPathInfo for an AppendPath.  I replaced that with a
check for the appendrel's parent rel being partitioned.  That's not
quite the same thing but should cover most cases.  If we note any
interesting loss of optimizations, we can dumb this down to just
always use the more expensive method when the parent is a baserel.

Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu
Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01 14:34:59 -05:00
Tom Lane fb2d645dd5 Revise make_partition_pruneinfo to not use its partitioned_rels input.
It turns out that the calculation of [Merge]AppendPath.partitioned_rels
in allpaths.c is faulty and sometimes omits relevant non-leaf partitions,
allowing an assertion added by commit a929e17e5a to trigger.  Rather
than fix that, it seems better to get rid of those fields altogether.
We don't really need the info until create_plan time, and calculating
it once for the selected plan should be cheaper than calculating it
for each append path we consider.

As a first step, teach make_partition_pruneinfo to collect the relevant
partitioned tables for itself.  It's not hard to do so by traversing
from child tables up to parents using the AppendRelInfo links.

While here, make some minor stylistic improvements; mainly, don't use
the "Relids" alias for bitmapsets that are not identities of any
relation considered by the planner.  Try to document the logic better,
too.

No backpatch, as there does not seem to be a live problem before
a929e17e5a.  Also no new regression test; the code where the bug
was will be gone at the end of this patch series, so it seems a
bit pointless to memorialize the issue.

Tom Lane and David Rowley, per reports from Andreas Seltenreich
and Jaime Casanova.

Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu
Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01 14:05:51 -05:00
Peter Eisentraut 3696a600e2 SEARCH and CYCLE clauses
This adds the SQL standard feature that adds the SEARCH and CYCLE
clauses to recursive queries to be able to do produce breadth- or
depth-first search orders and detect cycles.  These clauses can be
rewritten into queries using existing syntax, and that is what this
patch does in the rewriter.

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
2021-02-01 14:32:51 +01:00
Alexander Korotkov bb513b364b Get rid of unnecessary memory allocation in jsonb_subscript_assign()
Current code allocates memory for JsonbValue, but it could be placed locally.
2021-02-01 14:06:02 +03:00
Michael Paquier fe61df7f82 Introduce --with-ssl={openssl} as a configure option
This is a replacement for the existing --with-openssl, extending the
logic to make easier the addition of new SSL libraries.  The grammar is
chosen to be similar to --with-uuid, where multiple values can be
chosen, with "openssl" as the only supported value for now.

The original switch, --with-openssl, is kept for compatibility.

Author: Daniel Gustafsson, Michael Paquier
Reviewed-by: Jacob Champion
Discussion: https://postgr.es/m/FAB21FC8-0F62-434F-AA78-6BD9336D630A@yesql.se
2021-02-01 19:19:44 +09:00
Tom Lane 7c5d57caed Fix portability issue in new jsonbsubs code.
On machines where sizeof(Datum) > sizeof(Oid) (that is, any 64-bit
platform), the previous coding would compute a misaligned
workspace->index pointer if nupper is odd.  Architectures where
misaligned access is a hard no-no would then fail.  This appears
to explain why thorntail is unhappy but other buildfarm members
are not.
2021-02-01 02:03:59 -05:00
Alexander Korotkov aa6e46daf5 Throw error when assigning jsonb scalar instead of a composite object
During the jsonb subscripting assignment, the provided path might assume an
object or an array where the source jsonb has a scalar value.  Initial
subscripting assignment logic will skip such an update operation with no
message shown.  This commit makes it throw an error to indicate this type
of situation.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:51:06 +03:00
Alexander Korotkov 81fcc72e66 Filling array gaps during jsonb subscripting
This commit introduces two new flags for jsonb assignment:

* JB_PATH_FILL_GAPS: Appending array elements on the specified position, gaps
  are filled with nulls (similar to the JavaScript behavior).  This mode also
  instructs to   create the whole path in a jsonb object if some part of the
  path (more than just the last element) is not present.

* JB_PATH_CONSISTENT_POSITION: Assigning keeps array positions consistent by
  preventing prepending of elements.

Both flags are used only in jsonb subscripting assignment.

Initially proposed by Nikita Glukhov based on polymorphic subscripting
patch, but transformed into an independent change.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:51:01 +03:00
Alexander Korotkov 676887a3b0 Implementation of subscripting for jsonb
Subscripting for jsonb does not support slices, does not have a limit for the
number of subscripts, and an assignment expects a replace value to have jsonb
type.  There is also one functional difference between assignment via
subscripting and assignment via jsonb_set().  When an original jsonb container
is NULL, the subscripting replaces it with an empty jsonb and proceeds with
an assignment.

For the sake of code reuse, we rearrange some parts of jsonb functionality
to allow the usage of the same functions for jsonb_set and assign subscripting
operation.

The original idea belongs to Oleg Bartunov.

Catversion is bumped.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:50:40 +03:00
Peter Geoghegan dc43492e46 Remove unused _bt_delitems_delete() argument.
The latestRemovedXid values used by nbtree deletion operations are
determined by _bt_delitems_delete()'s caller, so there is no reason to
pass a separate heapRel argument.

Oversight in commit d168b66682.
2021-01-31 10:10:55 -08:00
Alexander Korotkov 0c4f355c6a Fix parsing of complex morphs to tsquery
When to_tsquery() or websearch_to_tsquery() meet a complex morph containing
multiple words residing adjacent position, these words are connected
with OP_AND operator.  That leads to surprising results.  For instace,
both websearch_to_tsquery('"pg_class pg"') and to_tsquery('pg_class <-> pg')
produce '( pg & class ) <-> pg' tsquery.  This tsquery requires
'pg' and 'class' words to reside on the same position and doesn't match
to to_tsvector('pg_class pg').  It appears to be ridiculous behavior, which
needs to be fixed.

This commit makes to_tsquery() or websearch_to_tsquery() connect words
residing adjacent position with OP_PHRASE.  Therefore, now those words are
normally chained with other OP_PHRASE operator.  The examples of above now
produces 'pg <-> class <-> pg' tsquery, which matches to
to_tsvector('pg_class pg').

Another effect of this commit is that complex morph word positions now need to
match the tsvector even if there is no surrounding OP_PHRASE.  This behavior
change generally looks like an improvement but making this commit not
backpatchable.

Reported-by: Barry Pederson
Bug: #16592
Discussion: https://postgr.es/m/16592-70b110ff9731c07d@postgresql.org
Discussion: https://postgr.es/m/CAPpHfdv0EzVhf6CWfB1_TTZqXV_2Sn-jSY3zSd7ePH%3D-%2B1V2DQ%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Tom Lane, Neil Chen
2021-01-31 20:14:29 +03:00
Peter Eisentraut dfb75e478c Add primary keys and unique constraints to system catalogs
For those system catalogs that have a unique indexes, make a primary
key and unique constraint, using ALTER TABLE ... PRIMARY KEY/UNIQUE
USING INDEX.

This can be helpful for GUI tools that look for a primary key, and it
might in the future allow declaring foreign keys, for making schema
diagrams.

The constraint creation statements are automatically created by
genbki.pl from DECLARE_UNIQUE_INDEX directives.  To specify which one
of the available unique indexes is the primary key, use the new
directive DECLARE_UNIQUE_INDEX_PKEY instead.  By convention, we
usually make a catalog's OID column its primary key, if it has one.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/dc5f44d9-5ec1-a596-0251-dadadcdede98@2ndquadrant.com
2021-01-30 19:44:29 +01:00
Peter Eisentraut 6aaaa76bb4 Allow GRANTED BY clause in normal GRANT and REVOKE statements
The SQL standard allows a GRANTED BY clause on GRANT and
REVOKE (privilege) statements that can specify CURRENT_USER or
CURRENT_ROLE.  In PostgreSQL, both of these are the default behavior.
Since we already have all the parsing support for this for the
GRANT (role) statement, we might as well add basic support for this
for the privilege variant as well.  This allows us to check off SQL
feature T332.  In the future, perhaps more interesting things could be
done with this, too.

Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/f2feac44-b4c5-f38f-3699-2851d6a76dc9@2ndquadrant.com
2021-01-30 09:45:11 +01:00
Noah Misch 7da83415e5 Revive "snapshot too old" with wal_level=minimal and SET TABLESPACE.
Given a permanent relation rewritten in the current transaction, the
old_snapshot_threshold mechanism assumed the relation had never been
subject to early pruning.  Hence, a query could fail to report "snapshot
too old" when the rewrite followed an early truncation.  ALTER TABLE SET
TABLESPACE is probably the only rewrite mechanism capable of exposing
this bug.  REINDEX sets indcheckxmin, avoiding the problem.  CLUSTER has
zeroed page LSNs since before old_snapshot_threshold existed, so
old_snapshot_threshold has never cooperated with it.  ALTER TABLE
... SET DATA TYPE makes the table look empty to every past snapshot,
which is strictly worse.  Back-patch to v13, where commit
c6b92041d3 broke this.

Kyotaro Horiguchi and Noah Misch

Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota.ntt@gmail.com
2021-01-30 00:12:18 -08:00
Noah Misch 360bd2321b Fix error with CREATE PUBLICATION, wal_level=minimal, and new tables.
CREATE PUBLICATION has failed spuriously when applied to a permanent
relation created or rewritten in the current transaction.  Make the same
change to another site having the same semantic intent; the second
instance has no user-visible consequences.  Back-patch to v13, where
commit c6b92041d3 broke this.

Kyotaro Horiguchi

Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota.ntt@gmail.com
2021-01-30 00:11:38 -08:00
Noah Misch 8a54e12a38 Fix CREATE INDEX CONCURRENTLY for simultaneous prepared transactions.
In a cluster having used CREATE INDEX CONCURRENTLY while having enabled
prepared transactions, queries that use the resulting index can silently
fail to find rows.  Fix this for future CREATE INDEX CONCURRENTLY by
making it wait for prepared transactions like it waits for ordinary
transactions.  This expands the VirtualTransactionId structure domain to
admit prepared transactions.  It may be necessary to reindex to recover
from past occurrences.  Back-patch to 9.5 (all supported versions).

Andrey Borodin, reviewed (in earlier versions) by Tom Lane and Michael
Paquier.

Discussion: https://postgr.es/m/2E712143-97F7-4890-B470-4A35142ABC82@yandex-team.ru
2021-01-30 00:00:27 -08:00
Michael Paquier 24843297a9 Adjust comments of CheckRelationTableSpaceMove() and SetRelationTableSpace()
4c9c359, that introduced those two functions, has been overoptimistic on
the point that only ShareUpdateExclusiveLock would be required when
moving a relation to a new tablespace.  AccessExclusiveLock is a
requirement, but ShareUpdateExclusiveLock may be used under specific
conditions like REINDEX CONCURRENTLY where waits on past transactions
make the operation safe even with a lower-level lock.  The current code
does only the former, so update the existing comments to reflect that.

Once a REINDEX (TABLESPACE) is introduced, those comments would require
an extra refresh to mention their new use case.

While on it, fix an incorrect variable name.

Per discussion with Álvaro Herrera.

Discussion: https://postgr.es/m/20210127140741.GA14174@alvherre.pgsql
2021-01-29 13:59:18 +09:00
Thomas Munro 514b411a2b Retire pg_standby.
pg_standby was useful more than a decade ago, but now it is obsolete.
It has been proposed that we retire it many times.  Now seems like a
good time to finally do it, because "waiting restore commands"
are incompatible with a proposed recovery prefetching feature.

Discussion: https://postgr.es/m/20201029024412.GP5380%40telsasoft.com
Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
2021-01-29 14:09:41 +13:00
Tom Lane 1046dbedde Silence another gcc 11 warning.
Per buildfarm and local experimentation, bleeding-edge gcc isn't
convinced that the MemSet in reorder_function_arguments() is safe.
Shut it up by adding an explicit check that pronargs isn't negative,
and by changing MemSet to memset.  (It appears that either change is
enough to quiet the warning at -O2, but let's do both to be sure.)
2021-01-28 17:19:16 -05:00
Alvaro Herrera 6f5c8a8ec2
Remove bogus restriction from BEFORE UPDATE triggers
In trying to protect the user from inconsistent behavior, commit
487e9861d0 "Enable BEFORE row-level triggers for partitioned tables"
tried to prevent BEFORE UPDATE FOR EACH ROW triggers from moving the row
from one partition to another.  However, it turns out that the
restriction is wrong in two ways: first, it fails spuriously, preventing
valid situations from working, as in bug #16794; and second, they don't
protect from any misbehavior, because tuple routing would cope anyway.

Fix by removing that restriction.

We keep the same restriction on BEFORE INSERT FOR EACH ROW triggers,
though.  It is valid and useful there.  In the future we could remove it
by having tuple reroute work for inserts as it does for updates.

Backpatch to 13.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Phillip Menke <pg@pmenke.de>
Discussion: https://postgr.es/m/16794-350a655580fbb9ae@postgresql.org
2021-01-28 16:56:07 -03:00
Tom Lane 1d9351a87c Fix hash partition pruning with asymmetric partition sets.
perform_pruning_combine_step() was not taught about the number of
partition indexes used in hash partitioning; more embarrassingly,
get_matching_hash_bounds() also had it wrong.  These errors are masked
in the common case where all the partitions have the same modulus
and no partition is missing.  However, with missing or unequal-size
partitions, we could erroneously prune some partitions that need
to be scanned, leading to silently wrong query answers.

While a minimal-footprint fix for this could be to export
get_partition_bound_num_indexes and make the incorrect functions use it,
I'm of the opinion that that function should never have existed in the
first place.  It's not reasonable data structure design that
PartitionBoundInfoData lacks any explicit record of the length of
its indexes[] array.  Perhaps that was all right when it could always
be assumed equal to ndatums, but something should have been done about
it as soon as that stopped being true.  Putting in an explicit
"nindexes" field makes both partition_bounds_equal() and
partition_bounds_copy() simpler, safer, and faster than before,
and removes explicit knowledge of the number-of-partition-indexes
rules from some other places too.

This change also makes get_hash_partition_greatest_modulus obsolete.
I left that in place in case any external code uses it, but no core
code does anymore.

Per bug #16840 from Michał Albrycht.  Back-patch to v11 where the
hash partitioning code came in.  (In the back branches, add the new
field at the end of PartitionBoundInfoData to minimize ABI risks.)

Discussion: https://postgr.es/m/16840-571a22976f829ad4@postgresql.org
2021-01-28 13:41:55 -05:00
Heikki Linnakangas 6c5576075b Add direct conversion routines between EUC_TW and Big5.
Conversions between EUC_TW and Big5 were previously implemented by
converting the whole input to MIC first, and then from MIC to the target
encoding. Implement functions to convert directly between the two.

The reason to do this now is that I'm working on a patch that will change
the conversion function signature so that if the input is invalid, we
convert as much as we can and return the number of bytes successfully
converted. That's not possible if we use an intermediary format, because
if an error happens in the intermediary -> final conversion, we lose track
of the location of the invalid character in the original input. Avoiding
the intermediate step makes the conversions faster, too.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/b9e3167f-f84b-7aa4-5738-be578a4db924%40iki.fi
2021-01-28 14:53:03 +02:00
Heikki Linnakangas b80e10638e Add mbverifystr() functions specific to each encoding.
This makes pg_verify_mbstr() function faster, by allowing more efficient
encoding-specific implementations. All the implementations included in
this commit are pretty naive, they just call the same encoding-specific
verifychar functions that were used previously, but that already gives a
performance boost because the tight character-at-a-time loop is simpler.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01@iki.fi
2021-01-28 14:40:07 +02:00
Andrew Gierth a3367aa3c4 Don't add bailout adjustment for non-strict deserialize calls.
When building aggregate expression steps, strict checks need a bailout
jump for when a null value is encountered, so there is a list of steps
that require later adjustment. Adding entries to that list for steps
that aren't actually strict would be harmless, except that there is an
Assert which catches them. This leads to spurious errors on asserts
builds, for data sets that trigger parallel aggregation of an
aggregate with a non-strict deserialization function (no such
aggregates exist in the core system).

Repair by not adding the adjustment entry when it's not needed.

Backpatch back to 11 where the code was introduced.

Per a report from Darafei (Komzpa) of the PostGIS project; analysis
and patch by me.

Discussion: https://postgr.es/m/87mty7peb3.fsf@news-spur.riddles.org.uk
2021-01-28 10:53:10 +00:00
Michael Paquier f854c69a5b Refactor SQL functions of SHA-2 in cryptohashfuncs.c
The same code pattern was repeated four times when compiling a SHA-2
hash.  This refactoring has the advantage to issue a compilation warning
if a new value is added to pg_cryptohash_type, so as anybody doing an
addition in this area would need to consider if support for a new SQL
function is needed or not.

Author: Sehrope Sarkuni, Michael Paquier
Discussion: https://postgr.es/m/YA7DvLRn2xnTgsMc@paquier.xyz
2021-01-28 16:13:26 +09:00
Peter Geoghegan e19594c5c0 Reduce the default value of vacuum_cost_page_miss.
When commit f425b605 introduced cost based vacuum delays back in 2004,
the defaults reflected then-current trends in hardware, as well as
certain historical limitations in PostgreSQL.  There have been enormous
improvements in both areas since that time.  The cost limit GUC defaults
finally became much more representative of current trends following
commit cbccac37, which decreased autovacuum_vacuum_cost_delay's default
by 10x for PostgreSQL 12 (it went from 20ms to only 2ms).

The relative costs have shifted too.  This should also be accounted for
by the defaults.  More specifically, the relative importance of avoiding
dirtying pages within VACUUM has greatly increased, primarily due to
main memory capacity scaling and trends in flash storage.  Within
Postgres itself, improvements like sequential access during index
vacuuming (at least in nbtree and GiST indexes) have also been
contributing factors.

To reflect all this, decrease the default of vacuum_cost_page_miss to 2.
Since the default of vacuum_cost_page_dirty remains 20, dirtying a page
is now considered 10x "costlier" than a page miss by default.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmLPFnkWT8xMjmcsm7YS3+_Qi3iRWAb2+_Bc8UhVyHfuA@mail.gmail.com
2021-01-27 15:11:13 -08:00
Robert Haas 69059d3b2f In TrimCLOG(), don't reset XactCtl->shared->latest_page_number.
Since the CLOG page number is not recorded directly in the checkpoint
record, we have to use ShmemVariableCache->nextXid to figure out the
latest CLOG page number at the start of recovery. However, as recovery
progresses, replay of CLOG/EXTEND records will update our notion of
the latest page number, and we should rely on that being accurate
rather than recomputing the value based on an updated notion of
nextXid. ShmemVariableCache->nextXid is only an approximation
during recovery anyway, whereas CLOG/EXTEND records are an
authoritative representation of how the SLRU has been updated.

Commit 0fcc2decd4 makes this
simplification possible, as before that change clog_redo() might
have injected a bogus value here, and we'd want to get rid of
that before entering normal running.

Patch by me, reviewed by Heikki Linnakangas.

Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com
2021-01-27 15:52:34 -05:00
Robert Haas 0fcc2decd4 In clog_redo(), don't set XactCtl->shared->latest_page_number.
The comment is no longer accurate, and hasn't been entirely accurate
since Hot Standby was introduced. The original idea here was that
StartupCLOG() wouldn't be called until the end of recovery and
therefore this value would be uninitialized when this code is reached,
but Hot Standby made that true only when hot_standby=off, and commit
1f113abdf8 means that this value is now
always initialized before replay even starts.

The original purpose of this code was to bypass the sanity check
in SimpleLruTruncate(), which will no longer occur: now, if something
is wrong, that sanity check might trip during recovery. That's
probably a good thing, because in the current code base
latest_page_number should always be initialized and therefore we
expect that the sanity check should pass. If it doesn't, something
has gone wrong, and complaining about it is appropriate.

Patch by me, reviewed by Heikki Linnakangas.

Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com
2021-01-27 13:11:30 -05:00
Robert Haas 1f113abdf8 Move StartupCLOG() calls to just after we initialize ShmemVariableCache.
Previously, the hot_standby=off code path did this at end of recovery,
while the hot_standby=on code path did it at the beginning of recovery.
It's better to do this in only one place because (a) it's simpler,
(b) StartupCLOG() is trivial so trying to postpone the work isn't
useful, and (c) this will make it possible to simplify some other
logic.

Patch by me, reviewed by Heikki Linnakangas.

Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com
2021-01-27 12:20:46 -05:00
Peter Geoghegan e42b3c3bd6 Fix GiST index deletion assert issue.
Avoid calling heap_index_delete_tuples() with an empty deltids array to
avoid an assertion failure.

This issue was arguably an oversight in commit b5f58cf2, though the
failing assert itself was added by my recent commit d168b666.  No
backpatch, though, since the oversight is harmless in the back branches.

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Jaime Casanova <jcasanov@systemguards.com.ec>
Discussion: https://postgr.es/m/CAJKUy5jscES84n3puE=sYngyF+zpb4wv8UMtuLnLPv5z=6yyNw@mail.gmail.com
2021-01-26 23:24:37 -08:00
Michael Paquier 4c9c359d38 Refactor code in tablecmds.c to check and process tablespace moves
Two code paths of tablecmds.c (for relations with storage and without
storage) use the same logic to check if the move of a relation to a
new tablespace is allowed or not and to update pg_class.reltablespace
and pg_class.relfilenode.

A potential TABLESPACE clause for REINDEX, CLUSTER and VACUUM FULL needs
similar checks to make sure that nothing is moved around in illegal ways
(no mapped relations, shared relations only in pg_global, no move of
temp tables owned by other backends).

This reorganizes the existing code of ALTER TABLE so as all this logic
is controlled by two new routines that can be reused for the other
commands able to move relations across tablespaces, limiting the number
of code paths in need of the same protections.  This also removes some
code that was duplicated for tables with and without storage for ALTER
TABLE.

Author: Alexey Kondratov, Michael Paquier
Discussion: https://postgr.es/m/YA+9mAMWYLXJMVPL@paquier.xyz
2021-01-27 11:54:16 +09:00
Tom Lane d5a83d79c9 Rethink recently-added SPI interfaces.
SPI_execute_with_receiver and SPI_cursor_parse_open_with_paramlist are
new in v14 (cf. commit 2f48ede08).  Before they can get out the door,
let's change their APIs to follow the practice recently established by
SPI_prepare_extended etc: shove all optional arguments into a struct
that callers are supposed to pre-zero.  The hope is to allow future
addition of more options without either API breakage or a continuing
proliferation of new SPI entry points.  With that in mind, choose
slightly more generic names for them: SPI_execute_extended and
SPI_cursor_parse_open respectively.

Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
2021-01-26 16:37:12 -05:00
Tom Lane ee895a655c Improve performance of repeated CALLs within plpgsql procedures.
This patch essentially is cleaning up technical debt left behind
by the original implementation of plpgsql procedures, particularly
commit d92bc83c4.  That patch (or more precisely, follow-on patches
fixing its worst bugs) forced us to re-plan CALL and DO statements
each time through, if we're in a non-atomic context.  That wasn't
for any fundamental reason, but just because use of a saved plan
requires having a ResourceOwner to hold a reference count for the
plan, and we had no suitable resowner at hand, nor would the
available APIs support using one if we did.  While it's not that
expensive to create a "plan" for CALL/DO, the cycles do add up
in repeated executions.

This patch therefore makes the following API changes:

* GetCachedPlan/ReleaseCachedPlan are modified to let the caller
specify which resowner to use to pin the plan, rather than forcing
use of CurrentResourceOwner.

* spi.c gains a "SPI_execute_plan_extended" entry point that lets
callers say which resowner to use to pin the plan.  This borrows the
idea of an options struct from the recently added SPI_prepare_extended,
hopefully allowing future options to be added without more API breaks.
This supersedes SPI_execute_plan_with_paramlist (which I've marked
deprecated) as well as SPI_execute_plan_with_receiver (which is new
in v14, so I just took it out altogether).

* I also took the opportunity to remove the crude hack of letting
plpgsql reach into SPI private data structures to mark SPI plans as
"no_snapshot".  It's better to treat that as an option of
SPI_prepare_extended.

Now, when running a non-atomic procedure or DO block that contains
any CALL or DO commands, plpgsql creates a ResourceOwner that
will be used to pin the plans of the CALL/DO commands.  (In an
atomic context, we just use CurrentResourceOwner, as before.)
Having done this, we can just save CALL/DO plans normally,
whether or not they are used across transaction boundaries.
This seems to be good for something like 2X speedup of a CALL
of a trivial procedure with a few simple argument expressions.
By restricting the creation of an extra ResourceOwner like this,
there's essentially zero penalty in cases that can't benefit.

Pavel Stehule, with some further hacking by me

Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
2021-01-25 22:28:29 -05:00
Andres Freund 55ef8555f0 Fix two typos in snapbuild.c.
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/c94be044-818f-15e3-1ad3-7a7ae2dfed0a@iki.fi
2021-01-25 12:15:10 -08:00
Tom Lane 07d46fceb4 Fix broken ruleutils support for function TRANSFORM clauses.
I chanced to notice that this dumped core due to a faulty Assert.
To add insult to injury, the output has been misformatted since v11.
Obviously we need some regression testing here.

Discussion: https://postgr.es/m/d1cc628c-3953-4209-957b-29427acc38c8@www.fastmail.com
2021-01-25 13:03:43 -05:00
Robert Haas d18e75664a Remove CheckpointLock.
Up until now, we've held this lock when performing a checkpoint or
restartpoint, but commit 076a055acf back
in 2004 and commit 7e48b77b1c from 2009,
taken together, have removed all need for this. In the present code,
there's only ever one process entitled to attempt a checkpoint: either
the checkpointer, during normal operation, or the postmaster, during
single-user operation. So, we don't need the lock.

One possible concern in making this change is that it means that
a substantial amount of code where HOLD_INTERRUPTS() was previously
in effect due to the preceding LWLockAcquire() will now be
running without that. This could mean that ProcessInterrupts()
gets called in places from which it didn't before. However, this
seems unlikely to do very much, because the checkpointer doesn't
have any signal mapped to die(), so it's not clear how,
for example, ProcDiePending = true could happen in the first
place. Similarly with ClientConnectionLost and recovery conflicts.

Also, if there are any such problems, we might want to fix them
rather than reverting this, since running lots of code with
interrupt handling suspended is generally bad.

Patch by me, per an inquiry by Amul Sul. Review by Tom Lane
and Michael Paquier.

Discussion: http://postgr.es/m/CAAJ_b97XnBBfYeSREDJorFsyoD1sHgqnNuCi=02mNQBUMnA=FA@mail.gmail.com
2021-01-25 12:34:38 -05:00
David Rowley 16dfe253e3 Fix hypothetical bug in heap backward scans
Both heapgettup() and heapgettup_pagemode() incorrectly set the first page
to scan in a backward scan in which the number of pages to scan was
specified by heap_setscanlimits().  The code incorrectly started the scan
at the end of the relation when startBlk was 0, or otherwise at
startBlk - 1, neither of which is correct when only scanning a subset of
pages.

The fix here checks if heap_setscanlimits() has changed the number of
pages to scan and if so we set the first page to scan as the final page in
the specified range during backward scans.

Proper adjustment of this code was forgotten when heap_setscanlimits() was
added in 7516f5259 back in 9.5.  However, practice, nowhere in core code
performs backward scans after having used heap_setscanlimits(), yet, it is
possible an extension uses the heap functions in this way, hence
backpatch.

An upcoming patch does use heap_setscanlimits() with backward scans, so
this must be fixed before that can go in.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvpGc9h0_oVD2CtgBcxCS1N-qDYZSeBRnUh+0CWJA9cMaA@mail.gmail.com
Backpatch-through: 9.5, all supported versions
2021-01-25 19:52:18 +13:00
Amit Kapila 40ab64c1ec Fix ALTER PUBLICATION...DROP TABLE behavior.
Commit 69bd60672 fixed the initialization of streamed transactions for
RelationSyncEntry. It forgot to initialize the publication actions while
invalidating the RelationSyncEntry due to which even though the relation
is dropped from a particular publication we still publish its changes. Fix
it by initializing pubactions when entry got invalidated.

Author: Japin Li and Bharath Rupireddy
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com
2021-01-25 07:39:29 +05:30
Tomas Vondra 39b66a91bd Fix COPY FREEZE with CLOBBER_CACHE_ALWAYS
This adds code omitted from commit 7db0cd2145 by accident, which had
two consequences. Firstly, only rows inserted by heap_multi_insert were
frozen as expected when running COPY FREEZE, while heap_insert left
rows unfrozen. That however includes rows in TOAST tables, so a lot of
data might have been left unfrozen. Secondly, page might have been left
partially empty after relcache invalidation.

This addresses both of those issues.

Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com
2021-01-24 01:08:11 +01:00
Tom Lane 7cd9765f9b Re-allow DISTINCT in pl/pgsql expressions.
I'd omitted this from the grammar in commit c9d529848, figuring that
it wasn't worth supporting.  However we already have one complaint,
so it seems that judgment was wrong.  It doesn't require a huge
amount of code, so add it back.  (I'm still drawing the line at
UNION/INTERSECT/EXCEPT though: those'd require an unreasonable
amount of grammar refactoring, and the single-result-row restriction
makes them near useless anyway.)

Also rethink the documentation: this behavior is a property of
all pl/pgsql expressions, not just assignments.

Discussion: https://postgr.es/m/20210122134106.e94c5cd7@mail.verfriemelt.org
2021-01-22 16:26:22 -05:00
Peter Eisentraut 09418bed67 Remove bogus tracepoint
Calls to LWLockWaitForVar() fired the TRACE_POSTGRESQL_LWLOCK_ACQUIRE
tracepoint, but LWLockWaitForVar() never actually acquires the LWLock.
(Probably a copy/paste bug in 68a2e52bbaf.)  Remove it.

Author: Craig Ringer <craig.ringer@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAGRY4nxJo+-HCC2i5H93ttSZ4gZO-FSddCwvkb-qAfQ1zdXd1w@mail.gmail.com
2021-01-22 11:58:21 +01:00
Michael Paquier af0e79c8f4 Move SSL information callback earlier to capture more information
The callback for retrieving state change information during connection
setup was only installed when the connection was mostly set up, and
thus didn't provide much information and missed all the details related
to the handshake.

This also extends the callback with SSL_state_string_long() to print
more information about the state change within the SSL object handled.

While there, fix some comments which were incorrectly referring to the
callback and its previous location in fe-secure.c.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/232CF476-94E1-42F1-9408-719E2AEC5491@yesql.se
2021-01-22 09:26:27 +09:00
Tom Lane 55dc86eca7 Fix pull_varnos' miscomputation of relids set for a PlaceHolderVar.
Previously, pull_varnos() took the relids of a PlaceHolderVar as being
equal to the relids in its contents, but that fails to account for the
possibility that we have to postpone evaluation of the PHV due to outer
joins.  This could result in a malformed plan.  The known cases end up
triggering the "failed to assign all NestLoopParams to plan nodes"
sanity check in createplan.c, but other symptoms may be possible.

The right value to use is the join level we actually intend to evaluate
the PHV at.  We can get that from the ph_eval_at field of the associated
PlaceHolderInfo.  However, there are some places that call pull_varnos()
before the PlaceHolderInfos have been created; in that case, fall back
to the conservative assumption that the PHV will be evaluated at its
syntactic level.  (In principle this might result in missing some legal
optimization, but I'm not aware of any cases where it's an issue in
practice.)  Things are also a bit ticklish for calls occurring during
deconstruct_jointree(), but AFAICS the ph_eval_at fields should have
reached their final values by the time we need them.

The main problem in making this work is that pull_varnos() has no
way to get at the PlaceHolderInfos.  We can fix that easily, if a
bit tediously, in HEAD by passing it the planner "root" pointer.
In the back branches that'd cause an unacceptable API/ABI break for
extensions, so leave the existing entry points alone and add new ones
with the additional parameter.  (If an old entry point is called and
encounters a PHV, it'll fall back to using the syntactic level,
again possibly missing some valid optimization.)

Back-patch to v12.  The computation is surely also wrong before that,
but it appears that we cannot reach a bad plan thanks to join order
restrictions imposed on the subquery that the PlaceHolderVar came from.
The error only became reachable when commit 4be058fe9 allowed trivial
subqueries to be collapsed out completely, eliminating their join order
restrictions.

Per report from Stephan Springl.

Discussion: https://postgr.es/m/171041.1610849523@sss.pgh.pa.us
2021-01-21 15:37:23 -05:00
Tomas Vondra 920f853dc9 Fix initialization of FDW batching in ExecInitModifyTable
ExecInitModifyTable has to initialize batching for all result relations,
not just the first one. Furthermore, when junk filters were necessary,
the pointer pointed past the mtstate->resultRelInfo array.

Per reports from multiple non-x86 animals (florican, locust, ...).

Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-01-21 03:34:32 +01:00
Tomas Vondra b663a41363 Implement support for bulk inserts in postgres_fdw
Extends the FDW API to allow batching inserts into foreign tables. That
is usually much more efficient than inserting individual rows, due to
high latency for each round-trip to the foreign server.

It was possible to implement something similar in the regular FDW API,
but it was inconvenient and there were issues with reporting the number
of actually inserted rows etc. This extends the FDW API with two new
functions:

* GetForeignModifyBatchSize - allows the FDW picking optimal batch size

* ExecForeignBatchInsert - inserts a batch of rows at once

Currently, only INSERT queries support batching. Support for DELETE and
UPDATE may be added in the future.

This also implements batching for postgres_fdw. The batch size may be
specified using "batch_size" option both at the server and table level.

The initial patch version was written by me, but it was rewritten and
improved in many ways by Takayuki Tsunakawa.

Author: Takayuki Tsunakawa
Reviewed-by: Tomas Vondra, Amit Langote
Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-01-20 23:57:27 +01:00
Heikki Linnakangas 6b4d3046f4 Fix bug in detecting concurrent page splits in GiST insert
In commit 9eb5607e69, I got the condition on checking for split or
deleted page wrong: I used && instead of ||. The comment correctly said
"concurrent split _or_ deletion".

As a result, GiST insertion could miss a concurrent split, and insert to
wrong page. Duncan Sands demonstrated this with a test script that did a
lot of concurrent inserts.

Backpatch to v12, where this was introduced. REINDEX is required to fix
indexes that were affected by this bug.

Backpatch-through: 12
Reported-by: Duncan Sands
Discussion: https://www.postgresql.org/message-id/a9690483-6c6c-3c82-c8ba-dc1a40848f11%40deepbluecap.com
2021-01-20 11:58:03 +02:00
Michael Paquier 21378e1fef Fix ALTER DEFAULT PRIVILEGES with duplicated objects
Specifying duplicated objects in this command would lead to unique
constraint violations in pg_default_acl or "tuple already updated by
self" errors.  Similarly to GRANT/REVOKE, increment the command ID after
each subcommand processing to allow this case to work transparently.

A regression test is added by tweaking one of the existing queries of
privileges.sql to stress this case.

Reported-by: Andrus
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/ae2a7dc1-9d71-8cba-3bb9-e4cb7eb1f44e@hot.ee
Backpatch-through: 9.5
2021-01-20 11:38:17 +09:00
Tom Lane a0efda88a6 Remove faulty support for MergeAppend plan with WHERE CURRENT OF.
Somebody extended search_plan_tree() to treat MergeAppend exactly
like Append, which is 100% wrong, because unlike Append we can't
assume that only one input node is actively returning tuples.
Hence a cursor using a MergeAppend across a UNION ALL or inheritance
tree could falsely match a WHERE CURRENT OF query at a row that
isn't actually the cursor's current output row, but coincidentally
has the same TID (in a different table) as the current output row.

Delete the faulty code; this means that such a case will now return
an error like 'cursor "foo" is not a simply updatable scan of table
"bar"', instead of silently misbehaving.  Users should not find that
surprising though, as the same cursor query could have failed that way
already depending on the chosen plan.  (It would fail like that if the
sort were done with an explicit Sort node instead of MergeAppend.)

Expand the clearly-inadequate commentary to be more explicit about
what this code is doing, in hopes of forestalling future mistakes.

It's been like this for awhile, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/482865.1611075182@sss.pgh.pa.us
2021-01-19 13:25:33 -05:00
Amit Kapila ed43677e20 pgindent worker.c.
This is a leftover from commit 0926e96c49. Changing this separately
because this file is being modified for upcoming patch logical replication
of 2PC.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Ps+EgG8KzcmAyAgBUi_vuTps6o9ZA8DG6SdnO0-YuOhPQ@mail.gmail.com
2021-01-19 08:10:13 +05:30
Tom Lane 60661bbf2d Avoid crash with WHERE CURRENT OF and a custom scan plan.
execCurrent.c's search_plan_tree() assumed that ForeignScanStates
and CustomScanStates necessarily have a valid ss_currentRelation.
This is demonstrably untrue for postgres_fdw's remote join and
remote aggregation plans, and non-leaf custom scans might not have
an identifiable scan relation either.  Avoid crashing by ignoring
such nodes when the field is null.

This solution will lead to errors like 'cursor "foo" is not a
simply updatable scan of table "bar"' in cases where maybe we
could have allowed WHERE CURRENT OF to work.  That's not an issue
for postgres_fdw's usages, since joins or aggregations would render
WHERE CURRENT OF invalid anyway.  But an otherwise-transparent
upper level custom scan node might find this annoying.  When and if
someone cares to expend work on such a scenario, we could invent a
custom-scan-provider callback to determine what's safe.

Report and patch by David Geier, commentary by me.  It's been like
this for awhile, so back-patch to all supported branches.

Discussion: https://postgr.es/m/0253344d-9bdd-11c4-7f0d-d88c02cd7991@swarm64.com
2021-01-18 18:32:30 -05:00
Tom Lane 3fd80c728d Narrow the scope of a local variable.
This is better style and more symmetrical with the other if-branch.
This likely should have been included in 9de77b545 (which created
the opportunity), but it was overlooked.

Japin Li

Discussion: https://postgr.es/m/MEYP282MB16699FA4A7CD57EB250E871FB6A40@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2021-01-18 15:55:01 -05:00
Tom Lane a6cf3df4eb Add bytea equivalents of ltrim() and rtrim().
We had bytea btrim() already, but for some reason not the other two.

Joel Jacobson

Discussion: https://postgr.es/m/d10cd5cd-a901-42f1-b832-763ac6f7ff3a@www.fastmail.com
2021-01-18 15:11:32 -05:00
Robert Haas a3ed4d1efe Allow for error or refusal while absorbing a ProcSignalBarrier.
Previously, the per-barrier-type functions tasked with absorbing
them were expected to always succeed and never throw an error.
However, that's a bit inconvenient. Further study has revealed that
there are realistic cases where it might not be possible to absorb
a ProcSignalBarrier without terminating the transaction, or even
the whole backend. Similarly, for some barrier types, there might
be other reasons where it's not reasonably possible to absorb the
barrier at certain points in the code, so provide a way for a
per-barrier-type function to reject absorbing the barrier.

Unfortunately, there's still no committed code making use of this
infrastructure; hopefully, we'll get there. :-(

Patch by me, reviewed by Andres Freund and Amul Sul.

Discussion: http://postgr.es/m/20200908182005.xya7wetdh3pndzim@alap3.anarazel.de
Discussion: http://postgr.es/m/CA+Tgmob56Pk1-5aTJdVPCWFHon7me4M96ENpGe9n_R4JUjjhZA@mail.gmail.com
2021-01-18 12:09:52 -05:00
Peter Eisentraut 15251c0a60 Pause recovery for insufficient parameter settings
When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down
with a fatal error.

This patch changes this behavior for hot standbys to pause recovery at
that point instead.  That allows read traffic on the standby to
continue while database administrators figure out next steps.  When
recovery is unpaused, the server shuts down (as before).  The idea is
to fix the parameters while recovery is paused and then restart when
there is a maintenance window.

Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
2021-01-18 09:04:04 +01:00
Michael Paquier a3dc926009 Refactor option handling of CLUSTER, REINDEX and VACUUM
This continues the work done in b5913f6.  All the options of those
commands are changed to use hex values rather than enums to reduce the
risk of compatibility bugs when introducing new options.  Each option
set is moved into a new structure that can be extended with more
non-boolean options (this was already the case of VACUUM).  The code of
REINDEX is restructured so as manual REINDEX commands go through a
single routine from utility.c, like VACUUM, to ease the allocation
handling of option parameters when a command needs to go through
multiple transactions.

This can be used as a base infrastructure for future patches related to
those commands, including reindex filtering and tablespace support.

Per discussion with people mentioned below, as well as Alvaro Herrera
and Peter Eisentraut.

Author: Michael Paquier, Justin Pryzby
Reviewed-by: Alexey Kondratov, Justin Pryzby
Discussion: https://postgr.es/m/X8riynBLwxAD9uKk@paquier.xyz
2021-01-18 14:03:10 +09:00
Tomas Vondra 7db0cd2145 Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE
Make sure COPY FREEZE marks the pages as PD_ALL_VISIBLE and updates the
visibility map. Until now we only marked individual tuples as frozen,
but page-level flags were not updated, so the first VACUUM after the
COPY FREEZE had to rewrite the whole table.

This is a fairly old patch, and multiple people worked on it. The first
version was written by Jeff Janes, and then reworked by Pavan Deolasee
and Anastasia Lubennikova.

Author: Anastasia Lubennikova, Pavan Deolasee, Jeff Janes
Reviewed-by: Kuntal Ghosh, Jeff Janes, Tomas Vondra, Masahiko Sawada,
             Andres Freund, Ibrar Ahmed, Robert Haas, Tatsuro Ishii,
             Darafei Praliaskouski
Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com
Discussion: https://postgr.es/m/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com
2021-01-17 22:28:26 +01:00
Magnus Hagander 960869da08 Add pg_stat_database counters for sessions and session time
This add counters for number of sessions, the different kind of session
termination types, and timers for how much time is spent in active vs
idle in a database to pg_stat_database.

Internally this also renames the parameter "force" to disconnect. This
was the only use-case for the parameter before, so repurposing it to
this mroe narrow usecase makes things cleaner than inventing something
new.

Author: Laurenz Albe
Reviewed-By: Magnus Hagander, Soumyadeep Chakraborty, Masahiro Ikeda
Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.camel@cybertec.at
2021-01-17 13:52:31 +01:00
Noah Misch 6db992833c Prevent excess SimpleLruTruncate() deletion.
Every core SLRU wraps around.  With the exception of pg_notify, the wrap
point can fall in the middle of a page.  Account for this in the
PagePrecedes callback specification and in SimpleLruTruncate()'s use of
said callback.  Update each callback implementation to fit the new
specification.  This changes SerialPagePrecedesLogically() from the
style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes().
(Whereas pg_clog and pg_serial share a key space, pg_serial is nothing
like pg_notify.)  The bug fixed here has the same symptoms and user
followup steps as 592a589a04.  Back-patch
to 9.5 (all supported versions).

Reviewed by Andrey Borodin and (in earlier versions) by Tom Lane.

Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com
2021-01-16 12:21:35 -08:00
Amit Kapila c95765f476 Remove unnecessary pstrdup in fetch_table_list.
The result of TextDatumGetCString is already palloc'ed so we don't need to
allocate memory for it again. We decide not to backpatch it as there
doesn't seem to be any case where it can create a meaningful leak.

Author: Zhijie Hou
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/229fed2eb8c54c71a96ccb99e516eb12@G08CNEXMBPEKD05.g08.fujitsu.local
2021-01-16 10:15:32 +05:30
Tomas Vondra c9a0dc3486 Disallow CREATE STATISTICS on system catalogs
Add a check that CREATE STATISTICS does not add extended statistics on
system catalogs, similarly to indexes etc.  It can be overriden using
the allow_system_table_mods GUC.

This bug exists since 7b504eb282, adding the extended statistics, so
backpatch all the way back to PostgreSQL 10.

Author: Tomas Vondra
Reported-by: Dean Rasheed
Backpatch-through: 10
Discussion: https://postgr.es/m/CAEZATCXAPrrOKwEsyZKQ4uzzJQWBCt6QAvOcgqRGdWwT1zb%2BrQ%40mail.gmail.com
2021-01-15 23:31:22 +01:00
Alvaro Herrera f9900df5f9
Avoid spurious wait in concurrent reindex
This is like commit c98763bf51, but for REINDEX CONCURRENTLY.  To wit:
this flags indicates that the current process is safe to ignore for the
purposes of waiting for other snapshots, when doing CREATE INDEX
CONCURRENTLY and REINDEX CONCURRENTLY.  This helps two processes doing
either of those things not deadlock, and also avoids spurious waits.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql
2021-01-15 10:31:42 -03:00
Fujii Masao 2ad78a87f0 Fix calculation of how much shared memory is required to store a TOC.
Commit ac883ac453 refactored shm_toc_estimate() but changed its calculation
of shared memory size for TOC incorrectly. Previously this could cause too
large memory to be allocated.

Back-patch to v11 where the bug was introduced.

Author: Takayuki Tsunakawa
Discussion: https://postgr.es/m/TYAPR01MB2990BFB73170E2C4921E2C4DFEA80@TYAPR01MB2990.jpnprd01.prod.outlook.com
2021-01-15 12:44:17 +09:00
Michael Paquier 5ae1572993 Fix O(N^2) stat() calls when recycling WAL segments
The counter tracking the last segment number recycled was getting
initialized when recycling one single segment, while it should be used
across a full cycle of segments recycled to prevent useless checks
related to entries already recycled.

This performance issue has been introduced by b2a5545, and it was first
implemented in 61b86142.

No backpatch is done per the lack of field complaints.

Reported-by: Andres Freund, Thomas Munro
Author: Michael Paquier
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20170621211016.eln6cxxp3jrv7m4m@alap3.anarazel.de
Discussion: https://postgr.es/m/CA+hUKG+DRiF9z1_MU4fWq+RfJMxP7zjoptfcmuCFPeO4JM2iVg@mail.gmail.com
2021-01-15 10:33:13 +09:00
Alvaro Herrera ebfe2dbd6b
Prevent drop of tablespaces used by partitioned relations
When a tablespace is used in a partitioned relation (per commits
ca4103025d in pg12 for tables and 33e6c34c32 in pg11 for indexes),
it is possible to drop the tablespace, potentially causing various
problems.  One such was reported in bug #16577, where a rewriting ALTER
TABLE causes a server crash.

Protect against this by using pg_shdepend to keep track of tablespaces
when used for relations that don't keep physical files; we now abort a
tablespace if we see that the tablespace is referenced from any
partitioned relations.

Backpatch this to 11, where this problem has been latent all along.  We
don't try to create pg_shdepend entries for existing partitioned
indexes/tables, but any ones that are modified going forward will be
protected.

Note slight behavior change: when trying to drop a tablespace that
contains both regular tables as well as partitioned ones, you'd
previously get ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll
get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST.  Arguably, the latter is more
correct.

It is possible to add protecting pg_shdepend entries for existing
tables/indexes, by doing
  ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
  ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
for each partitioned table/index that is not in the database default
tablespace.  Because these partitioned objects do not have storage, no
file needs to be actually moved, so it shouldn't take more time than
what's required to acquire locks.

This query can be used to search for such relations:
SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2021-01-14 15:32:14 -03:00
Fujii Masao fef5b47f6b Ensure that a standby is able to follow a primary on a newer timeline.
Commit 709d003fbd refactored WAL-reading code, but accidentally caused
WalSndSegmentOpen() to fail to follow a timeline switch while reading from
a historic timeline. This issue caused a standby to fail to follow a primary
on a newer timeline when WAL archiving is enabled.

If there is a timeline switch within the segment, WalSndSegmentOpen() should
read from the WAL segment belonging to the new timeline. But previously
since it failed to follow a timeline switch, it tried to read the WAL segment
with old timeline. When WAL archiving is enabled, that WAL segment with
old timeline doesn't exist because it's renamed to .partial. This leads
a primary to have tried to read non-existent WAL segment, and which caused
replication to faill with the error "ERROR:  requested WAL segment ... has
 already been removed".

This commit fixes WalSndSegmentOpen() so that it's able to follow a timeline
switch, to ensure that a standby is able to follow a primary on a newer
timeline even when WAL archiving is enabled.

This commit also adds the regression test to check whether a standby is
able to follow a primary on a newer timeline when WAL archiving is enabled.

Back-patch to v13 where the bug was introduced.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi, tweaked by Fujii Masao
Reviewed-by:  Alvaro Herrera, Fujii Masao
Discussion: https://postgr.es/m/20201209.174314.282492377848029776.horikyota.ntt@gmail.com
2021-01-14 12:27:11 +09:00
Michael Paquier aef8948f38 Rework refactoring of hex and encoding routines
This commit addresses some issues with c3826f83 that moved the hex
decoding routine to src/common/:
- The decoding function lacked overflow checks, so when used for
security-related features it was an open door to out-of-bound writes if
not carefully used that could remain undetected.  Like the base64
routines already in src/common/ used by SCRAM, this routine is reworked
to check for overflows by having the size of the destination buffer
passed as argument, with overflows checked before doing any writes.
- The encoding routine was missing.  This is moved to src/common/ and
it gains the same overflow checks as the decoding part.

On failure, the hex routines of src/common/ issue an error as per the
discussion done to make them usable by frontend tools, but not by shared
libraries.  Note that this is why ECPG is left out of this commit, and
it still includes a duplicated logic doing hex encoding and decoding.

While on it, this commit uses better variable names for the source and
destination buffers in the existing escape and base64 routines in
encode.c and it makes them more robust to overflow detection.  The
previous core code issued a FATAL after doing out-of-bound writes if
going through the SQL functions, which would be enough to detect
problems when working on changes that impacted this area of the
code.  Instead, an error is issued before doing an out-of-bound write.
The hex routines were being directly called for bytea conversions and
backup manifests without such sanity checks.  The current calls happen
to not have any problems, but careless uses of such APIs could easily
lead to CVE-class bugs.

Author: Bruce Momjian, Michael Paquier
Reviewed-by: Sehrope Sarkuni
Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
2021-01-14 11:13:24 +09:00
Thomas Munro 0d56acfbaa Move our p{read,write}v replacements into their own files.
macOS's ranlib issued a warning about an empty pread.o file with the
previous arrangement, on systems new enough to require no replacement
functions.  Let's go back to using configure's AC_REPLACE_FUNCS system
to build and include each .o in the library only if it's needed, which
requires moving the *v() functions to their own files.

Also move the _with_retry() wrapper to a more permanent home.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1283127.1610554395%40sss.pgh.pa.us
2021-01-14 11:16:59 +13:00
Peter Geoghegan d168b66682 Enhance nbtree index tuple deletion.
Teach nbtree and heapam to cooperate in order to eagerly remove
duplicate tuples representing dead MVCC versions.  This is "bottom-up
deletion".  Each bottom-up deletion pass is triggered lazily in response
to a flood of versions on an nbtree leaf page.  This usually involves a
"logically unchanged index" hint (these are produced by the executor
mechanism added by commit 9dc718bd).

The immediate goal of bottom-up index deletion is to avoid "unnecessary"
page splits caused entirely by version duplicates.  It naturally has an
even more useful effect, though: it acts as a backstop against
accumulating an excessive number of index tuple versions for any given
_logical row_.  Bottom-up index deletion complements what we might now
call "top-down index deletion": index vacuuming performed by VACUUM.
Bottom-up index deletion responds to the immediate local needs of
queries, while leaving it up to autovacuum to perform infrequent clean
sweeps of the index.  The overall effect is to avoid certain
pathological performance issues related to "version churn" from UPDATEs.

The previous tableam interface used by index AMs to perform tuple
deletion (the table_compute_xid_horizon_for_tuples() function) has been
replaced with a new interface that supports certain new requirements.
Many (perhaps all) of the capabilities added to nbtree by this commit
could also be extended to other index AMs.  That is left as work for a
later commit.

Extend deletion of LP_DEAD-marked index tuples in nbtree by adding logic
to consider extra index tuples (that are not LP_DEAD-marked) for
deletion in passing.  This increases the number of index tuples deleted
significantly in many cases.  The LP_DEAD deletion process (which is now
called "simple deletion" to clearly distinguish it from bottom-up
deletion) won't usually need to visit any extra table blocks to check
these extra tuples.  We have to visit the same table blocks anyway to
generate a latestRemovedXid value (at least in the common case where the
index deletion operation's WAL record needs such a value).

Testing has shown that the "extra tuples" simple deletion enhancement
increases the number of index tuples deleted with almost any workload
that has LP_DEAD bits set in leaf pages.  That is, it almost never fails
to delete at least a few extra index tuples.  It helps most of all in
cases that happen to naturally have a lot of delete-safe tuples.  It's
not uncommon for an individual deletion operation to end up deleting an
order of magnitude more index tuples compared to the old naive approach
(e.g., custom instrumentation of the patch shows that this happens
fairly often when the regression tests are run).

Add a further enhancement that augments simple deletion and bottom-up
deletion in indexes that make use of deduplication: Teach nbtree's
_bt_delitems_delete() function to support granular TID deletion in
posting list tuples.  It is now possible to delete individual TIDs from
posting list tuples provided the TIDs have a tableam block number of a
table block that gets visited as part of the deletion process (visiting
the table block can be triggered directly or indirectly).  Setting the
LP_DEAD bit of a posting list tuple is still an all-or-nothing thing,
but that matters much less now that deletion only needs to start out
with the right _general_ idea about which index tuples are deletable.

Bump XLOG_PAGE_MAGIC because xl_btree_delete changed.

No bump in BTREE_VERSION, since there are no changes to the on-disk
representation of nbtree indexes.  Indexes built on PostgreSQL 12 or
PostgreSQL 13 will automatically benefit from bottom-up index deletion
(i.e. no reindexing required) following a pg_upgrade.  The enhancement
to simple deletion is available with all B-Tree indexes following a
pg_upgrade, no matter what PostgreSQL version the user upgrades from.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzm+maE3apHB8NOtmM=p-DO65j2V5GzAWCOEEuy3JZgb2g@mail.gmail.com
2021-01-13 09:21:32 -08:00
Peter Geoghegan 9dc718bdf2 Pass down "logically unchanged index" hint.
Add an executor aminsert() hint mechanism that informs index AMs that
the incoming index tuple (the tuple that accompanies the hint) is not
being inserted by execution of an SQL statement that logically modifies
any of the index's key columns.

The hint is received by indexes when an UPDATE takes place that does not
apply an optimization like heapam's HOT (though only for indexes where
all key columns are logically unchanged).  Any index tuple that receives
the hint on insert is expected to be a duplicate of at least one
existing older version that is needed for the same logical row.  Related
versions will typically be stored on the same index page, at least
within index AMs that apply the hint.

Recognizing the difference between MVCC version churn duplicates and
true logical row duplicates at the index AM level can help with cleanup
of garbage index tuples.  Cleanup can intelligently target tuples that
are likely to be garbage, without wasting too many cycles on less
promising tuples/pages (index pages with little or no version churn).

This is infrastructure for an upcoming commit that will teach nbtree to
perform bottom-up index deletion.  No index AM actually applies the hint
just yet.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=CEKFa74EScx_hFVshCOn6AA5T-ajFASTdzipdkLTNQQ@mail.gmail.com
2021-01-13 08:11:00 -08:00
Fujii Masao 39b03690b5 Log long wait time on recovery conflict when it's resolved.
This is a follow-up of the work done in commit 0650ff2303. This commit
extends log_recovery_conflict_waits so that a log message is produced
also when recovery conflict has already been resolved after deadlock_timeout
passes, i.e., when the startup process finishes waiting for recovery
conflict after deadlock_timeout. This is useful in investigating how long
recovery conflicts prevented the recovery from applying WAL.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Bertrand Drouvot
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
2021-01-13 22:59:17 +09:00
Amit Kapila ee1b38f659 Fix memory leak in SnapBuildSerialize.
The memory for the snapshot was leaked while serializing it to disk during
logical decoding. This memory will be freed only once walsender stops
streaming the changes. This can lead to a huge memory increase when master
logs Standby Snapshot too frequently say when the user is trying to create
many replication slots.

Reported-by: funnyxj.fxj@alibaba-inc.com
Diagnosed-by: funnyxj.fxj@alibaba-inc.com
Author: Amit Kapila
Backpatch-through: 9.5
Discussion: https://postgr.es/m/033ab54c-6393-42ee-8ec9-2b399b5d8cde.funnyxj.fxj@alibaba-inc.com
2021-01-13 08:19:50 +05:30
Amit Kapila bea449c635 Optimize DropRelFileNodesAllBuffers() for recovery.
Similar to commit d6ad34f341, this patch optimizes
DropRelFileNodesAllBuffers() by avoiding the complete buffer pool scan and
instead find the buffers to be invalidated by doing lookups in the
BufMapping table.

This optimization helps operations where the relation files need to be
removed like Truncate, Drop, Abort of Create Table, etc.

Author: Kirk Jamison
Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila
Tested-By: Haiying Tang
Discussion: https://postgr.es/m/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com
2021-01-13 07:46:11 +05:30
Michael Paquier fce7d0e6ef Fix routine name in comment of catcache.c
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACUDXLAkf_XxQO9tAUtnTNGi3Lmd8fANd+vBJbcHn1HoWA@mail.gmail.com
2021-01-13 10:32:21 +09:00
Alvaro Herrera c6c4b37395
Invent struct ReindexIndexInfo
This struct is used by ReindexRelationConcurrently to keep track of the
relations to process.  This saves having to obtain some data repeatedly,
and has future uses as well.

Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql
2021-01-12 17:05:06 -03:00
Alvaro Herrera a3e51a36b7
Fix thinko in comment
This comment has been wrong since its introduction in commit
2c03216d83.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoAzz6qipFJBbGEaHmyWxvvNDp8httbwLR9tUQWaTjUs2Q@mail.gmail.com
2021-01-12 11:48:45 -03:00
Amit Kapila 044aa9e70e Fix relation descriptor leak.
We missed closing the relation descriptor while sending changes via the
root of partitioned relations during logical replication.

Author: Amit Langote and Mark Zhao
Reviewed-by: Amit Kapila and Ashutosh Bapat
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/tencent_41FEA657C206F19AB4F406BE9252A0F69C06@qq.com
Discussion: https://postgr.es/m/tencent_6E296D2F7D70AFC90D83353B69187C3AA507@qq.com
2021-01-12 08:19:39 +05:30
Amit Kapila d6ad34f341 Optimize DropRelFileNodeBuffers() for recovery.
The recovery path of DropRelFileNodeBuffers() is optimized so that
scanning of the whole buffer pool can be avoided when the number of
blocks to be truncated in a relation is below a certain threshold. For
such cases, we find the buffers by doing lookups in BufMapping table.
This improves the performance by more than 100 times in many cases
when several small tables (tested with 1000 relations) are truncated
and where the server is configured with a large value of shared
buffers (greater than equal to 100GB).

This optimization helps cases (a) when vacuum or autovacuum truncated off
any of the empty pages at the end of a relation, or (b) when the relation is
truncated in the same transaction in which it was created.

This commit introduces a new API smgrnblocks_cached which returns a cached
value for the number of blocks in a relation fork. This helps us to determine
the exact size of relation which is required to apply this optimization. The
exact size is required to ensure that we don't leave any buffer for the
relation being dropped as otherwise the background writer or checkpointer
can lead to a PANIC error while flushing buffers corresponding to files that
don't exist.

Author: Kirk Jamison based on ideas by Amit Kapila
Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila
Tested-By: Haiying Tang
Discussion: https://postgr.es/m/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com
2021-01-12 07:45:40 +05:30
Tom Lane 4edf96846a Rethink SQLSTATE code for ERRCODE_IDLE_SESSION_TIMEOUT.
Move it to class 57 (Operator Intervention), which seems like a
better choice given that from the client's standpoint it behaves
a heck of a lot like, e.g., ERRCODE_ADMIN_SHUTDOWN.

In a green field I'd put ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT
here as well.  But that's been around for a few years, so it's
probably too late to change its SQLSTATE code.

Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
2021-01-11 14:53:42 -05:00
Thomas Munro ce6a71fa53 Use vectored I/O to fill new WAL segments.
Instead of making many block-sized write() calls to fill a new WAL file
with zeroes, make a smaller number of pwritev() calls (or various
emulations).  The actual number depends on the OS's IOV_MAX, which
PG_IOV_MAX currently caps at 32.  That means we'll write 256kB per call
on typical systems.  We may want to tune the number later with more
experience.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com
2021-01-11 15:28:31 +13:00
Tom Lane afcc8772ed Fix ancient bug in parsing of BRE-mode regular expressions.
brenext(), when parsing a '*' quantifier, forgot to return any "value"
for the token; per the equivalent case in next(), it should return
value 1 to indicate that greedy rather than non-greedy behavior is
wanted.  The result is that the compiled regexp could behave like 'x*?'
rather than the intended 'x*', if we were unlucky enough to have
a zero in v->nextvalue at this point.  That seems to happen with some
reliability if we have '.*' at the beginning of a BRE-mode regexp,
although that depends on the initial contents of a stack-allocated
struct, so it's not guaranteed to fail.

Found by Alexander Lakhin using valgrind testing.  This bug seems
to be aboriginal in Spencer's code, so back-patch all the way.

Discussion: https://postgr.es/m/16814-6c5e3edd2bdf0d50@postgresql.org
2021-01-08 12:16:00 -05:00
Tom Lane b8d0cda533 Further second thoughts about idle_session_timeout patch.
On reflection, the order of operations in PostgresMain() is wrong.
These timeouts ought to be shut down before, not after, we do the
post-command-read CHECK_FOR_INTERRUPTS, to guarantee that any
timeout error will be detected there rather than at some ill-defined
later point (possibly after having wasted a lot of work).

This is really an error in the original idle_in_transaction_timeout
patch, so back-patch to 9.6 where that was introduced.
2021-01-07 11:45:23 -05:00
Fujii Masao 0650ff2303 Add GUC to log long wait times on recovery conflicts.
This commit adds GUC log_recovery_conflict_waits that controls whether
a log message is produced when the startup process is waiting longer than
deadlock_timeout for recovery conflicts. This is useful in determining
if recovery conflicts prevent the recovery from applying WAL.

Note that currently a log message is produced only when recovery conflict
has not been resolved yet even after deadlock_timeout passes, i.e.,
only when the startup process is still waiting for recovery conflict
even after deadlock_timeout.

Author: Bertrand Drouvot, Masahiko Sawada
Reviewed-by: Alvaro Herrera, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
2021-01-08 00:47:03 +09:00
Tom Lane 9486e7b666 Improve commentary in timeout.c.
On re-reading I realized that I'd missed one race condition in the new
timeout code.  It's safe, but add a comment explaining it.

Discussion: https://postgr.es/m/CA+hUKG+o6pbuHBJSGnud=TadsuXySWA7CCcPgCt2QE9F6_4iHQ@mail.gmail.com
2021-01-06 22:09:16 -05:00
Tom Lane 9877374bef Add idle_session_timeout.
This GUC variable works much like idle_in_transaction_session_timeout,
in that it kills sessions that have waited too long for a new client
query.  But it applies when we're not in a transaction, rather than
when we are.

Li Japin, reviewed by David Johnston and Hayato Kuroda, some
fixes by me

Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
2021-01-06 18:28:52 -05:00
Tom Lane 09cf1d5226 Improve timeout.c's handling of repeated timeout set/cancel.
A very common usage pattern is that we set a timeout that we don't
expect to reach, cancel it after a little bit, and later repeat.
With the original implementation of timeout.c, this results in one
setitimer() call per timeout set or cancel.  We can do a lot better
by being lazy about changing the timeout interrupt request, namely:
(1) never cancel the outstanding interrupt, even when we have no
active timeout events;
(2) if we need to set an interrupt, but there already is one pending
at or before the required time, leave it alone.  When the interrupt
happens, the signal handler will reschedule it at whatever time is
then needed.

For example, with a one-second setting for statement_timeout, this
method results in having to interact with the kernel only a little
more than once a second, no matter how many statements we execute
in between.  The mainline code might never call setitimer() at all
after the first time, while each time the signal handler fires,
it sees that the then-pending request is most of a second away,
and that's when it sets the next interrupt request for.  Each
mainline timeout-set request after that will observe that the time
it wants is past the pending interrupt request time, and do nothing.

This also works pretty well for cases where a few different timeout
lengths are in use, as long as none of them are very short.  But
that describes our usage well.

Idea and original patch by Thomas Munro; I fixed a race condition
and improved the comments.

Discussion: https://postgr.es/m/CA+hUKG+o6pbuHBJSGnud=TadsuXySWA7CCcPgCt2QE9F6_4iHQ@mail.gmail.com
2021-01-06 18:28:52 -05:00
Tomas Vondra 8a4f618e7a Report progress of COPY commands
This commit introduces a view pg_stat_progress_copy, reporting progress
of COPY commands.  This allows rough estimates how far a running COPY
progressed, with the caveat that the total number of bytes may not be
available in some cases (e.g. when the input comes from the client).

Author: Josef Šimánek
Reviewed-by: Fujii Masao, Bharath Rupireddy, Vignesh C, Matthias van de Meent
Discussion: https://postgr.es/m/CAFp7QwqMGEi4OyyaLEK9DR0+E+oK3UtA4bEjDVCa4bNkwUY2PQ@mail.gmail.com
Discussion: https://postgr.es/m/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg+FEKet4XaTGSW=LitKQ@mail.gmail.com
2021-01-06 21:51:06 +01:00
Peter Eisentraut 4656e3d668 Replace CLOBBER_CACHE_ALWAYS with run-time GUC
Forced cache invalidation (CLOBBER_CACHE_ALWAYS) has been impractical
to use for testing in PostgreSQL because it's so slow and because it's
toggled on/off only at build time.  It is helpful when hunting bugs in
any code that uses the sycache/relcache because causes cache
invalidations to be injected whenever it would be possible for an
invalidation to occur, whether or not one was really pending.

Address this by providing run-time control over cache clobber
behaviour using the new debug_invalidate_system_caches_always GUC.
Support is not compiled in at all unless assertions are enabled or
CLOBBER_CACHE_ENABLED is explicitly defined at compile time.  It
defaults to 0 if compiled in, so it has negligible effect on assert
build performance by default.

When support is compiled in, test code can now set
debug_invalidate_system_caches_always=1 locally to a backend to test
specific queries, functions, extensions, etc.  Or tests can toggle it
globally for a specific test case while retaining normal performance
during test setup and teardown.

For backwards compatibility with existing test harnesses and scripts,
debug_invalidate_system_caches_always defaults to 1 if
CLOBBER_CACHE_ALWAYS is defined, and to 3 if CLOBBER_CACHE_RECURSIVE
is defined.

CLOBBER_CACHE_ENABLED is now visible in pg_config_manual.h, as is the
related RECOVER_RELATION_BUILD_MEMORY setting for the relcache.

Author: Craig Ringer <craig.ringer@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMsr+YF=+ctXBZj3ywmvKNUjWpxmuTuUKuv-rgbHGX5i5pLstQ@mail.gmail.com
2021-01-06 10:46:44 +01:00
Fujii Masao 8900b5a9d5 Detect the deadlocks between backends and the startup process.
The deadlocks that the recovery conflict on lock is involved in can
happen between hot-standby backends and the startup process.
If a backend takes an access exclusive lock on the table and which
finally triggers the deadlock, that deadlock can be detected
as expected. On the other hand, previously, if the startup process
took an access exclusive lock and which finally triggered the deadlock,
that deadlock could not be detected and could remain even after
deadlock_timeout passed. This is a bug.

The cause of this bug was that the code for handling the recovery
conflict on lock didn't take care of deadlock case at all. It assumed
that deadlocks involving the startup process and backends were able
to be detected by the deadlock detector invoked within backends.
But this assumption was incorrect. The startup process also should
have invoked the deadlock detector if necessary.

To fix this bug, this commit makes the startup process invoke
the deadlock detector if deadlock_timeout is reached while handling
the recovery conflict on lock. Specifically, in that case, the startup
process requests all the backends holding the conflicting locks to
check themselves for deadlocks.

Back-patch to v9.6. v9.5 has also this bug, but per discussion we decided
not to back-patch the fix to v9.5. Because v9.5 doesn't have some
infrastructure codes (e.g., 37c54863cf) that this bug fix patch depends on.
We can apply those codes for the back-patch, but since the next minor
version release is the final one for v9.5, it's risky to do that. If we
unexpectedly introduce new bug to v9.5 by the back-patch, there is no
chance to fix that. We determined that the back-patch to v9.5 would give
more risk than gain.

Author: Fujii Masao
Reviewed-by: Bertrand Drouvot, Masahiko Sawada, Kyotaro Horiguchi
Discussion: https://postgr.es/m/4041d6b6-cf24-a120-36fa-1294220f8243@oss.nttdata.com
2021-01-06 12:39:18 +09:00
Amit Kapila e02e840ff7 Fix typos in decode.c and logical.c.
Per report by Ajin Cherian in email:
https://postgr.es/m/CAFPTHDYnRKDvzgDxoMn_CKqXA-D0MtrbyJvfvjBsO4G=UHDXkg@mail.gmail.com
2021-01-06 08:56:19 +05:30
Tom Lane bf8a662c9a Introduce a new GUC_REPORT setting "in_hot_standby".
Aside from being queriable via SHOW, this value is sent to the client
immediately at session startup, and again later on if the server gets
promoted to primary during the session.  The immediate report will be
used in an upcoming patch to avoid an extra round trip when trying to
connect to a primary server.

Haribabu Kommi, Greg Nancarrow, Tom Lane; reviewed at various times
by Laurenz Albe, Takayuki Tsunakawa, Peter Smith.

Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
2021-01-05 16:18:05 -05:00
Dean Rasheed fead67c24a Add an explicit cast to double when using fabs().
Commit bc43b7c2c0 used fabs() directly on an int variable, which
apparently requires an explicit cast on some platforms.

Per buildfarm.
2021-01-05 11:52:42 +00:00
Dean Rasheed bc43b7c2c0 Fix numeric_power() when the exponent is INT_MIN.
In power_var_int(), the computation of the number of significant
digits to use in the computation used log(Abs(exp)), which isn't safe
because Abs(exp) returns INT_MIN when exp is INT_MIN. Use fabs()
instead of Abs(), so that the exponent is cast to a double before the
absolute value is taken.

Back-patch to 9.6, where this was introduced (by 7d9a4737c2).

Discussion: https://postgr.es/m/CAEZATCVd6pMkz=BrZEgBKyqqJrt2xghr=fNc8+Z=5xC6cgWrWA@mail.gmail.com
2021-01-05 11:15:28 +00:00
Peter Geoghegan 83e3239ee7 Standardize one aspect of rmgr desc output.
Bring heap and hash rmgr desc output in line with nbtree and GiST desc
output by using the name latestRemovedXid for all fields that output the
contents of the latestRemovedXid field from the WAL record's C struct
(stop using local variants).

This seems like a clear improvement because latestRemovedXid is a symbol
name that already appears across many different source files, and so is
probably much more recognizable.

Discussion: https://postgr.es/m/CAH2-Wzkt_Rs4VqPSCk87nyjPAAEmWL8STU9zgET_83EF5YfrLw@mail.gmail.com
2021-01-04 19:46:11 -08:00
Amit Kapila cd357c7629 Fix typo in origin.c.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+PsReyuvww_Fn1NN_Vsv0wBP1bnzuhzRFr_2=y1nNZrG7w@mail.gmail.com
2021-01-05 08:05:08 +05:30
Amit Kapila 9da2224ea2 Fix typo in reorderbuffer.c.
Author: Zhijie Hou
Reviewed-by: Sawada Masahiko
Discussion: https://postgr.es/m/ba88bb58aaf14284abca16aec04bf279@G08CNEXMBPEKD05.g08.fujitsu.local
2021-01-05 07:56:40 +05:30
Thomas Munro 034510c820 Replace remaining uses of "whitelist".
Instead describe the action that the list effects, or just use "list"
where the meaning is obvious from context.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue%40alap3.anarazel.de
2021-01-05 14:00:16 +13:00
Thomas Munro c0d4f6d897 Rename "enum blacklist" to "uncommitted enums".
We agreed to remove this terminology and use something more descriptive.

Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue%40alap3.anarazel.de
2021-01-05 12:38:48 +13:00
Tom Lane 4bd3fad80e Fix integer-overflow corner cases in substring() functions.
If the substring start index and length overflow when added together,
substring() misbehaved, either throwing a bogus "negative substring
length" error on a case that should succeed, or failing to complain that
a negative length is negative (and instead returning the whole string,
in most cases).  Unsurprisingly, the text, bytea, and bit variants of
the function all had this issue.  Rearrange the logic to ensure that
negative lengths are always rejected, and add an overflow check to
handle the other case.

Also install similar guards into detoast_attr_slice() (nee
heap_tuple_untoast_attr_slice()), since it's far from clear that
no other code paths leading to that function could pass it values
that would overflow.

Patch by myself and Pavel Stehule, per bug #16804 from Rafi Shamim.

Back-patch to v11.  While these bugs are old, the common/int.h
infrastructure for overflow-detecting arithmetic didn't exist before
commit 4d6ad3125, and it doesn't seem like these misbehaviors are bad
enough to justify developing a standalone fix for the older branches.

Discussion: https://postgr.es/m/16804-f4eeeb6c11ba71d4@postgresql.org
2021-01-04 18:32:44 -05:00
Tom Lane 1c1cbe279b Rethink the "read/write parameter" mechanism in pl/pgsql.
Performance issues with the preceding patch to re-implement array
element assignment within pl/pgsql led me to realize that the read/write
parameter mechanism is misdesigned.  Instead of requiring the assignment
source expression to be such that *all* its references to the target
variable could be passed as R/W, we really want to identify *one*
reference to the target variable to be passed as R/W, allowing any other
ones to be passed read/only as they would be by default.  As long as the
R/W reference is a direct argument to the top-level (hence last to be
executed) function in the expression, there is no harm in R/O references
being passed to other lower parts of the expression.  Nor is there any
use-case for more than one argument of the top-level function being R/W.

Hence, rewrite that logic to identify one single Param that references
the target variable, and make only that Param pass a read/write
reference, not any other Params referencing the target variable.

Discussion: https://postgr.es/m/4165684.1607707277@sss.pgh.pa.us
2021-01-04 12:39:27 -05:00
Tom Lane c9d5298485 Re-implement pl/pgsql's expression and assignment parsing.
Invent new RawParseModes that allow the core grammar to handle
pl/pgsql expressions and assignments directly, and thereby get rid
of a lot of hackery in pl/pgsql's parser.  This moves a good deal
of knowledge about pl/pgsql into the core code: notably, we have to
invent a CoercionContext that matches pl/pgsql's (rather dubious)
historical behavior for assignment coercions.  That's getting away
from the original idea of pl/pgsql as an arm's-length extension of
the core, but really we crossed that bridge a long time ago.

The main advantage of doing this is that we can now use the core
parser to generate FieldStore and/or SubscriptingRef nodes to handle
assignments to pl/pgsql variables that are records or arrays.  That
fixes a number of cases that had never been implemented in pl/pgsql
assignment, such as nested records and array slicing, and it allows
pl/pgsql assignment to support the datatype-specific subscripting
behaviors introduced in commit c7aba7c14.

There are cosmetic benefits too: when a syntax error occurs in a
pl/pgsql expression, the error report no longer includes the confusing
"SELECT" keyword that used to get prefixed to the expression text.
Also, there seem to be some small speed gains.

Discussion: https://postgr.es/m/4165684.1607707277@sss.pgh.pa.us
2021-01-04 11:52:00 -05:00
Tom Lane 844fe9f159 Add the ability for the core grammar to have more than one parse target.
This patch essentially allows gram.y to implement a family of related
syntax trees, rather than necessarily always parsing a list of SQL
statements.  raw_parser() gains a new argument, enum RawParseMode,
to say what to do.  As proof of concept, add a mode that just parses
a TypeName without any other decoration, and use that to greatly
simplify typeStringToTypeName().

In addition, invent a new SPI entry point SPI_prepare_extended() to
allow SPI users (particularly plpgsql) to get at this new functionality.
In hopes of making this the last variant of SPI_prepare(), set up its
additional arguments as a struct rather than direct arguments, and
promise that future additions to the struct can default to zero.
SPI_prepare_cursor() and SPI_prepare_params() can perhaps go away at
some point.

Discussion: https://postgr.es/m/4165684.1607707277@sss.pgh.pa.us
2021-01-04 11:03:22 -05:00
Michael Paquier b49154b3b7 Simplify some comments in xml.c
Author: Justin Pryzby
Discussion: https://postgr.es/m/X/Ff7jfnvJUab013@paquier.xyz
2021-01-04 19:47:58 +09:00
Amit Kapila a271a1b50e Allow decoding at prepare time in ReorderBuffer.
This patch allows PREPARE-time decoding of two-phase transactions (if the
output plugin supports this capability), in which case the transactions
are replayed at PREPARE and then committed later when COMMIT PREPARED
arrives.

Now that we decode the changes before the commit, the concurrent aborts
may cause failures when the output plugin consults catalogs (both system
and user-defined).

We detect such failures with a special sqlerrcode
ERRCODE_TRANSACTION_ROLLBACK introduced by commit 7259736a6e and stop
decoding the remaining changes. Then we rollback the changes when rollback
prepared is encountered.

Author: Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich
Reviewed-by: Amit Kapila, Peter Smith, Sawada Masahiko, Arseny Sher, and Dilip Kumar
Tested-by: Takamichi Osumi
Discussion:
https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru
https://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
2021-01-04 08:34:50 +05:30
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Peter Geoghegan 32d6287d2e Get heap page max offset with buffer lock held.
On further reflection it seems better to call PageGetMaxOffsetNumber()
after acquiring a buffer lock on the page.  This shouldn't really
matter, but doing it this way is cleaner.

Follow-up to commit 42288174.

Backpatch: 12-, just like commit 42288174
2020-12-30 17:21:42 -08:00
Peter Geoghegan 4228817449 Fix index deletion latestRemovedXid bug.
The logic for determining the latest removed XID for the purposes of
generating recovery conflicts in REDO routines was subtly broken.  It
failed to follow links from HOT chains, and so failed to consider all
relevant heap tuple headers in some cases.

To fix, expand the loop that deals with LP_REDIRECT line pointers to
also deal with HOT chains.  The new version of the loop is loosely based
on a similar loop from heap_prune_chain().

The impact of this bug is probably quite limited, since the horizon code
necessarily deals with heap tuples that are pointed to by LP_DEAD-set
index tuples.  The process of setting LP_DEAD index tuples (e.g. within
the kill_prior_tuple mechanism) is highly correlated with opportunistic
pruning of pointed-to heap tuples.  Plus the question of generating a
recovery conflict usually comes up some time after index tuple LP_DEAD
bits were initially set, unlike heap pruning, where a latestRemovedXid
is generated at the point of the pruning operation (heap pruning has no
deferred "would-be page split" style processing that produces conflicts
lazily).

Only backpatch to Postgres 12, the first version where this logic runs
during original execution (following commit 558a9165e0).  The index
latestRemovedXid mechanism has had the same bug since it first appeared
over 10 years ago (in commit a760893d), but backpatching to all
supported versions now seems like a bad idea on balance.  Running the
new improved code during recovery seems risky, especially given the lack
of complaints from the field.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wz=Eib393+HHcERK_9MtgNS7Ew1HY=RDC_g6GL46zM5C6Q@mail.gmail.com
Backpatch: 12-
2020-12-30 16:29:05 -08:00
Alexander Korotkov 16d531a30a Refactor multirange_in()
This commit preserves the logic of multirange_in() but makes it more clear
what's going on.  Also, this commit fixes the compiler warning spotted by the
buildfarm.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/2246043.1609290699%40sss.pgh.pa.us
2020-12-30 21:17:34 +03:00
Tom Lane 7ca37fb040 Use setenv() in preference to putenv().
Since at least 2001 we've used putenv() and avoided setenv(), on the
grounds that the latter was unportable and not in POSIX.  However,
POSIX added it that same year, and by now the situation has reversed:
setenv() is probably more portable than putenv(), since POSIX now
treats the latter as not being a core function.  And setenv() has
cleaner semantics too.  So, let's reverse that old policy.

This commit adds a simple src/port/ implementation of setenv() for
any stragglers (we have one in the buildfarm, but I'd not be surprised
if that code is never used in the field).  More importantly, extend
win32env.c to also support setenv().  Then, replace usages of putenv()
with setenv(), and get rid of some ad-hoc implementations of setenv()
wannabees.

Also, adjust our src/port/ implementation of unsetenv() to follow the
POSIX spec that it returns an error indicator, rather than returning
void as per the ancient BSD convention.  I don't feel a need to make
all the call sites check for errors, but the portability stub ought
to match real-world practice.

Discussion: https://postgr.es/m/2065122.1609212051@sss.pgh.pa.us
2020-12-30 12:56:06 -05:00
Alexander Korotkov 62097a4cc8 Fix selectivity estimation @> (anymultirange, anyrange) operator
Attempt to get selectivity estimation for @> (anymultirange, anyrange) operator
caused an error in buildfarm, because this operator was missed in switch()
of calc_hist_selectivity().  Fix that and also make regression tests reliably
check that selectivity estimation for (multi)ranges doesn't fall.  Previously,
whether we test selectivity estimation for (multi)ranges depended on
whether autovacuum managed to gather concurrently to the test.

Reported-by: Michael Paquier
Discussion: https://postgr.es/m/X%2BwmgjRItuvHNBeV%40paquier.xyz
2020-12-30 20:31:15 +03:00
Tom Lane 860fe27ee1 Fix up usage of krb_server_keyfile GUC parameter.
secure_open_gssapi() installed the krb_server_keyfile setting as
KRB5_KTNAME unconditionally, so long as it's not empty.  However,
pg_GSS_recvauth() only installed it if KRB5_KTNAME wasn't set already,
leading to a troubling inconsistency: in theory, clients could see
different sets of server principal names depending on whether they
use GSSAPI encryption.  Always using krb_server_keyfile seems like
the right thing, so make both places do that.  Also fix up
secure_open_gssapi()'s lack of a check for setenv() failure ---
it's unlikely, surely, but security-critical actions are no place
to be sloppy.

Also improve the associated documentation.

This patch does nothing about secure_open_gssapi()'s use of setenv(),
and indeed causes pg_GSS_recvauth() to use it too.  That's nominally
against project portability rules, but since this code is only built
with --with-gssapi, I do not feel a need to do something about this
in the back branches.  A fix will be forthcoming for HEAD though.

Back-patch to v12 where GSSAPI encryption was introduced.  The
dubious behavior in pg_GSS_recvauth() goes back further, but it
didn't have anything to be inconsistent with, so let it be.

Discussion: https://postgr.es/m/2187460.1609263156@sss.pgh.pa.us
2020-12-30 11:38:42 -05:00
Michael Paquier e665769e6d Sanitize IF NOT EXISTS in EXPLAIN for CTAS and matviews
IF NOT EXISTS was ignored when specified in an EXPLAIN query for CREATE
MATERIALIZED VIEW or CREATE TABLE AS.  Hence, if this clause was
specified, the caller would get a failure if the relation already
exists instead of a success with a NOTICE message.

This commit makes the behavior of IF NOT EXISTS in EXPLAIN consistent
with the non-EXPLAIN'd DDL queries, preventing a failure with IF NOT
EXISTS if the relation to-be-created already exists.  The skip is done
before the SELECT query used for the relation is planned or executed,
and a "dummy" plan is generated instead depending on the format used by
EXPLAIN.

Author: Bharath Rupireddy
Reviewed-by: Zhijie Hou, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACVa3oJ9O_wcGd+FtHWZds04dEKcakxphGz5POVgD4wC7Q@mail.gmail.com
2020-12-30 21:23:24 +09:00
Amit Kapila 0aa8a01d04 Extend the output plugin API to allow decoding of prepared xacts.
This adds six methods to the output plugin API, adding support for
streaming changes of two-phase transactions at prepare time.

* begin_prepare
* filter_prepare
* prepare
* commit_prepared
* rollback_prepared
* stream_prepare

Most of this is a simple extension of the existing methods, with the
semantic difference that the transaction is not yet committed and maybe
aborted later.

Until now two-phase transactions were translated into regular transactions
on the subscriber, and the GID was not forwarded to it. None of the
two-phase commands were communicated to the subscriber.

This patch provides the infrastructure for logical decoding plugins to be
informed of two-phase commands Like PREPARE TRANSACTION, COMMIT PREPARED
and ROLLBACK PREPARED commands with the corresponding GID.

This also extends the 'test_decoding' plugin, implementing these new
methods.

This commit simply adds these new APIs and the upcoming patch to "allow
the decoding at prepare time in ReorderBuffer" will use these APIs.

Author: Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich
Reviewed-by: Amit Kapila, Peter Smith, Sawada Masahiko, and Dilip Kumar
Discussion:
https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru
https://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
2020-12-30 16:17:26 +05:30
Tom Lane 1f9158ba48 Suppress log spam from multiple reports of SIGQUIT shutdown.
When the postmaster sends SIGQUIT to its children, there's no real
need for all the children to log that fact; the postmaster already
made a log entry about it, so adding perhaps dozens or hundreds of
child-process log entries adds nothing of value.  So, let's introduce
a new ereport level to specify "WARNING, but never send to log" and
use that for these messages.

Such a change wouldn't have been desirable before commit 7e784d1dc,
because if someone manually SIGQUIT's a backend, we *do* want to log
that.  But now we can tell the difference between a signal that was
issued by the postmaster and one that was not with reasonable
certainty.

While we're here, also clear error_context_stack before ereport'ing,
to prevent error callbacks from being invoked in the signal-handler
context.  This should reduce the odds of getting hung up while trying
to notify the client.

Per a suggestion from Andres Freund.

Discussion: https://postgr.es/m/20201225230331.hru3u6obyy6j53tk@alap3.anarazel.de
2020-12-29 18:02:38 -05:00
Alexander Korotkov db6335b5b1 Add support of multirange matching to the existing range GiST indexes
6df7a9698b has introduced a set of operators between ranges and multiranges.
Existing GiST indexes for ranges could easily support majority of them.
This commit adds support for new operators to the existing range GiST indexes.
New operators resides the same strategy numbers as existing ones.  Appropriate
check function is determined using the subtype.

Catversion is bumped.
2020-12-29 23:36:43 +03:00
Alexander Korotkov d1d61a8b23 Improve the signature of internal multirange functions
There is a set of *_internal() functions exposed in
include/utils/multirangetypes.h.  This commit improves the signatures of these
functions in two ways.
 * Add const qualifies where applicable.
 * Replace multirange typecache argument with range typecache argument.
   Multirange typecache was used solely to find the range typecache.  At the
   same time, range typecache is easier for the caller to find.
2020-12-29 23:35:38 +03:00
Alexander Korotkov 4d7684cc75 Implement operators for checking if the range contains a multirange
We have operators for checking if the multirange contains a range but don't
have the opposite.  This commit improves completeness of the operator set by
adding two new operators: @> (anyrange,anymultirange) and
<@(anymultirange,anyrange).

Catversion is bumped.
2020-12-29 23:35:33 +03:00
Alexander Korotkov a5b81b6f00 Fix bugs in comparison functions for multirange_bsearch_match()
Two functions multirange_range_overlaps_bsearch_comparison() and
multirange_range_contains_bsearch_comparison() contain bugs of returning -1
instead of 1.  This commit fixes these bugs and adds corresponding regression
tests.
2020-12-29 23:35:26 +03:00
Tom Lane 3995c42498 Improve log messages related to pg_hba.conf not matching a connection.
Include details on whether GSS encryption has been activated;
since we added "hostgssenc" type HBA entries, that's relevant info.

Kyotaro Horiguchi and Tom Lane.  Back-patch to v12 where
GSS encryption was introduced.

Discussion: https://postgr.es/m/e5b0b6ed05764324a2f3fe7acfc766d5@smhi.se
2020-12-28 17:58:58 -05:00
Tom Lane 622ae4621e Fix assorted issues in backend's GSSAPI encryption support.
Unrecoverable errors detected by GSSAPI encryption can't just be
reported with elog(ERROR) or elog(FATAL), because attempting to
send the error report to the client is likely to lead to infinite
recursion or loss of protocol sync.  Instead make this code do what
the SSL encryption code has long done, which is to just report any
such failure to the server log (with elevel COMMERROR), then pretend
we've lost the connection by returning errno = ECONNRESET.

Along the way, fix confusion about whether message translation is done
by pg_GSS_error() or its callers (the latter should do it), and make
the backend version of that function work more like the frontend
version.

Avoid allocating the port->gss struct until it's needed; we surely
don't need to allocate it in the postmaster.

Improve logging of "connection authorized" messages with GSS enabled.
(As part of this, I back-patched the code changes from dc11f31a1.)

Make BackendStatusShmemSize() account for the GSS-related space that
will be allocated by CreateSharedBackendStatus().  This omission
could possibly cause out-of-shared-memory problems with very high
max_connections settings.

Remove arbitrary, pointless restriction that only GSS authentication
can be used on a GSS-encrypted connection.

Improve documentation; notably, document the fact that libpq now
prefers GSS encryption over SSL encryption if both are possible.

Per report from Mikael Gustavsson.  Back-patch to v12 where
this code was introduced.

Discussion: https://postgr.es/m/e5b0b6ed05764324a2f3fe7acfc766d5@smhi.se
2020-12-28 17:44:17 -05:00
Michael Paquier 643428c54b Fix inconsistent code with shared invalidations of snapshots
The code in charge of processing a single invalidation message has been
using since 568d413 the structure for relation mapping messages.  This
had fortunately no consequence as both locate the database ID at the
same location, but it could become a problem in the future if this area
of the code changes.

Author: Konstantin Knizhnik
Discussion: https://postgr.es/m/8044c223-4d3a-2cdb-42bf-29940840ce94@postgrespro.ru
Backpatch-through: 9.5
2020-12-28 22:16:49 +09:00
Bruce Momjian 3187ef7c46 Revert "Add key management system" (978f869b99) & later commits
The patch needs test cases, reorganization, and cfbot testing.
Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive)
and 08db7c63f3..ccbe34139b.

Reported-by: Tom Lane, Michael Paquier

Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
2020-12-27 21:37:42 -05:00
Jeff Davis 05c0258966 Fix bug #16784 in Disk-based Hash Aggregation.
Before processing tuples, agg_refill_hash_table() was setting all
pergroup pointers to NULL to signal to advance_aggregates() that it
should not attempt to advance groups that had spilled.

The problem was that it also set the pergroups for sorted grouping
sets to NULL, which caused rescanning to fail.

Instead, change agg_refill_hash_table() to only set the pergroups for
hashed grouping sets to NULL; and when compiling the expression, pass
doSort=false.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/16784-7ff169bf2c3d1588%40postgresql.org
Backpatch-through: 13
2020-12-26 17:25:30 -08:00
Bruce Momjian ba6725df36 auth commands: list specific commands to install in Makefile
Previously I used Makefile functions.

Backpatch-through: master
2020-12-26 09:25:05 -05:00
Bruce Momjian d7602afa2e Add scripts for retrieving the cluster file encryption key
Scripts are passphrase, direct, AWS, and two Yubikey ones.

Backpatch-through: master
2020-12-26 01:19:09 -05:00
Bruce Momjian 300e430c76 Allow ssl_passphrase_command to prompt the terminal
Previously the command could not access the terminal for a passphrase.

Backpatch-through: master
2020-12-25 20:41:06 -05:00
Noah Misch 08db7c63f3 Invalidate acl.c caches when pg_authid changes.
This makes existing sessions reflect "ALTER ROLE ... [NO]INHERIT" as
quickly as they have been reflecting "GRANT role_name".  Back-patch to
9.5 (all supported versions).

Reviewed by Nathan Bossart.

Discussion: https://postgr.es/m/20201221095028.GB3777719@rfd.leadboat.com
2020-12-25 10:41:59 -08:00
Bruce Momjian 978f869b99 Add key management system
This adds a key management system that stores (currently) two data
encryption keys of length 128, 192, or 256 bits.  The data keys are
AES256 encrypted using a key encryption key, and validated via GCM
cipher mode.  A command to obtain the key encryption key must be
specified at initdb time, and will be run at every database server
start.  New parameters allow a file descriptor open to the terminal to
be passed.  pg_upgrade support has also been added.

Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com
Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us

Author: Masahiko Sawada, me, Stephen Frost
2020-12-25 10:19:44 -05:00
Bruce Momjian c3826f831e move hex_decode() to /common so it can be called from frontend
This allows removal of a copy of hex_decode() from ecpg, and will be
used by the soon-to-be added pg_alterckey command.

Backpatch-through: master
2020-12-24 17:25:48 -05:00
Tom Lane 7519bd16d1 Fix race condition between shutdown and unstarted background workers.
If a database shutdown (smart or fast) is commanded between the time
some process decides to request a new background worker and the time
that the postmaster can launch that worker, then nothing happens
because the postmaster won't launch any bgworkers once it's exited
PM_RUN state.  This is fine ... unless the requesting process is
waiting for that worker to finish (or even for it to start); in that
case the requestor is stuck, and only manual intervention will get us
to the point of being able to shut down.

To fix, cancel pending requests for workers when the postmaster sends
shutdown (SIGTERM) signals, and similarly cancel any new requests that
arrive after that point.  (We can optimize things slightly by only
doing the cancellation for workers that have waiters.)  To fit within
the existing bgworker APIs, the "cancel" is made to look like the
worker was started and immediately stopped, causing deregistration of
the bgworker entry.  Waiting processes would have to deal with
premature worker exit anyway, so this should introduce no bugs that
weren't there before.  We do have a side effect that registration
records for restartable bgworkers might disappear when theoretically
they should have remained in place; but since we're shutting down,
that shouldn't matter.

Back-patch to v10.  There might be value in putting this into 9.6
as well, but the management of bgworkers is a bit different there
(notably see 8ff518699) and I'm not convinced it's worth the effort
to validate the patch for that branch.

Discussion: https://postgr.es/m/661570.1608673226@sss.pgh.pa.us
2020-12-24 17:00:43 -05:00
Tom Lane 7e784d1dc1 Improve client error messages for immediate-stop situations.
Up to now, if the DBA issued "pg_ctl stop -m immediate", the message
sent to clients was the same as for a crash-and-restart situation.
This is confusing, not least because the message claims that the
database will soon be up again, something we have no business
predicting.

Improve things so that we can generate distinct messages for the two
cases (and also recognize an ad-hoc SIGQUIT, should somebody try that).
To do that, add a field to pmsignal.c's shared memory data structure
that the postmaster sets just before broadcasting SIGQUIT to its
children.  No interlocking seems to be necessary; the intervening
signal-sending and signal-receipt should sufficiently serialize accesses
to the field.  Hence, this isn't any riskier than the existing usages
of pmsignal.c.

We might in future extend this idea to improve other
postmaster-to-children signal scenarios, although none of them
currently seem to be as badly overloaded as SIGQUIT.

Discussion: https://postgr.es/m/559291.1608587013@sss.pgh.pa.us
2020-12-24 12:58:32 -05:00
Michael Paquier 90fbf7c57d Fix typos and grammar in docs and comments
This fixes several areas of the documentation and some comments in
matters of style, grammar, or even format.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20201222041153.GK30237@telsasoft.com
2020-12-24 17:05:49 +09:00
Michael Paquier 6db27037b9 Fix portability issues with parsing of recovery_target_xid
The parsing of this parameter has been using strtoul(), which is not
portable across platforms.  On most Unix platforms, unsigned long has a
size of 64 bits, while on Windows it is 32 bits.  It is common in
recovery scenarios to rely on the output of txid_current() or even the
newer pg_current_xact_id() to get a transaction ID for setting up
recovery_target_xid.  The value returned by those functions includes the
epoch in the computed result, which would cause strtoul() to fail where
unsigned long has a size of 32 bits once the epoch is incremented.

WAL records and 2PC data include only information about 32-bit XIDs and
it is not possible to have XIDs across more than one epoch, so
discarding the high bits from the transaction ID set has no impact on
recovery.  On the contrary, the use of strtoul() prevents a consistent
behavior across platforms depending on the size of unsigned long.

This commit changes the parsing of recovery_target_xid to use
pg_strtouint64() instead, available down to 9.6.  There is one TAP test
stressing recovery with recovery_target_xid, where a tweak based on
pg_reset{xlog,wal} is added to bump the XID epoch so as this change gets
tested, as per an idea from Alexander Lakhin.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/16780-107fd0c0385b1035@postgresql.org
Backpatch-through: 9.6
2020-12-23 12:51:22 +09:00
Tomas Vondra 1ca2eb1031 Improve find_em_expr_usable_for_sorting_rel comment
Clarify the relationship between find_em_expr_usable_for_sorting_rel and
prepare_sort_from_pathkeys, i.e. what restrictions need to be shared
between those two places.

Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com
2020-12-22 02:00:51 +01:00
Tomas Vondra 9aff4dc01f Don't search for volatile expr in find_em_expr_usable_for_sorting_rel
While prepare_sort_from_pathkeys has to be concerned about matching up
a volatile expression to the proper tlist entry, we don't need to do
that in find_em_expr_usable_for_sorting_rel becausee such a sort will
have to be postponed anyway.

Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com
2020-12-21 20:05:11 +01:00
Tomas Vondra fac1b470a9 Disallow SRFs when considering sorts below Gather Merge
While we do allow SRFs in ORDER BY, scan/join processing should not
consider such cases - such sorts should only happen via final Sort atop
a ProjectSet. So make sure we don't try adding such sorts below Gather
Merge, just like we do for expressions that are volatile and/or not
parallel safe.

Backpatch to PostgreSQL 13, where this code was introduced as part of
the Incremental Sort patch.

Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
Discussion: https://postgr.es/m/295524.1606246314%40sss.pgh.pa.us
2020-12-21 19:36:22 +01:00
Tom Lane ff5d5611c0 Remove "invalid concatenation of jsonb objects" error case.
The jsonb || jsonb operator arbitrarily rejected certain combinations
of scalar and non-scalar inputs, while being willing to concatenate
other combinations.  This was of course quite undocumented.  Rather
than trying to document it, let's just remove the restriction,
creating a uniform rule that unless we are handling an object-to-object
concatenation, non-array inputs are converted to one-element arrays,
resulting in an array-to-array concatenation.  (This does not change
the behavior for any case that didn't throw an error before.)

Per complaint from Joel Jacobson.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/163099.1608312033@sss.pgh.pa.us
2020-12-21 13:11:50 -05:00
Tomas Vondra 86b7cca72d Check parallel safety in generate_useful_gather_paths
Commit ebb7ae839d ensured we ignore pathkeys with volatile expressions
when considering adding a sort below a Gather Merge. Turns out we need
to care about parallel safety of the pathkeys too, otherwise we might
try sorting e.g. on results of a correlated subquery (as demonstrated
by a report from Luis Roberto).

Initial investigation by Tom Lane, patch by James Coleman. Backpatch
to 13, where the code was instroduced (as part of Incremental Sort).

Reported-by: Luis Roberto
Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/622580997.37108180.1604080457319.JavaMail.zimbra%40siscobra.com.br
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
2020-12-21 18:29:49 +01:00
Tomas Vondra f4a3c0b062 Consider unsorted paths in generate_useful_gather_paths
generate_useful_gather_paths used to skip unsorted paths (without any
pathkeys), but that is unnecessary - the later code actually can handle
such paths just fine by adding a Sort node. This is clearly a thinko,
preventing construction of useful plans.

Backpatch to 13, where Incremental Sort was introduced.

Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
2020-12-21 18:10:20 +01:00
Alexander Korotkov 29f8f54676 Fix compiler warning in multirange_constructor0()
Discussion: https://postgr.es/m/X%2BBP8XE0UpIB6Yvh%40paquier.xyz
Author: Michael Paquier
2020-12-21 14:25:32 +03:00
Michael Paquier 93e8ff8701 Refactor logic to check for ASCII-only characters in string
The same logic was present for collation commands, SASLprep and
pgcrypto, so this removes some code.

Author: Michael Paquier
Reviewed-by: Stephen Frost, Heikki Linnakangas
Discussion: https://postgr.es/m/X9womIn6rne6Gud2@paquier.xyz
2020-12-21 09:37:11 +09:00
Alexander Korotkov 4e1ee79e31 Fix typalign in rangetypes statistics
6df7a9698b introduces multirange types, whose typanalyze function shares
infrastructure with range types typanalyze function.  Since 6df7a9698b,
information about type gathered by statistics is filled from typcache.
But typalign is mistakenly always set to double.  This commit fixes this
oversight.
2020-12-21 00:31:11 +03:00
Tom Lane ed6329cfa9 Avoid memcpy() with same source and destination in pgstat_recv_replslot.
Same type of issue as in commit 53d4f5fef and earlier fixes; also
found by apparently-more-picky-than-the-buildfarm valgrind testing.
This one is an oversight in commit 986816750.  Since that's new in
HEAD, no need for a back-patch.
2020-12-20 12:38:32 -05:00
Alexander Korotkov 11072e8693 Fix compiler warning introduced in 6df7a9698b 2020-12-20 16:27:01 +03:00
Alexander Korotkov 6df7a9698b Multirange datatypes
Multiranges are basically sorted arrays of non-overlapping ranges with
set-theoretic operations defined over them.

Since v14, each range type automatically gets a corresponding multirange
datatype.  There are both manual and automatic mechanisms for naming multirange
types.  Once can specify multirange type name using multirange_type_name
attribute in CREATE TYPE.  Otherwise, a multirange type name is generated
automatically.  If the range type name contains "range" then we change that to
"multirange".  Otherwise, we add "_multirange" to the end.

Implementation of multiranges comes with a space-efficient internal
representation format, which evades extra paddings and duplicated storage of
oids.  Altogether this format allows fetching a particular range by its index
in O(n).

Statistic gathering and selectivity estimation are implemented for multiranges.
For this purpose, stored multirange is approximated as union range without gaps.
This field will likely need improvements in the future.

Catversion is bumped.

Discussion: https://postgr.es/m/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com
Discussion: https://postgr.es/m/a0b8026459d1e6167933be2104a6174e7d40d0ab.camel%40j-davis.com#fe7218c83b08068bfffb0c5293eceda0
Author: Paul Jungwirth, revised by me
Reviewed-by: David Fetter, Corey Huinker, Jeff Davis, Pavel Stehule
Reviewed-by: Alvaro Herrera, Tom Lane, Isaac Morland, David G. Johnston
Reviewed-by: Zhihong Yu, Alexander Korotkov
2020-12-20 07:20:33 +03:00
Amit Kapila 20659fd8e5 Update comment atop of ReorderBufferQueueMessage().
The comments atop of this function describes behaviour in case of a
transactional WAL message only, but it accepts both transactional and
non-transactional WAL messages. Update the comments to describe
behaviour in case of non-transactional WAL message as well.

Ashutosh Bapat, rephrased by Amit Kapila
Discussion: https://postgr.es/m/CAGEoWWTTzNzHOi8bj0wfAo1siGi-YEh6wqH1oaz4DrkTJ6HbTQ@mail.gmail.com
2020-12-19 10:08:46 +05:30
Tom Lane 53d4f5fef0 Avoid memcpy() with same source and destination during relmapper init.
A narrow reading of the C standard says that memcpy(x,x,n) is undefined,
although it's hard to envision an implementation that would really
misbehave.  However, analysis tools such as valgrind might whine about
this; accordingly, let's band-aid relmapper.c to not do it.

See also 5b630501e, d3f4e8a8a, ad7b48ea0, and other similar fixes.
Apparently, none of those folk tried valgrinding initdb?  This has been
like this for long enough that I'm surprised it hasn't been reported
before.

Back-patch, just in case anybody wants to use a back branch on a platform
that complains about this; we back-patched those earlier fixes too.

Discussion: https://postgr.es/m/161790.1608310142@sss.pgh.pa.us
2020-12-18 15:46:44 -05:00
Fujii Masao 00f690a239 Revert "Get rid of the dedicated latch for signaling the startup process".
Revert ac22929a26, as well as the followup fix 113d3591b8. Because it broke
the assumption that the startup process waiting for the recovery conflict
on buffer pin should be waken up only by buffer unpin or the timeout enabled
in ResolveRecoveryConflictWithBufferPin(). It caused, for example,
SIGHUP signal handler or walreceiver process to wake that startup process
up unnecessarily frequently.

Additionally, add the comments about why that dedicated latch that
the reverted patch tried to get rid of should not be removed.

Thanks to Kyotaro Horiguchi for the discussion.

Author: Fujii Masao
Discussion: https://postgr.es/m/d8c0c608-021b-3c73-fffd-3240829ee986@oss.nttdata.com
2020-12-17 18:06:51 +09:00
Peter Geoghegan 41ddc27f66 Remove obsolete btrescan() comment.
"Ordering stuff" refered to a _bt_first() call to _bt_orderkeys().
However, the _bt_orderkeys() function was renamed to
_bt_preprocess_keys() by commit fa5c8a055a.

_bt_preprocess_keys() is directly referenced just after the removed
comment already, which seems sufficient.
2020-12-15 15:55:07 -08:00
Alvaro Herrera a18422a3ad
Remove useless variable stores
Mistakenly introduced in 4cbe3ac3e867; bug repaired in 148e632c05 but
the stores were accidentally.
2020-12-15 19:51:16 -03:00
Tomas Vondra 6bc2769832 Error out when Gather Merge input is not sorted
To build Gather Merge path, the input needs to be sufficiently sorted.
Ensuring this is the responsibility of the code constructing the paths,
but create_gather_merge_plan tried to handle unsorted paths by adding
an explicit Sort. In light of the recent issues related to Incremental
Sort, this is rather fragile. Some of the expressions may be volatile
or parallel unsafe, in which case we can't add the Sort here.

We could do more checks and add the Sort in at least some cases, but
it seems cleaner to just error out and make it clear this is a bug in
code constructing those paths.

Author: James Coleman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com
Discussion: https://postgr.es/m/CAJGNTeNaxpXgBVcRhJX%2B2vSbq%2BF2kJqGBcvompmpvXb7pq%2BoFA%40mail.gmail.com
2020-12-15 23:19:41 +01:00
Tom Lane b3817f5f77 Improve hash_create()'s API for some added robustness.
Invent a new flag bit HASH_STRINGS to specify C-string hashing, which
was formerly the default; and add assertions insisting that exactly
one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set.
This is in hopes of preventing recurrences of the type of oversight
fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS).

Also, when HASH_STRINGS is specified, insist that the keysize be
more than 8 bytes.  This is a heuristic, but it should catch
accidental use of HASH_STRINGS for integer or pointer keys.
(Nearly all existing use-cases set the keysize to NAMEDATALEN or
more, so there's little reason to think this restriction should
be problematic.)

Tweak hash_create() to insist that the HASH_ELEM flag be set, and
remove the defaults it had for keysize and entrysize.  Since those
defaults were undocumented and basically useless, no callers
omitted HASH_ELEM anyway.

Also, remove memset's zeroing the HASHCTL parameter struct from
those callers that had one.  This has never been really necessary,
and while it wasn't a bad coding convention it was confusing that
some callers did it and some did not.  We might as well save a few
cycles by standardizing on "not".

Also improve the documentation for hash_create().

In passing, improve reinit.c's usage of a hash table by storing
the key as a binary Oid rather than a string; and, since that's
a temporary hash table, allocate it in CurrentMemoryContext for
neatness.

Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us
2020-12-15 11:38:53 -05:00
Jeff Davis a58db3aa10 Revert "Cannot use WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE."
This reverts commit 3a9e64aa0d.

Commit 4bad60e3 fixed the root of the problem that 3a9e64aa worked
around.

This enables proper pipelining of commands after terminating
replication, eliminating an undocumented limitation.

Discussion: https://postgr.es/m/3d57bc29-4459-578b-79cb-7641baf53c57%40iki.fi
Backpatch-through: 9.5
2020-12-14 23:47:30 -08:00
Michael Paquier df9274adf3 Add some checkpoint/restartpoint status to ps display
This is done for end-of-recovery and shutdown checkpoints/restartpoints
(end-of-recovery restartpoints don't exist) rather than all types of
checkpoints, in cases where it may not be possible to rely on
pg_stat_activity to get a status from the startup or checkpointer
processes.

For example, at the end of a crash recovery, this is useful to know if a
checkpoint is running in the startup process, while previously the ps
display may only show some information about "recovering" something,
that can be confusing while a checkpoint runs.

Author: Justin Pryzby
Reviewed-by: Nathan Bossart, Kirk Jamison, Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/20200818225238.GP17022@telsasoft.com
2020-12-14 11:53:58 +09:00
Noah Misch a1b8aa1e4e Use HASH_BLOBS for xidhash.
This caused BufFile errors on buildfarm member sungazer, and SIGSEGV was
possible.  Conditions for reaching those symptoms were more frequent on
big-endian systems.

Discussion: https://postgr.es/m/20201129214441.GA691200@rfd.leadboat.com
2020-12-12 21:38:36 -08:00
Noah Misch 73aae4522b Correct behavior descriptions in comments, and correct a test name. 2020-12-12 20:12:25 -08:00
Tom Lane 8c15a29745 Allow ALTER TYPE to update an existing type's typsubscript value.
This is essential if we'd like to allow existing extension data types
to support subscripting in future, since dropping and recreating the
type isn't a practical thing for an extension upgrade script, and
direct manipulation of pg_type isn't a great answer either.

There was some discussion about also allowing alteration of typelem,
but it's less clear whether that's a good idea or not, so for now
I forebore.

Discussion: https://postgr.es/m/3724341.1607551174@sss.pgh.pa.us
2020-12-11 18:58:21 -05:00
Tom Lane 653aa603f5 Provide an error cursor for "can't subscript" error messages.
Commit c7aba7c14 didn't add this, but after more fooling with the
feature I feel that it'd be useful.  To make this possible, refactor
getSubscriptingRoutines() so that the caller is responsible for
throwing any error.  (In clauses.c, I just chose to make the
most conservative assumption rather than throwing an error.  We don't
expect failures there anyway really, so the code space for an error
message would be a poor investment.)
2020-12-11 18:58:21 -05:00
Tom Lane c7aba7c14e Support subscripting of arbitrary types, not only arrays.
This patch generalizes the subscripting infrastructure so that any
data type can be subscripted, if it provides a handler function to
define what that means.  Traditional variable-length (varlena) arrays
all use array_subscript_handler(), while the existing fixed-length
types that support subscripting use raw_array_subscript_handler().
It's expected that other types that want to use subscripting notation
will define their own handlers.  (This patch provides no such new
features, though; it only lays the foundation for them.)

To do this, move the parser's semantic processing of subscripts
(including coercion to whatever data type is required) into a
method callback supplied by the handler.  On the execution side,
replace the ExecEvalSubscriptingRef* layer of functions with direct
calls to callback-supplied execution routines.  (Thus, essentially
no new run-time overhead should be caused by this patch.  Indeed,
there is room to remove some overhead by supplying specialized
execution routines.  This patch does a little bit in that line,
but more could be done.)

Additional work is required here and there to remove formerly
hard-wired assumptions about the result type, collation, etc
of a SubscriptingRef expression node; and to remove assumptions
that the subscript values must be integers.

One useful side-effect of this is that we now have a less squishy
mechanism for identifying whether a data type is a "true" array:
instead of wiring in weird rules about typlen, we can look to see
if pg_type.typsubscript == F_ARRAY_SUBSCRIPT_HANDLER.  For this
to be bulletproof, we have to forbid user-defined types from using
that handler directly; but there seems no good reason for them to
do so.

This patch also removes assumptions that the number of subscripts
is limited to MAXDIM (6), or indeed has any hard-wired limit.
That limit still applies to types handled by array_subscript_handler
or raw_array_subscript_handler, but to discourage other dependencies
on this constant, I've moved it from c.h to utils/array.h.

Dmitry Dolgov, reviewed at various times by Tom Lane, Arthur Zakirov,
Peter Eisentraut, Pavel Stehule

Discussion: https://postgr.es/m/CA+q6zcVDuGBv=M0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w@mail.gmail.com
Discussion: https://postgr.es/m/CA+q6zcVovR+XY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA@mail.gmail.com
2020-12-09 12:40:37 -05:00
Peter Eisentraut 8b069ef5dc Change get_constraint_index() to use pg_constraint.conindid
It was still using a scan of pg_depend instead of using the conindid
column that has been added since.

Since it is now just a catalog lookup wrapper and not related to
pg_depend, move from pg_depend.c to lsyscache.c.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/4688d55c-9a2e-9a5a-d166-5f24fe0bf8db%40enterprisedb.com
2020-12-09 15:41:45 +01:00
Andres Freund df99ddc70b jit: Reference function pointer types via llvmjit_types.c.
It is error prone (see 5da871bfa1) and verbose to manually create function
types. Add a helper that can reference a function pointer type via
llvmjit_types.c and and convert existing instances of manual creation.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20201207212142.wz5tnbk2jsaqzogb@alap3.anarazel.de
2020-12-08 16:55:20 -08:00
Tom Lane 62ee703313 Teach contain_leaked_vars that assignment SubscriptingRefs are leaky.
array_get_element and array_get_slice qualify as leakproof, since
they will silently return NULL for bogus subscripts.  But
array_set_element and array_set_slice throw errors for such cases,
making them clearly not leakproof.  contain_leaked_vars was evidently
written with only the former case in mind, as it gave the wrong answer
for assignment SubscriptingRefs (nee ArrayRefs).

This would be a live security bug, were it not that assignment
SubscriptingRefs can only occur in INSERT and UPDATE target lists,
while we only care about leakproofness for qual expressions; so the
wrong answer can't occur in practice.  Still, that's a rather shaky
answer for a security-related question; and maybe in future somebody
will want to ask about leakproofness of a tlist.  So it seems wise to
fix and even back-patch this correction.

(We would need some change here anyway for the upcoming
generic-subscripting patch, since extensions might make different
tradeoffs about whether to throw errors.  Commit 558d77f20 attempted
to lay groundwork for that by asking check_functions_in_node whether a
SubscriptingRef contains leaky functions; but that idea fails now that
the implementation methods of a SubscriptingRef are not SQL-visible
functions that could be marked leakproof or not.)

Back-patch to 9.6.  While 9.5 has the same issue, the code's a bit
different.  It seems quite unlikely that we'd introduce any actual bug
in the short time 9.5 has left to live, so the work/risk/reward balance
isn't attractive for changing 9.5.

Discussion: https://postgr.es/m/3143742.1607368115@sss.pgh.pa.us
2020-12-08 17:50:54 -05:00
Tom Lane a676386b58 Remove operator_precedence_warning.
This GUC was always intended as a temporary solution to help with
finding 9.4-to-9.5 migration issues.  Now that all pre-9.5 branches
are out of support, and 9.5 will be too before v14 is released,
it seems like it's okay to drop it.  Doing so allows removal of
several hundred lines of poorly-tested code in parse_expr.c,
which have been a fertile source of bugs when people did use this.

Discussion: https://postgr.es/m/2234320.1607117945@sss.pgh.pa.us
2020-12-08 16:29:52 -05:00
Dean Rasheed 4f5760d4af Improve estimation of ANDs under ORs using extended statistics.
Formerly, extended statistics only handled clauses that were
RestrictInfos. However, the restrictinfo machinery doesn't create
sub-AND RestrictInfos for AND clauses underneath OR clauses.
Therefore teach extended statistics to handle bare AND clauses,
looking for compatible RestrictInfo clauses underneath them.

Dean Rasheed, reviewed by Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com
2020-12-08 20:10:11 +00:00