Commit Graph

20734 Commits

Author SHA1 Message Date
Tom Lane
2f48ede080 Avoid using a cursor in plpgsql's RETURN QUERY statement.
plpgsql has always executed the query given in a RETURN QUERY command
by opening it as a cursor and then fetching a few rows at a time,
which it turns around and dumps into the function's result tuplestore.
The point of this was to keep from blowing out memory with an oversized
SPITupleTable result (note that while a tuplestore can spill tuples
to disk, SPITupleTable cannot).  However, it's rather inefficient, both
because of extra data copying and because of executor entry/exit
overhead.  In recent versions, a new performance problem has emerged:
use of a cursor prevents use of a parallel plan for the executed query.

We can improve matters by skipping use of a cursor and having the
executor push result tuples directly into the function's result
tuplestore.  However, a moderate amount of new infrastructure is needed
to make that idea work:

* We can use the existing tstoreReceiver.c DestReceiver code to funnel
executor output to the tuplestore, but it has to be extended to support
plpgsql's requirement for possibly applying a tuple conversion map.

* SPI needs to be extended to allow use of a caller-supplied
DestReceiver instead of its usual receiver that puts tuples into
a SPITupleTable.  Two new API calls are needed to handle both the
RETURN QUERY and RETURN QUERY EXECUTE cases.

I also felt that I didn't want these new API calls to use the legacy
method of specifying query parameter values with "char" null flags
(the old ' '/'n' convention); rather they should accept ParamListInfo
objects containing the parameter type and value info.  This required
a bit of additional new infrastructure since we didn't yet have any
parse analysis callback that would interpret $N parameter symbols
according to type data supplied in a ParamListInfo.  There seems to be
no harm in letting makeParamList install that callback by default,
rather than leaving a new ParamListInfo's parserSetup hook as NULL.
(Indeed, as of HEAD, I couldn't find anyplace that was using the
parserSetup field at all; plpgsql was using parserSetupArg for its
own purposes, but parserSetup seemed to be write-only.)

We can actually get plpgsql out of the business of using legacy null
flags altogether, and using ParamListInfo instead of its ad-hoc
PreparedParamsData structure; but this requires inventing one more
SPI API call that can replace SPI_cursor_open_with_args.  That seems
worth doing, though.

SPI_execute_with_args and SPI_cursor_open_with_args are now unused
anywhere in the core PG distribution.  Perhaps someday we could
deprecate/remove them.  But cleaning up the crufty bits of the SPI
API is a task for a different patch.

Per bug #16040 from Jeremy Smith.  This is unfortunately too invasive to
consider back-patching.  Patch by me; thanks to Hamid Akhtar for review.

Discussion: https://postgr.es/m/16040-eaacad11fecfb198@postgresql.org
2020-06-12 12:14:32 -04:00
Michael Paquier
aaf8c99050 Fix typos and some format mistakes in comments
Author: Justin Pryzby
Discussion: https://postgr.es/m/20200612023709.GC14879@telsasoft.com
2020-06-12 21:05:10 +09:00
Peter Eisentraut
ffd2582297 Make more use of RELKIND_HAS_STORAGE()
Make use of RELKIND_HAS_STORAGE() where appropriate, instead of
listing out the relkinds individually.  No behavior change intended.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/7a22bf51-2480-d999-1794-191ba67ff47c%402ndquadrant.com
2020-06-12 09:10:26 +02:00
Thomas Munro
7aa4fb5925 Improve comments for [Heap]CheckForSerializableConflictOut().
Rewrite the documentation of these functions, in light of recent bug fix
commit 5940ffb2.

Back-patch to 13 where the check-for-conflict-out code was split up into
AM-specific and generic parts, and new documentation was added that now
looked wrong.

Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443%40jepsen.io
2020-06-12 10:55:38 +12:00
Tom Lane
77a3be32f7 Fix mishandling of NaN counts in numeric_[avg_]combine.
When merging two NumericAggStates, the code missed adding the new
state's NaNcount unless its N was also nonzero; since those counts
are independent, this is wrong.

This would only have visible effect if some partial aggregate scans
found only NaNs while earlier ones found only non-NaNs; then we could
end up falsely deciding that there were no NaNs and fail to return a
NaN final result as expected.  That's pretty improbable, so it's no
surprise this hasn't been reported from the field.  Still, it's a bug.

I didn't try to produce a regression test that would show the bug,
but I did notice that these functions weren't being reached at all
in our regression tests, so I improved the tests to at least
exercise them.  With these additions, I see pretty complete code
coverage on the aggregation-related functions in numeric.c.

Back-patch to 9.6 where this code was introduced.  (I only added
the improved test case as far back as v10, though, since the
relevant part of aggregates.sql isn't there at all in 9.6.)
2020-06-11 17:38:42 -04:00
Jeff Davis
92c58fd948 Rework HashAgg GUCs.
Eliminate enable_groupingsets_hash_disk, which was primarily useful
for testing grouping sets that use HashAgg and spill. Instead, hack
the table stats to convince the planner to choose hashed aggregation
for grouping sets that will spill to disk. Suggested by Melanie
Plageman.

Rename enable_hashagg_disk to hashagg_avoid_disk_plan, and invert the
meaning of on/off. The new name indicates more strongly that it only
affects the planner. Also, the word "avoid" is less definite, which
should avoid surprises when HashAgg still needs to use the
disk. Change suggested by Justin Pryzby, though I chose a different
GUC name.

Discussion: https://postgr.es/m/CAAKRu_aisiENMsPM2gC4oUY1hHG3yrCwY-fXUg22C6_MJUwQdA%40mail.gmail.com
Discussion: https://postgr.es/m/20200610021544.GA14879@telsasoft.com
Backpatch-through: 13
2020-06-11 12:57:43 -07:00
Peter Geoghegan
5940ffb221 Avoid update conflict out serialization anomalies.
SSI's HeapCheckForSerializableConflictOut() test failed to correctly
handle conditions involving a concurrently inserted tuple which is later
concurrently updated by a separate transaction .  A SELECT statement
that called HeapCheckForSerializableConflictOut() could end up using the
same XID (updater's XID) for both the original tuple, and the successor
tuple, missing the XID of the xact that created the original tuple
entirely.  This only happened when neither tuple from the chain was
visible to the transaction's MVCC snapshot.

The observable symptoms of this bug were subtle.  A pair of transactions
could commit, with the later transaction failing to observe the effects
of the earlier transaction (because of the confusion created by the
update to the non-visible row).  This bug dates all the way back to
commit dafaa3ef, which added SSI.

To fix, make sure that we check the xmin of concurrently inserted tuples
that happen to also have been updated concurrently.

Author: Peter Geoghegan
Reported-By: Kyle Kingsbury
Reviewed-By: Thomas Munro
Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443@jepsen.io
Backpatch: All supported versions
2020-06-11 10:09:47 -07:00
Peter Eisentraut
3fbd4bb6f4 Refactor DROP LANGUAGE grammar
Fold it into the generic DropStmt.

Discussion: https://www.postgresql.org/message-id/flat/163c00a5-f634-ca52-fc7c-0e53deda8735%402ndquadrant.com
2020-06-11 11:18:15 +02:00
Peter Eisentraut
5333e014ab Remove deprecated syntax from CREATE/DROP LANGUAGE
Remove the option to specify the language name as a single-quoted
string.  This has been obsolete since ee8ed85da3.  Removing it allows
better grammar refactoring.

The syntax of the CREATE FUNCTION LANGUAGE clause is not changed.

Discussion: https://www.postgresql.org/message-id/flat/163c00a5-f634-ca52-fc7c-0e53deda8735%402ndquadrant.com
2020-06-11 10:26:12 +02:00
Peter Eisentraut
c4325cefba Fold AlterForeignTableStmt into AlterTableStmt
All other relation types are handled by AlterTableStmt, so it's
unnecessary to make a different statement for foreign tables.

Discussion: https://www.postgresql.org/message-id/flat/163c00a5-f634-ca52-fc7c-0e53deda8735%402ndquadrant.com
2020-06-11 08:21:24 +02:00
Peter Eisentraut
c2bd1fec32 Remove redundant grammar symbols
access_method, database_name, and index_name are all just name, and
they are not used consistently for their alleged purpose, so remove
them.  They have been around since ancient times but have no current
reason for existing.  Removing them can simplify future grammar
refactoring.

Discussion: https://www.postgresql.org/message-id/flat/163c00a5-f634-ca52-fc7c-0e53deda8735%402ndquadrant.com
2020-06-10 22:58:46 +02:00
Peter Eisentraut
c7eab0e97e Change default of password_encryption to scram-sha-256
Also, the legacy values on/true/yes/1 for password_encryption that
mapped to md5 are removed.  The only valid values are now
scram-sha-256 and md5.

Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Discussion: https://www.postgresql.org/message-id/flat/d5b0ad33-7d94-bdd1-caac-43a1c782cab2%402ndquadrant.com
2020-06-10 16:42:55 +02:00
Peter Eisentraut
5a4ada71a8 Update description of parameter password_encryption
The previous description string still described the pre-PostgreSQL
10 (pre eb61136dc7) behavior of
selecting between encrypted and unencrypted, but it is now choosing
between encryption algorithms.
2020-06-10 11:57:41 +02:00
Amit Kapila
c5c000b103 Fix ReorderBuffer memory overflow check.
Commit cec2edfa78 introduced logical_decoding_work_mem to limit
ReorderBuffer memory usage. We spill the changes once the memory occupied
by changes exceeds logical_decoding_work_mem.  There was an assumption
in the code that by evicting the largest (sub)transaction we will come
under the memory limit as the selected transaction will be at least as
large as the most recent change (which caused us to go over the memory
limit).  However, that is not true because a user can reduce the
logical_decoding_work_mem to a smaller value before the most recent
change.

We fix it by allowing to evict the transactions until we reach under the
memory limit.

Reported-by: Fujii Masao
Author: Amit Kapila
Reviewed-by: Fujii Masao
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/2b7ba291-22e0-a187-d167-9e5309a3458d@oss.nttdata.com
2020-06-10 10:20:10 +05:30
Peter Eisentraut
350f47786c Spelling adjustments
similar to 0fd2a79a63
2020-06-09 10:41:41 +02:00
Peter Eisentraut
b1d32d3e32 Unify drop-by-OID functions
There are a number of Remove${Something}ById() functions that are
essentially identical in structure and only different in which catalog
they are working on.  Refactor this to be one generic function.  The
information about which oid column, index, etc. to use was already
available in ObjectProperty for most catalogs, in a few cases it was
easily added.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/331d9661-1743-857f-1cbb-d5728bcd62cb%402ndquadrant.com
2020-06-09 09:39:46 +02:00
David Rowley
b27c90bbe4 Fix invalid function references in a few comments
These appear to have been forgotten when the functions were renamed in
1fd687a03.

Backpatch-through: 13, where the functions were renamed
2020-06-09 18:43:15 +12:00
Jeff Davis
1b2c29469a Fix HashAgg regression from choosing too many initial buckets.
Diagnosis by Andres.

Reported-by: Pavel Stehule
Discussion: https://postgr.es/m/CAFj8pRDLVakD5Aagt3yZeEQeTeEWaS3YE5h8XC3Q3qJ6TYkc2Q%40mail.gmail.com
Backpatch-through: 13
2020-06-08 21:04:16 -07:00
Peter Eisentraut
cbcc8726bb Update snowball
Update to snowball tag v2.0.0.  Major changes are new stemmers for
Basque, Catalan, and Hindi.

Discussion: https://www.postgresql.org/message-id/flat/a8eeabd6-2be1-43fe-401e-a97594c38478%402ndquadrant.com
2020-06-08 08:07:15 +02:00
Thomas Munro
57cb806308 Fix locking bugs that could corrupt pg_control.
The redo routines for XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} must acquire
ControlFileLock before modifying ControlFile->checkPointCopy, or the
checkpointer could write out a control file with a bad checksum.

Likewise, XLogReportParameters() must acquire ControlFileLock before
modifying ControlFile and calling UpdateControlFile().

Back-patch to all supported releases.

Author: Nathan Bossart <bossartn@amazon.com>
Author: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/70BF24D6-DC51-443F-B55A-95735803842A%40amazon.com
2020-06-08 13:57:24 +12:00
Michael Paquier
879ad9f90e Fix crash in WAL sender when starting physical replication
Since database connections can be used with WAL senders in 9.4, it is
possible to use physical replication.  This commit fixes a crash when
starting physical replication with a WAL sender using a database
connection, caused by the refactoring done in 850196b.

There have been discussions about forbidding the use of physical
replication in a database connection, but this is left for later,
taking care only of the crash new to 13.

While on it, add a test to check for a failure when attempting logical
replication if the WAL sender does not have a database connection.  This
part is extracted from a larger patch by Kyotaro Horiguchi.

Reported-by: Vladimir Sitnikov
Author: Michael Paquier, Kyotaro Horiguchi
Reviewed-by: Kyotaro Horiguchi, Álvaro Herrera
Discussion: https://postgr.es/m/CAB=Je-GOWMj1PTPkeUhjqQp-4W3=nW-pXe2Hjax6rJFffB5_Aw@mail.gmail.com
Backpatch-through: 13
2020-06-08 10:12:24 +09:00
Tom Lane
b5d69b7c22 pgindent run prior to branching v13.
pgperltidy and reformat-dat-files too, though those didn't
find anything to change.
2020-06-07 16:57:08 -04:00
Jeff Davis
1fbb6c93df Fix platform-specific performance regression in logtape.c.
Commit 24d85952 made a change that indirectly caused a performance
regression by triggering a change in the way GCC optimizes memcpy() on
some platforms.

The behavior seemed to contradict a GCC document, so I filed a report:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95556

This patch implements a narrow workaround which eliminates the
regression I observed. The workaround is benign enough that it seems
unlikely to cause a different regression on another platform.

Discussion: https://postgr.es/m/99b2eab335c1592c925d8143979c8e9e81e1575f.camel@j-davis.com
2020-06-07 09:25:55 -07:00
Peter Eisentraut
0fd2a79a63 Spelling adjustments 2020-06-07 15:06:51 +02:00
Peter Eisentraut
f4c88ce1a2 Formatting and punctuation improvements in postgresql.conf.sample 2020-06-07 14:35:12 +02:00
Tom Lane
0c882e52a8 Improve ineq_histogram_selectivity's behavior for non-default orderings.
ineq_histogram_selectivity() can be invoked in situations where the
ordering we care about is not that of the column's histogram.  We could
be considering some other collation, or even more drastically, the
query operator might not agree at all with what was used to construct
the histogram.  (We'll get here for anything using scalarineqsel-based
estimators, so that's quite likely to happen for extension operators.)

Up to now we just ignored this issue and assumed we were dealing with
an operator/collation whose sort order exactly matches the histogram,
possibly resulting in junk estimates if the binary search gets confused.
It's past time to improve that, since the use of nondefault collations
is increasing.  What we can do is verify that the given operator and
collation match what's recorded in pg_statistic, and use the existing
code only if so.  When they don't match, instead execute the operator
against each histogram entry, and take the fraction of successes as our
selectivity estimate.  This gives an estimate that is probably good to
about 1/histogram_size, with no assumptions about ordering.  (The quality
of the estimate is likely to degrade near the ends of the value range,
since the two orderings probably don't agree on what is an extremal value;
but this is surely going to be more reliable than what we did before.)

At some point we might further improve matters by storing more than one
histogram calculated according to different orderings.  But this code
would still be good fallback logic when no matches exist, so that is
not an argument for not doing this.

While here, also improve get_variable_range() to deal more honestly
with non-default collations.

This isn't back-patchable, because it requires adding another argument
to ineq_histogram_selectivity, and because it might have significant
impact on the estimation results for extension operators relying on
scalarineqsel --- mostly for the better, one hopes, but in any case
destabilizing plan choices in back branches is best avoided.

Per investigation of a report from James Lucas.

Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
2020-06-05 16:55:27 -04:00
Tom Lane
044c99bc56 Use query collation, not column's collation, while examining statistics.
Commit 5e0928005 changed the planner so that, instead of blindly using
DEFAULT_COLLATION_OID when invoking operators for selectivity estimation,
it would use the collation of the column whose statistics we're
considering.  This was recognized as still being not quite the right
thing, but it seemed like a good incremental improvement.  However,
shortly thereafter we introduced nondeterministic collations, and that
creates cases where operators can fail if they're passed the wrong
collation.  We don't want planning to fail in cases where the query itself
would work, so this means that we *must* use the query's collation when
invoking operators for estimation purposes.

The only real problem this creates is in ineq_histogram_selectivity, where
the binary search might produce a garbage answer if we perform comparisons
using a different collation than the column's histogram is ordered with.
However, when the query's collation is significantly different from the
column's default collation, the estimate we previously generated would be
pretty irrelevant anyway; so it's not clear that this will result in
noticeably worse estimates in practice.  (A follow-on patch will improve
this situation in HEAD, but it seems too invasive for back-patch.)

The patch requires changing the signatures of mcv_selectivity and allied
functions, which are exported and very possibly are used by extensions.
In HEAD, I just did that, but an API/ABI break of this sort isn't
acceptable in stable branches.  Therefore, in v12 the patch introduces
"mcv_selectivity_ext" and so on, with signatures matching HEAD, and makes
the old functions into wrappers that assume DEFAULT_COLLATION_OID should
be used.  That does not match the prior behavior, but it should avoid risk
of failure in most cases.  (In practice, I think most extension datatypes
aren't collation-aware, so the change probably doesn't matter to them.)

Per report from James Lucas.  Back-patch to v12 where the problem was
introduced.

Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
2020-06-05 16:18:50 -04:00
Michael Paquier
1127f0e392 Preserve pg_index.indisreplident across REINDEX CONCURRENTLY
If the flag value is lost, logical decoding would work the same way as
REPLICA IDENTITY NOTHING, meaning that no old tuple values would be
included in the changes anymore produced by logical decoding.

Author: Michael Paquier
Reviewed-by: Euler Taveira
Discussion: https://postgr.es/m/20200603065340.GK89559@paquier.xyz
Backpatch-through: 12
2020-06-05 10:26:02 +09:00
Tom Lane
a9632830bb Reject "23:59:60.nnn" in datetime input.
It's intentional that we don't allow values greater than 24 hours,
while we do allow "24:00:00" as well as "23:59:60" as inputs.
However, the range check was miscoded in such a way that it would
accept "23:59:60.nnn" with a nonzero fraction.  For time or timetz,
the stored result would then be greater than "24:00:00" which would
fail dump/reload, not to mention possibly confusing other operations.

Fix by explicitly calculating the result and making sure it does not
exceed 24 hours.  (This calculation is redundant with what will happen
later in tm2time or tm2timetz.  Maybe someday somebody will find that
annoying enough to justify refactoring to avoid the duplication; but
that seems too invasive for a back-patched bug fix, and the cost is
probably unmeasurable anyway.)

Note that this change also rejects such input as the time portion
of a timestamp(tz) value.

Back-patch to v10.  The bug is far older, but to change this pre-v10
we'd need to ensure that the logic behaves sanely with float timestamps,
which is possibly nontrivial due to roundoff considerations.
Doesn't really seem worth troubling with.

Per report from Christoph Berg.

Discussion: https://postgr.es/m/20200520125807.GB296739@msg.df7cb.de
2020-06-04 16:42:23 -04:00
Michael Paquier
3fa44a3004 Fix comment in be-secure-openssl.c
Since 573bd08, hardcoded DH parameters have been moved to a different
file, making the comment on top of load_dh_buffer() incorrect.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/D9492CCB-9A91-4181-A847-1779630BE2A7@yesql.se
2020-06-04 13:02:59 +09:00
Michael Paquier
c1669fd581 Fix instance of elog() called while holding a spinlock
This broke the project rule to not call any complex code while a
spinlock is held.  Issue introduced by b89e151.

Discussion: https://postgr.es/m/20200602.161518.1399689010416646074.horikyota.ntt@gmail.com
Backpatch-through: 9.5
2020-06-04 10:17:49 +09:00
Tom Lane
f88bd3139f Don't call palloc() while holding a spinlock, either.
Fix some more violations of the "only straight-line code inside a
spinlock" rule.  These are hazardous not only because they risk
holding the lock for an excessively long time, but because it's
possible for palloc to throw elog(ERROR), leaving a stuck spinlock
behind.

copy_replication_slot() had two separate places that did pallocs
while holding a spinlock.  We can make the code simpler and safer
by copying the whole ReplicationSlot struct into a local variable
while holding the spinlock, and then referencing that copy.
(While that's arguably more cycles than we really need to spend
holding the lock, the struct isn't all that big, and this way seems
far more maintainable than copying fields piecemeal.  Anyway this
is surely much cheaper than a palloc.)  That bug goes back to v12.

InvalidateObsoleteReplicationSlots() not only did a palloc while
holding a spinlock, but for extra sloppiness then leaked the memory
--- probably for the lifetime of the checkpointer process, though
I didn't try to verify that.  Fortunately that silliness is new
in HEAD.

pg_get_replication_slots() had a cosmetic violation of the rule,
in that it only assumed it's safe to call namecpy() while holding
a spinlock.  Still, that's a hazard waiting to bite somebody, and
there were some other cosmetic coding-rule violations in the same
function, so clean it up.  I back-patched this as far as v10; the
code exists before that but it looks different, and this didn't
seem important enough to adapt the patch further back.

Discussion: https://postgr.es/m/20200602.161518.1399689010416646074.horikyota.ntt@gmail.com
2020-06-03 12:36:23 -04:00
Fujii Masao
caa3c4242c Don't call elog() while holding spinlock.
Previously UpdateSpillStats() called elog(DEBUG2) while holding
the spinlock even though the local variables that the elog() accesses
don't need to be protected by the lock. Since spinlocks are intended
for very short-term locks, they should not be used when calling
elog(DEBUG2). So this commit moves that elog() out of spinlock period.

Author: Kyotaro Horiguchi
Reviewed-by: Amit Kapila and Fujii Masao
Discussion: https://postgr.es/m/20200602.161518.1399689010416646074.horikyota.ntt@gmail.com
2020-06-02 19:21:04 +09:00
Peter Eisentraut
42181b1015 Use correct and consistent unit abbreviation 2020-06-01 21:18:36 +02:00
Michael Paquier
ce1c5b9ae8 Fix use-after-release mistake in currtid() and currtid2() for views
This issue has been present since the introduction of this code as of
a3519a2 from 2002, and has been found by buildfarm member prion that
uses RELCACHE_FORCE_RELEASE via the tests introduced recently in
e786be5.

Discussion: https://postgr.es/m/20200601022055.GB4121@paquier.xyz
Backpatch-through: 9.5
2020-06-01 14:41:18 +09:00
Michael Paquier
e786be5fcb Fix crashes with currtid() and currtid2()
A relation that has no storage initializes rd_tableam to NULL, which
caused those two functions to crash because of a pointer dereference.
Note that in 11 and older versions, this has always failed with a
confusing error "could not open file".

These two functions are used by the Postgres ODBC driver, which requires
them only when connecting to a backend strictly older than 8.1.  When
connected to 8.2 or a newer version, the driver uses a RETURNING clause
instead whose support has been added in 8.2, so it should be possible to
just remove both functions in the future.  This is left as an issue to
address later.

While on it, add more regression tests for those functions as we never
really had coverage for them, and for aggregates of TIDs.

Reported-by: Jaime Casanova, via sqlsmith
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/CAJGNTeO93u-5APMga6WH41eTZ3Uee9f3s8dCpA-GSSqNs1b=Ug@mail.gmail.com
Backpatch-through: 12
2020-06-01 10:32:06 +09:00
Tomas Vondra
4cad2534da Use CP_SMALL_TLIST for hash aggregate
Commit 1f39bce021 added disk-based hash aggregation, which may spill
incoming tuples to disk. It however did not request projection to make
the tuples as narrow as possible, which may mean having to spill much
more data than necessary (increasing I/O, pushing other stuff from page
cache, etc.).

This adds CP_SMALL_TLIST in places that may use hash aggregation - we do
that only for AGG_HASHED. It's unnecessary for AGG_SORTED, because that
either uses explicit Sort (which already does projection) or pre-sorted
input (which does not need spilling to disk).

Author: Tomas Vondra
Reviewed-by: Jeff Davis
Discussion: https://postgr.es/m/20200519151202.u2p2gpiawoaznsv2%40development
2020-05-31 14:43:13 +02:00
Andres Freund
6a4a335b84 llvmjit: Fix building against LLVM 11 by removing unnecessary include.
LLVM has removed this header, in the branch that will become llvm
11. But as it turns out we didn't actually need it, so just remove it.

Author: Jesse Zhang <sbjesse@gmail.com>
Discussion: https://postgr.es/m/CAGf+fX7bvtP0YXMu7pOsu_NwhxW6dArTkxb=jt7M2-UJkyJ_3g@mail.gmail.com
Backpatch: 11, where JIT support using llvm was introduced.
2020-05-28 15:24:28 -07:00
Joe Conway
887cdff4dc Add CHECK_FOR_INTERRUPTS() to the repeat() function
The repeat() function loops for potentially a long time without
ever checking for interrupts. This prevents, for example, a query
cancel from interrupting until the work is all done. Fix by
inserting a CHECK_FOR_INTERRUPTS() into the loop.

Backpatch to all supported versions.

Discussion: https://www.postgresql.org/message-id/flat/8692553c-7fe8-17d9-cbc1-7cddb758f4c6%40joeconway.com
2020-05-28 13:19:00 -04:00
Heikki Linnakangas
5b1c61e8b8 Add missing error code to "cannot attach index ..." error.
ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE was used in an ereport with the
same message but different errdetail a few lines earlier, so use that
here as well.

Backpatch-through: 11
2020-05-28 12:37:00 +03:00
Michael Paquier
55ca50deb8 Fix some mentions to memory units in postgresql.conf.sample
The default unit for max_slot_wal_keep_size is megabytes.  While on it,
also change temp_file_limit to use a more consistent wording.

Reported-by: Jeff Janes, Fujii Masao
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAMkU=1wWZhhjpwRFKJ9waQGxxROeC0P6UqPvb90fAaGz7dhoHA@mail.gmail.com
2020-05-28 15:39:05 +09:00
Jeff Davis
896ddf9b3c Avoid fragmentation of logical tapes when writing concurrently.
Disk-based HashAgg relies on writing to multiple tapes
concurrently. Avoid fragmentation of the tapes' blocks by
preallocating many blocks for a tape at once. No file operations are
performed during preallocation; only the block numbers are reserved.

Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20200519151202.u2p2gpiawoaznsv2%40development
2020-05-26 16:49:43 -07:00
Peter Eisentraut
add4211600 Add lcov exclusion markers to jsonpath scanner
This was done for all scanners in
4211673622 but not added to the new one.
2020-05-26 14:09:36 +02:00
Bruce Momjian
ac5852fb30 gss: add missing references to hostgssenc and hostnogssenc
These were missed when these were added to pg_hba.conf in PG 12;
updates docs and pg_hba.conf.sample.

Reported-by: Arthur Nascimento

Bug: 16380

Discussion: https://postgr.es/m/20200421182736.GG19613@momjian.us

Backpatch-through: 12
2020-05-25 20:19:28 -04:00
Noah Misch
587322de36 Reconcile nodes/*funcs.c.
The stmt_len changes do not affect behavior.  LimitPath has no other
support functions, so that part changes only debugging output.
2020-05-25 16:23:48 -07:00
Michael Paquier
a995b371ae Add missing invocations to object access hooks
The following commands have been missing calls to object access hooks
InvokeObjectPost{Create|Alter}Hook normally applied to all commands:
- ALTER RULE RENAME TO
- ALTER USER MAPPING
- CREATE ACCESS METHOD
- CREATE STATISTICS

Thanks also to Robert Haas for the discussion.

Author: Mark Dilger
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/435CD295-F409-44E0-91EC-DF32C7AFCD76@enterprisedb.com
2020-05-23 14:03:04 +09:00
Alvaro Herrera
c99cec96b8
Fix two typos in a comment
They were introduced in 898e5e3290a7; backpatch to 12.
2020-05-22 17:39:16 -04:00
Peter Eisentraut
574925bfd0 Remove unnecessary cast
Probably copied from nearby calls where it is necessary.  But this one
also casts away constness, so it was doubly annoying.
2020-05-22 10:36:49 +02:00
Etsuro Fujita
bb2ae6fa47 Adjust indentation in src/backend/optimizer/README.
The previous indentation of optimizer functions was unclear; adjust the
indentation dashes so that a deeper level of indentation indicates that
the outer optimizer function calls the inner one.

Author: Richard Guo, with additional change by me
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAMbWs4-U-ogzpchGsP2BBMufCss1hktm%2B%2BeTJK_dUC196pw0cQ%40mail.gmail.com
2020-05-22 15:45:00 +09:00
Noah Misch
3350fb5d1f Clear some style deviations. 2020-05-21 08:31:16 -07:00
Tom Lane
c7d65a252c part_strategy does not need its very own keyword classification.
This should be plain old ColId.  Making it so makes the grammar less
complicated, and makes the compiled tables a kilobyte or so smaller
(likely because they don't have to deal with a keyword classification
that's not used anyplace else).
2020-05-19 20:09:59 -04:00
Peter Geoghegan
67b0b2dbf9 Reconsider nbtree page deletion assertion.
Commit 624686abcf added an assertion that verified that _bt_search
successfully relocated the leaf page undergoing deletion.  Page deletion
cannot deal with the case where the descent stack is to the right of the
page, so this seemed critical (deletion can only handle the case where
the descent stack is to the left of the leaf/target page).  However, the
assertion went a bit too far.

Since only a buffer pin is held on the leaf page throughout the call to
_bt_search, nothing guarantees that it can't have split during this
small window.  And if does actually split, _bt_search may end up
"relocating" a page to the right of the original target leaf page.  This
scenario seems extremely unlikely, but it must still be considered.
Remove the assertion, and document how we cope in this scenario.
2020-05-19 15:04:34 -07:00
Alvaro Herrera
c301c2e739
WITH TIES: number of rows is optional and defaults to one
FETCH FIRST .. ONLY implements this correctly, but we missed to include
it for FETCH FIRST .. WITH TIES in commit 357889eb17.

Author: Vik Fearing
Discussion: https://postgr.es/m/6aa690ef-551d-e24f-2690-c38c2442947c@postgresfriends.org
2020-05-18 19:28:46 -04:00
Peter Eisentraut
ac449d8801 Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 031ca65d7825c3e539a3e62ea9d6630af12e6b6b
2020-05-18 12:49:30 +02:00
Magnus Hagander
a01debe3db Fix typos in README
Author: Daniel Gustafsson
2020-05-18 11:55:35 +02:00
Amit Kapila
7e041b0c1d Fix comment in slot.c.
Reported-by: Sawada Masahiko
Author: Sawada Masahiko
Reviewed-by: Amit Kapila
Backpatch-through: 9.5
Discussion: https://postgr.es/m/CA+fd4k4Ws7M7YQ8PqSym5WB1y75dZeBTd1sZJUQdfe0KJQ-iSA@mail.gmail.com
2020-05-18 07:53:26 +05:30
Tom Lane
3048898e73 Mop-up for wait event naming issues.
Synchronize the event names for parallel hash join waits with other
event names, by getting rid of the slashes and dropping "-ing"
suffixes.  Rename ClogGroupUpdate to XactGroupUpdate, to match the
new SLRU name.  Move the ProcSignalBarrier event to the IPC category;
it doesn't belong under IO.

Also a bit more wordsmithing in the wait event documentation tables.

Discussion: https://postgr.es/m/4505.1589640417@sss.pgh.pa.us
2020-05-16 21:00:11 -04:00
Michael Paquier
2c8dd05d6c Make pg_stat_wal_receiver consistent with the WAL receiver's shmem info
d140f2f3 has renamed receivedUpto to flushedUpto, and has added
writtenUpto to the WAL receiver's shared memory information, but
pg_stat_wal_receiver was not consistent with that.  This commit renames
received_lsn to flushed_lsn, and adds a new column called written_lsn.

Bump catalog version.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20200515090817.GA212736@paquier.xyz
2020-05-17 09:22:07 +09:00
Tom Lane
fa27dd40d5 Run pgindent with new pg_bsd_indent version 2.1.1.
Thomas Munro fixed a longstanding annoyance in pg_bsd_indent, that
it would misformat lines containing IsA() macros on the assumption
that the IsA() call should be treated like a cast.  This improves
some other cases involving field/variable names that match typedefs,
too.  The only places that get worse are a couple of uses of the
OpenSSL macro STACK_OF(); we'll gladly take that trade-off.

Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
2020-05-16 11:54:51 -04:00
Tom Lane
e02ad575d8 Final pgindent run with pg_bsd_indent version 2.1.
This is just to provide a clean basis for comparison of the results
of the new version.  I did fix a typo that crept into 242dfcbaf.

Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
2020-05-16 11:49:14 -04:00
Michael Paquier
7ccb2f54d9 Fix assertion with relation using REPLICA IDENTITY FULL in subscriber
In a logical replication subscriber, a table using REPLICA IDENTITY FULL
which has a primary key would try to use the primary key's index
available to scan for a tuple, but an assertion only assumed as correct
the case of an index associated to REPLICA IDENTITY USING INDEX.  This
commit corrects the assertion so as the use of a primary key index is a
valid case.

Reported-by: Dilip Kumar
Analyzed-by: Dilip Kumar
Author: Euler Taveira
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/CAFiTN-u64S5bUiPL1q5kwpHNd0hRnf1OE-bzxNiOs5zo84i51w@mail.gmail.com
Backpatch-through: 10
2020-05-16 18:15:18 +09:00
Tom Lane
474e7da648 Change locktype "speculative token" to "spectoken".
It's just weird that this name wasn't chosen to look like an
identifier.  The suspicion that it wasn't thought about too
hard is reinforced by the fact that it wasn't documented in
the pg_locks view (until I did so, a day or two back).

Update, and add a comment reminding future adjusters of this
array to fix the docs too.

Do some desultory wordsmithing on various entries in the wait
events tables.

Discussion: https://postgr.es/m/24595.1589326879@sss.pgh.pa.us
2020-05-15 21:47:34 -04:00
Alvaro Herrera
1d3743023e
Fix walsender error cleanup code
In commit 850196b610 I (Álvaro) failed to handle the case of walsender
shutting down on an error before setting up its 'xlogreader' pointer;
the error handling code dereferences the pointer, causing a crash.
Fix by testing the pointer before trying to dereference it.

Kyotaro authored the code fix; I adopted Nathan's test case to be used
by the TAP tests and added the necessary PostgresNode change.

Reported-by: Nathan Bossart <bossartn@amazon.com>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/C04FC24E-903D-4423-B312-6910E4D846E5@amazon.com
2020-05-15 20:00:52 -04:00
Tom Lane
14a9101091 Drop the redundant "Lock" suffix from LWLock wait event names.
This was mostly confusing, especially since some wait events in
this class had the suffix and some did not.

While at it, stop exposing MainLWLockNames[] as a globally visible
name; any code using that directly is almost certainly wrong, as
its name has been misleading for some time.
(GetLWLockIdentifier() is what to use instead.)

Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 19:55:56 -04:00
Tom Lane
8048404939 Fix bogus initialization of replication origin shared memory state.
The previous coding zeroed out offsetof(ReplicationStateCtl, states)
more bytes than it was entitled to, as a consequence of starting the
zeroing from the wrong pointer (or, if you prefer, using the wrong
calculation of how much to zero).

It's unsurprising that this has not caused any reported problems,
since it can be expected that the newly-allocated block is at the end
of what we've used in shared memory, and we always make the shmem
block substantially bigger than minimally necessary.  Nonetheless,
this is wrong and it could bite us someday; plus it's a dangerous
model for somebody to copy.

This dates back to the introduction of this code (commit 5aa235042),
so back-patch to all supported branches.
2020-05-15 19:05:39 -04:00
Tom Lane
36ac359d36 Rename assorted LWLock tranches.
Choose names that fit into the conventions for wait event names
(particularly, that multi-word names are in the style MultiWordName)
and hopefully convey more information to non-hacker users than the
previous names did.

Also rename SerializablePredicateLockListLock to
SerializablePredicateListLock; the old name was long enough to cause
table formatting problems, plus the double occurrence of "Lock" seems
confusing/error-prone.

Also change a couple of particularly opaque LWLock field names.

Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 18:11:07 -04:00
Alvaro Herrera
a0ab4f4909
Add comments linking pg_strftime to timestamptz_to_str 2020-05-15 18:05:34 -04:00
Alvaro Herrera
242dfcbafa
Avoid killing btree items that are already dead
_bt_killitems marks btree items dead when a scan leaves the page where
they live, but it does so with only share lock (to improve concurrency).
This was historicall okay, since killing a dead item has no
consequences.  However, with the advent of data checksums and
wal_log_hints, this action incurs a WAL full-page-image record of the
page.  Multiple concurrent processes would write the same page several
times, leading to WAL bloat.  The probability of this happening can be
reduced by only killing items if they're not already dead, so change the
code to do that.

The problem could eliminated completely by having _bt_killitems upgrade
to exclusive lock upon seeing a killable item, but that would reduce
concurrency so it's considered a cure worse than the disease.

Backpatch all the way back to 9.5, since wal_log_hints was introduced in
9.4.

Author: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: https://postgr.es/m/CA+fd4k6PeRj2CkzapWNrERkja5G0-6D-YQiKfbukJV+qZGFZ_Q@mail.gmail.com
2020-05-15 16:50:34 -04:00
Tom Lane
5da14938f7 Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care.  Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing).  Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.

For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects.  This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.

(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)

Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.

Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 14:28:25 -04:00
Amit Kapila
a9cf48a4cf Make COPY TO keep locks until the transaction end.
COPY TO released the ACCESS SHARE lock immediately when it was done rather
than holding on to it until the end of the transaction.

This breaks the case where a REPEATABLE READ transaction could see an
empty table if it repeats a COPY statement and somebody truncated the
table in the meantime.

Before 4dded12faa the lock was also released after COPY FROM, but the
commit failed to notice the irregularity in COPY TO.

This is old behavior but doesn't seem important enough to backpatch.

Author: Laurenz Albe, based on suggestion by Robert Haas and Tom Lane
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/7bcfc39d4176faf85ab317d0c26786953646a411.camel@cybertec.at
2020-05-15 08:10:00 +05:30
Michael Paquier
ff87fabef2 Remove duplicated comment block in event_trigger.c
The reasons why event triggers are disabled in standalone mode are
documented in the code path of ddl_command_start, and other places
checking if standalone mode is enabled or not mention to refer to the
comment for ddl_command_start, except for table_rewrite that duplicated
the same explanation.

Reported-by: David G. Johnston
Discussion: https://postgr.es/m/CAKFQuwYqHtXpvr2mBJRwH9f+Y5y1GXw3rhbaAu0Dk2MoNevsmA@mail.gmail.com
2020-05-15 08:19:30 +09:00
Tom Lane
5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Tom Lane
29c3e2dd5a Collect built-in LWLock tranche names statically, not dynamically.
There is little point in using the LWLockRegisterTranche mechanism for
built-in tranche names.  It wastes cycles, it creates opportunities for
bugs (since failing to register a tranche name is a very hard-to-detect
problem), and the lack of any centralized list of names encourages
sloppy nonconformity in name choices.  Moreover, since we have a
centralized list of the tranches anyway in enum BuiltinTrancheIds, we're
certainly not buying any flexibility in return for these disadvantages.

Hence, nuke all the backend-internal LWLockRegisterTranche calls,
and instead provide a const array of the builtin tranche names.

(I have in mind to change a bunch of these names shortly, but this
patch is just about getting them into one place.)

Discussion: https://postgr.es/m/9056.1589419765@sss.pgh.pa.us
2020-05-14 11:10:31 -04:00
Heikki Linnakangas
e8abf585ab Move check for fsync=off so that pendingOps still gets cleared.
Commit 3eb77eba5a moved the loop and refactored it, and inadvertently
changed the effect of fsync=off so that it also skipped removing entries
from the pendingOps table. That was not intentional, and leads to an
assertion failure if you turn fsync on while the server is running and
reload the config.

Backpatch-through: 12-
Reviewed-By: Thomas Munro
Discussion: https://www.postgresql.org/message-id/3cbc7f4b-a5fa-56e9-9591-c886deb07513%40iki.fi
2020-05-14 08:39:26 +03:00
Amit Kapila
a169155453 Fix the MSVC build for versions 2015 and later.
Visual Studio 2015 and later versions should still be able to do the same
as Visual Studio 2012, but the declaration of locale_name is missing in
_locale_t, causing the code compilation to fail, hence this falls back
instead on to enumerating all system locales by using EnumSystemLocalesEx
to find the required locale name.  If the input argument is in Unix-style
then we can get ISO Locale name directly by using GetLocaleInfoEx() with
LCType as LOCALE_SNAME.

In passing, change the documentation references of the now obsolete links.

Note that this problem occurs only with NLS enabled builds.

Author: Juan José Santamaría Flecha, Davinder Singh and Amit Kapila
Reviewed-by: Ranier Vilela and Amit Kapila
Backpatch-through: 9.5
Discussion: https://postgr.es/m/CAHzhFSFoJEWezR96um4-rg5W6m2Rj9Ud2CNZvV4NWc9tXV7aXQ@mail.gmail.com
2020-05-14 09:24:33 +05:30
Tom Lane
7fd89f4d7a Fix async.c to not register any SLRU stats counts in the postmaster.
Previously, AsyncShmemInit forcibly initialized the first page of the
async SLRU, to save dealing with that case in asyncQueueAddEntries.
But this is a poor tradeoff, since many installations do not ever use
NOTIFY; for them, expending those cycles in AsyncShmemInit is a
complete waste.  Besides, this only saves a couple of instructions
in asyncQueueAddEntries, which hardly seems likely to be measurable.

The real reason to change this now, though, is that now that we track
SLRU access stats, the existing code is causing the postmaster to
accumulate some access counts, which then get inherited into child
processes by fork(), messing up the statistics.  Delaying the
initialization into the first child that does a NOTIFY fixes that.

Hence, we can revert f3d23d83e, which was an incorrect attempt at
fixing that issue.  Also, add an Assert to pgstat.c that should
catch any future errors of the same sort.

Discussion: https://postgr.es/m/8367.1589391884@sss.pgh.pa.us
2020-05-13 22:48:26 -04:00
Alvaro Herrera
17cc133f01
Dial back -Wimplicit-fallthrough to level 3
The additional pain from level 4 is excessive for the gain.

Also revert all the source annotation changes to their original
wordings, to avoid back-patching pain.

Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us
2020-05-13 15:31:14 -04:00
Tom Lane
81ca868630 Improve management of SLRU statistics collection.
Instead of re-identifying which statistics bucket to use for a given
SLRU on every counter increment, do it once during shmem initialization.
This saves a fair number of cycles, and there's no real cost because
we could not have a bucket assignment that varies over time or across
backends anyway.

Also, get rid of the ill-considered decision to let pgstat.c pry
directly into SLRU's shared state; it's cleaner just to have slru.c
pass the stats bucket number.

In consequence of these changes, there's no longer any need to store
an SLRU's LWLock tranche info in shared memory, so get rid of that,
making this a net reduction in shmem consumption.  (That partly
reverts fe702a7b3.)

This is basically code review for 28cac71bd, so I also cleaned up
some comments, removed a dangling extern declaration, fixed some
things that should be static and/or const, etc.

Discussion: https://postgr.es/m/3618.1589313035@sss.pgh.pa.us
2020-05-13 13:08:23 -04:00
Alvaro Herrera
850196b610
Adjust walsender usage of xlogreader, simplify APIs
* Have both physical and logical walsender share a 'xlogreader' state
  struct for tracking state.  This replaces the existing globals sendSeg
  and sendCxt.

* Change WALRead not to receive XLogReaderState->seg and ->segcxt as
  separate arguments anymore; just use the ones from 'state'.  This is
  made possible by the above change.

* have the XLogReader segment_open contract require the callbacks to
  install the file descriptor in the state struct themselves instead of
  returning it.  xlogreader was already ignoring any possible failed
  return from the callbacks, relying solely on them never returning.

  (This point is not altogether excellent, as it means the callbacks
  have to know more of XLogReaderState; but to really improve on that
  we would have to pass back error info from the callbacks to
  xlogreader.  And the complexity would not be saved but instead just
  transferred to the callers of WALRead, which would have to learn how
  to throw errors from the open_segment callback in addition of, as
  currently, from pg_pread.)

* segment_open no longer receives the 'segcxt' as a separate argument,
  since it's part of the XLogReaderState argument.

Per comments from Kyotaro Horiguchi.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200511203336.GA9913@alvherre.pgsql
2020-05-13 12:17:08 -04:00
Fujii Masao
043e3e0401 Use proper GetDatum function in pg_stat_get_slru().
This commit changes pg_stat_get_slru() so that it uses
TimestampTzGetDatum() for stats_reset field because that field
stores the timestamp with time zone value. Previously
Int64GetDatum() was used.

Author: Fujii Masao
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/b8784fe6-1401-ab35-aa14-d57b5bb8e312@oss.nttdata.com
2020-05-13 22:20:37 +09:00
Fujii Masao
f3d23d83ef Initialize SLRU stats entries to zero.
Previously since SLRUStats was not initialized, SLRU stats counters
could begin with non-zero value. Which could lead to incorrect results
in pg_stat_slru view.

Author: Fujii Masao
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/976bbb73-a112-de3c-c488-b34b64609793@oss.nttdata.com
2020-05-13 22:19:25 +09:00
Alvaro Herrera
3e9744465d
Add -Wimplicit-fallthrough to CFLAGS and CXXFLAGS
Use it at level 4, a bit more restrictive than the default level, and
tweak our commanding comments to FALLTHROUGH.

(However, leave zic.c alone, since it's external code; to avoid the
warnings that would appear there, change CFLAGS for that file in the
Makefile.)

Author: Julien Rouhaud <rjuju123@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20200412081825.qyo5vwwco3fv4gdo@nol
Discussion: https://postgr.es/m/flat/E1fDenm-0000C8-IJ@gemulon.postgresql.org
2020-05-12 16:07:30 -04:00
Tomas Vondra
6a918c3ac8 Rework EXPLAIN format for incremental sort
The explain format used by incremental sort was somewhat inconsistent
with other nodes, making it harder to parse and understand. This commit
addresses that by

 - adding an extra space to better separate groups of values

 - using colons instead of equal signs to separate key/value

 - properly capitalizing first letter of a key

 - using separate lines for full and pre-sorted groups

These changes were proposed by Justin Pryzby and mostly copy the final
explain format used to report WAL usage.

Author: Justin Pryzby
Reviewed-by: James Coleman
Discussion: https://postgr.es/m/20200419023625.GP26953@telsasoft.com
2020-05-12 20:04:39 +02:00
Tomas Vondra
1a40d37a9f Fix typos and improve incremental sort comments
Author: Justin Pryzby, James Coleman
Discussion: https://postgr.es/m/20200419023625.GP26953@telsasoft.com
2020-05-12 19:37:13 +02:00
Etsuro Fujita
2793bbe75e Remove unnecessary #include.
My oversight in commit c8434d64c.
2020-05-12 19:55:55 +09:00
Michael Paquier
078c9cd258 Fix comment in xlogutils.c
The existing callers of XLogReadDetermineTimeline() performing recovery
need to check a replay LSN position when determining on which timeline
to read a WAL page.  A portion of the comment describing this function
said exactly that, while referring to a routine for fetching a write
LSN, something not available in recovery.

Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20200511.101619.2043820539323292957.horikyota.ntt@gmail.com
2020-05-12 14:43:57 +09:00
Peter Geoghegan
624686abcf Adjust "root of to-be-deleted subtree" function.
Restructure the function that locates the root of the to-be-deleted
subtree during nbtree page deletion.  Handle the conditions that make
page deletion unsafe in a slightly more uniform way, and acknowledge the
fact that the behavior with incomplete splits on internal pages is
different (as pointed out in the nbtree README as of commit 35bc0ec7).
Also invent new terminology that avoids ambiguity around which pages are
about to be deleted.  Consistently use the term "to-be-deleted subtree",
not the ambiguous term "branch".

We were calling the subtree parent page the "top parent page", but that
was quite misleading.  The top parent page usually refers to a page
unlinked from its siblings and marked deleted (during the second stage
of page deletion).  There was one kind of top parent page that we merely
removed a downlink from, and another kind of top parent page that we
actually marked deleted.  Eliminate the ambiguity by inventing a new
term ("subtree parent page") that refers to the former kind of page
only.
2020-05-11 11:01:07 -07:00
Alvaro Herrera
a8be5364ac
Fix obsolete references to "XLogRead"
The one in xlogreader.h was pointed out by Antonin Houska; I (Álvaro) noticed the
others by grepping.

Author: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/28250.1589186654@antos
2020-05-11 12:46:41 -04:00
Peter Eisentraut
7a9c9ce641 Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 80d8f54b3c5533ec036404bd3c3b24ff4825d037
2020-05-11 13:14:32 +02:00
Michael Paquier
e111c9f90a Remove smgrdounlink() in smgr.c from the code tree
The last caller of this routine was removed in b416691, and as a wise
man said one day, dead code tends to silently break.

Per discussion between Fujii Masao, Peter Geoghegan, Vignesh C and me.

Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wz=sg5H8-vG4d5UmAofdcRMpeTDt2K-NUWp4GSfhenRGAQ@mail.gmail.com
2020-05-10 10:58:54 +09:00
Tomas Vondra
ebeb3dea77 Simplify show_incremental_sort_info a bit
Incremental sort always processes at least one full group group before
switching to prefix groups, so it's enough to check just the number of
full groups. There was no risk of division by zero due to the extra
condition, but it made the code harder to understand.

Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAp+7qoS92-4V1vLChpdY3vEkLCbf+gye6P-4cirE-0z0A@mail.gmail.com
2020-05-09 19:41:42 +02:00
Tomas Vondra
9155b4be9a Do no reset bounded before incremental sort rescan
ExecReScanIncrementalSort was resetting bounded=false, which means the
optimization would be disabled on all rescans. This happens because
ExecSetTupleBound is called before the rescan, not after it.

Author: James Coleman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20200414065336.GI1492@paquier.xyz
2020-05-09 19:41:36 +02:00
Tomas Vondra
c442722648 Fix handling of REWIND/MARK/BACKWARD in incremental sort
The executor flags were not handled entirely correctly, although the
bugs were mostly harmless and it was mostly comment inaccuracy. We don't
need to strip any of the flags for child nodes.

Incremental sort does not support backward scans of mark/restore, so
MARK/BACKWARDS flags should not be possible. So we simply ensure this
using an assert, and we don't bother removing them when initializing
the child node.

With REWIND it's a bit less clear - incremental sort does not support
REWIND, but there is no way to signal this - it's legal to just ignore
the flag. We however continue passing the flag to child nodes, because
they might be useful to leverage that.

Reported-by: Michael Paquier
Author: James Coleman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20200414065336.GI1492@paquier.xyz
2020-05-09 19:41:18 +02:00
Alvaro Herrera
b060dbe000
Rework XLogReader callback system
Code review for 0dc8ead463, prompted by a bug closed by 91c40548d5.

XLogReader's system for opening and closing segments had gotten too
complicated, with callbacks being passed at both the XLogReaderAllocate
level (read_page) as well as at the WALRead level (segment_open).  This
was confusing and hard to follow, so restructure things so that these
callbacks are passed together at XLogReaderAllocate time, and add
another callback to the set (segment_close) to make it a coherent whole.
Also, ensure XLogReaderState is an argument to all the callbacks, so
that they can grab at the ->private data if necessary.

Document the whole arrangement more clearly.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20200422175754.GA19858@alvherre.pgsql
2020-05-08 15:40:11 -04:00
Peter Eisentraut
086ffddf36 Fix several DDL issues of generated columns versus inheritance
Several combinations of generated columns and inheritance in CREATE
TABLE were not handled correctly.  Specifically:

- Disallow a child column specifying a generation expression if the
  parent column is a generated column.  The child column definition
  must be unadorned and the parent column's generation expression will
  be copied.

- Prohibit a child column of a generated parent column specifying
  default values or identity.

- Allow a child column of a not-generated parent column specifying
  itself as a generated column.  This previously did not work, but it
  was possible to arrive at the state via other means (involving ALTER
  TABLE), so it seems sensible to support it.

Add tests for each case.  Also add documentation about the rules
involving generated columns and inheritance.

Discussion:
    https://www.postgresql.org/message-id/flat/15830.1575468847%40sss.pgh.pa.us
    https://www.postgresql.org/message-id/flat/2678bad1-048f-519a-ef24-b12962f41807%40enterprisedb.com
    https://www.postgresql.org/message-id/flat/CAJvUf_u4h0DxkCMCeEKAWCuzGUTnDP-G5iVmSwxLQSXn0_FWNQ%40mail.gmail.com
2020-05-08 11:31:57 +02:00
Peter Eisentraut
501e41dd3c Propagate ALTER TABLE ... SET STORAGE to indexes
When creating a new index, the attstorage setting of the table column
is copied to regular (non-expression) index columns.  But a later
ALTER TABLE ... SET STORAGE is not propagated to indexes, thus
creating an inconsistent and undumpable state.

Discussion: https://www.postgresql.org/message-id/flat/9765d72b-37c0-06f5-e349-2a580aafd989%402ndquadrant.com
2020-05-08 08:39:17 +02:00
Fujii Masao
f2ff203596 Report missing wait event for timeline history file.
TimelineHistoryRead and TimelineHistoryWrite wait events are reported
during waiting for a read and write of a timeline history file, respectively.
However, previously, TimelineHistoryRead wait event was not reported
while readTimeLineHistory() was reading a timeline history file. Also
TimelineHistoryWrite was not reported while writeTimeLineHistory() was
writing one line with the details of the timeline split, at the end.
This commit fixes these issues.

Back-patch to v10 where wait events for a timeline history file was added.

Author: Masahiro Ikeda
Reviewed-by: Michael Paquier, Fujii Masao
Discussion: https://postgr.es/m/d11b0c910b63684424e06772eb844ab5@oss.nttdata.com
2020-05-08 10:36:40 +09:00
Peter Geoghegan
cd8c73a38a Refactor nbtree deletion INCOMPLETE_SPLIT check.
Factor out code common to _bt_lock_branch_parent() and _bt_pagedel()
into a new utility function.  This new function is used to check that
the left sibling of a deletion target page does not have the
INCOMPLETE_SPLIT page flag set.  If it is set then deletion is unsafe;
there won't be a usable pivot tuple (with a downlink) in the parent page
that points to the deletion target page.  The page deletion algorithm is
not prepared to deal with that.  Also restructure an existing, related
utility function that checks if the right sibling of the target page has
the ISHALFDEAD page flag set.

This organization highlights the symmetry between the two cases.  The
goal is to make the design of page deletion clearer.  Both functions
involve a sibling page with a flag that indicates that there was an
interrupted operation (a page split or a page deletion) that resulted in
a page pointed to by sibling pages, but not pointed to in the parent.
And, both functions indicate if page deletion is unsafe due to the
absence of a particular downlink in the parent page.
2020-05-07 16:08:54 -07:00
Tom Lane
db89f0e3a4 Fix YA text phrase search bug.
checkcondition_str() failed to report multiple matches for a prefix
pattern correctly: it would dutifully merge the match positions, but
then after exiting that loop, if the last prefix-matching word had
had no suitable positions, it would report there were no matches.
The upshot would be failing to recognize a match that the query
should match.

It looks like you need all of these conditions to see the bug:
* a phrase search (else we don't ask for match position details)
* a prefix search item (else we don't get to this code)
* a weight restriction (else checkclass_str won't fail)

Noted while investigating a problem report from Pavel Borisov,
though this is distinct from the issue he was on about.

Back-patch to 9.6 where phrase search was added.
2020-05-07 15:59:51 -04:00
Alvaro Herrera
5be594caf8
Heed lock protocol in DROP OWNED BY
We were acquiring object locks then deleting objects one by one, instead
of acquiring all object locks first, ignoring those that did not exist,
and then deleting all objects together.   The latter is the correct
protocol to use, and what this commits changes to code to do.  Failing
to follow that leads to "cache lookup failed for relation XYZ" error
reports when DROP OWNED runs concurrently with other DDL -- for example,
a session termination that removes some temp tables.

Author: Álvaro Herrera
Reported-by: Mithun Chicklore Yogendra (Mithun CY)
Reviewed-by: Ahsan Hadi, Tom Lane
Discussion: https://postgr.es/m/CADq3xVZTbzK4ZLKq+dn_vB4QafXXbmMgDP3trY-GuLnib2Ai1w@mail.gmail.com
2020-05-06 12:29:41 -04:00
Tom Lane
46da7bf671 Fix severe memory leaks in GSSAPI encryption support.
Both the backend and libpq leaked buffers containing encrypted data
to be transmitted, so that the process size would grow roughly as
the total amount of data sent.

There were also far-less-critical leaks of the same sort in GSSAPI
session establishment.

Oversight in commit b0b39f72b, which I failed to notice while
reviewing the code in 2c0cdc818.

Per complaint from pmc@citylink.
Back-patch to v12 where this code was introduced.

Discussion: https://postgr.es/m/20200504115649.GA77072@gate.oper.dinoex.org
2020-05-05 13:10:17 -04:00
Amit Kapila
69bfaf2e1d Change the display of WAL usage statistics in Explain.
In commit 33e05f89c5, we have added the option to display WAL usage
statistics in Explain and auto_explain.  The display format used two spaces
between each field which is inconsistent with Buffer usage statistics which
is using one space between each field.  Change the format to make WAL usage
statistics consistent with Buffer usage statistics.

This commit also changed the usage of "full page writes" to
"full page images" for WAL usage statistics to make it consistent with
other parts of code and docs.

Author: Julien Rouhaud, Amit Kapila
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-05-05 08:00:53 +05:30
Alexander Korotkov
9f87ae38ea Fix typo in comment
Reported-by: Oleg Bartunov
2020-05-03 12:19:31 +03:00
Peter Geoghegan
9dc7251417 Refactor btvacuumpage().
Remove one of the arguments to btvacuumpage(), and give up on the idea
that it's a recursive function.  We now use the term "backtracking" to
refer to the case where an earlier block must be visited to make sure no
tuples that need to be removed were missed.

Advertising btvacuumpage() as a recursive function was unhelpful.  In
reality the function always simulates recursion with a loop (it doesn't
actually call itself).  This wasn't just necessary as a precaution (per
the comments mentioning tail recursion), though.  There is no reliable
natural limit on the number of times we can backtrack.

There are important behavioral difference when "recursing"/backtracking,
mostly related to page deletion.  We don't perform page deletion when
backtracking due to the extra complexity.  And when we recurse, we're
not performing a physical order scan anymore, so we expect fairly
different conditions to hold for the page.  Structuring the code like
this makes it clearer how _bt_pagedel() cooperates with btvacuumpage()
and btvacuumscan() (as established in commit b0229f26 and commit
73a076b0).

Author: Peter Geoghegan
Reviewed-By: Masahiko Sawada
Discussion: https://postgr.es/m/CAH2-WzmRGMDWiLMcb+zagG9652PboNN4Gfcq1Gc_wJL6A716MA@mail.gmail.com
2020-05-02 14:04:33 -07:00
Stephen Frost
b68a560f8e Fix GSS client to non-GSS server connection
If the client is compiled with GSSAPI support and tries to start up GSS
with the server, but the server is not compiled with GSSAPI support, we
would mistakenly end up falling through to call ProcessStartupPacket
with secure_done = true, but the client might then try to perform SSL,
which the backend wouldn't understand and we'd end up failing the
connection with:

FATAL:  unsupported frontend protocol 1234.5679: server supports 2.0 to 3.0

Fix by arranging to track ssl_done independently from gss_done, instead
of trying to use the same boolean for both.

Author: Andrew Gierth
Discussion: https://postgr.es/m/87h82kzwqn.fsf@news-spur.riddles.org.uk
Backpatch: 12-, where GSSAPI encryption was added.
2020-05-02 11:39:26 -04:00
Tomas Vondra
d5d09692ea Remove superfluous memset from pgstat_recv_resetslrucounter
The extra memset meant pg_stat_reset_slru() always reset all the entries
even when reset of a single entry was requested, but the timestamp was
left uninitialized.

Reported-by: Atsushi Torikoshi
Discussion: https://postgr.es/m/CACZ0uYFe16pjZxQYaTn53mspyM7dgMPYL3DJLjjPw69GMCC2Ow%40mail.gmail.com
2020-05-02 15:30:10 +02:00
Tomas Vondra
60fbb4d762 Simplify cost_incremental_sort a bit
Commit de0dc1a847 added code to cost_incremental_sort to handle varno 0.
Explicitly removing the RelabelType is not really necessary, because the
pull_varnos handles that just fine, which simplifies the code a bit.

Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_3_D2J5XxOuw68hvn0-gJsw9FXNSGcZka9aTymn9UJ8A%40mail.gmail.com
Discussion: https://postgr.es/m/20200411214639.GK2228%40telsasoft.com
2020-05-02 01:33:51 +02:00
Tomas Vondra
2e08d314ed Remove pg_xact entry from SLRU stats
The "pg_xact" entry was duplicate with "clog" and was added by mistake.

Reported-by: Fujii Masao
Discussion: https://postgr.es/m/20200119143707.gyinppnigokesjok@development
2020-05-02 00:36:25 +02:00
Tom Lane
0da06d9faf Get rid of trailing semicolons in C macro definitions.
Writing a trailing semicolon in a macro is almost never the right thing,
because you almost always want to write a semicolon after each macro
call instead.  (Even if there was some reason to prefer not to, pgindent
would probably make a hash of code formatted that way; so within PG the
rule should basically be "don't do it".)  Thus, if we have a semi inside
the macro, the compiler sees "something;;".  Much of the time the extra
empty statement is harmless, but it could lead to mysterious syntax
errors at call sites.  In perhaps an overabundance of neatnik-ism, let's
run around and get rid of the excess semicolons whereever possible.

The only thing worse than a mysterious syntax error is a mysterious
syntax error that only happens in the back branches; therefore,
backpatch these changes where relevant, which is most of them because
most of these mistakes are old.  (The lack of reported problems shows
that this is largely a hypothetical issue, but still, it could bite
us in some future patch.)

John Naylor and Tom Lane

Discussion: https://postgr.es/m/CACPNZCs0qWTqJ2QUSGJ07B7uvAvzMb-KbG2q+oo+J3tsWN5cqw@mail.gmail.com
2020-05-01 17:28:00 -04:00
Peter Geoghegan
69cf853fe7 Clear up issue with FSM and oldest bpto.xact.
On further reflection, code comments added by commit b0229f26 slightly
misrepresented how we determine the oldest bpto.xact for the index.
btvacuumpage() does not treat the bpto.xact of a page that it put in the
FSM as a candidate to be the oldest deleted page (the delete-marked page
that has the oldest bpto.xact XID among all pages encountered).

The definition of a deleted page for the purposes of the bpto.xact
calculation is different from the definition used by the bulk delete
statistics.  The bulk delete statistics don't distinguish between pages
that were deleted by the current VACUUM, pages deleted by a previous
VACUUM operation but not yet recyclable/reusable, and pages that are
reusable (though reusable pages are counted separately).

Backpatch: 11-, just like commit b0229f26.
2020-05-01 12:19:44 -07:00
Peter Geoghegan
4e21f8b633 Reorder function prototypes for consistency. 2020-05-01 10:03:38 -07:00
Peter Geoghegan
73a076b03f Fix undercounting in VACUUM VERBOSE output.
The logic for determining how many nbtree pages in an index are deleted
pages sometimes undercounted pages.  Pages that were deleted by the
current VACUUM operation (as opposed to some previous VACUUM operation
whose deleted pages have yet to be reused) were sometimes overlooked.
The final count is exposed to users through VACUUM VERBOSE's "%u index
pages have been deleted" output.

btvacuumpage() avoided double-counting when _bt_pagedel() deleted more
than one page by assuming that only one page was deleted, and that the
additional deleted pages would get picked up during a future call to
btvacuumpage() by the same VACUUM operation.  _bt_pagedel() can
legitimately delete pages that the btvacuumscan() scan will not visit
again, though, so that assumption was slightly faulty.

Fix the accounting by teaching _bt_pagedel() about its caller's
requirements.  It now only reports on pages that it knows btvacuumscan()
won't visit again (including the current btvacuumpage() page), so
everything works out in the end.

This bug has been around forever.  Only backpatch to v11, though, to
keep _bt_pagedel() is sync on the branches that have today's bugfix
commit b0229f26da.  Note that this commit changes the signature of
_bt_pagedel(), just like commit b0229f26da.

Author: Peter Geoghegan
Reviewed-By: Masahiko Sawada
Discussion: https://postgr.es/m/CAH2-WzkrXBcMQWAYUJMFTTvzx_r4q=pYSjDe07JnUXhe+OZnJA@mail.gmail.com
Backpatch: 11-
2020-05-01 09:51:09 -07:00
Peter Geoghegan
b0229f26da Fix bug in nbtree VACUUM "skip full scan" feature.
Commit 857f9c36cd (which taught nbtree VACUUM to skip a scan of the
index from btcleanup in situations where it doesn't seem worth it) made
VACUUM maintain the oldest btpo.xact among all deleted pages for the
index as a whole.  It failed to handle all the details surrounding pages
that are deleted by the current VACUUM operation correctly (though pages
deleted by some previous VACUUM operation were processed correctly).

The most immediate problem was that the special area of the page was
examined without a buffer pin at one point.  More fundamentally, the
handling failed to account for the full range of _bt_pagedel()
behaviors.  For example, _bt_pagedel() sometimes deletes internal pages
in passing, as part of deleting an entire subtree with btvacuumpage()
caller's page as the leaf level page.  The original leaf page passed to
_bt_pagedel() might not be the page that it deletes first in cases where
deletion can take place.

It's unclear how disruptive this bug may have been, or what symptoms
users might want to look out for.  The issue was spotted during
unrelated code review.

To fix, push down the logic for maintaining the oldest btpo.xact to
_bt_pagedel().  btvacuumpage() is now responsible for pages that were
fully deleted by a previous VACUUM operation, while _bt_pagedel() is now
responsible for pages that were deleted by the current VACUUM operation
(this includes half-dead pages from a previous interrupted VACUUM
operation that become fully deleted in _bt_pagedel()).  Note that
_bt_pagedel() should never encounter an existing deleted page.

This commit theoretically breaks the ABI of a stable release by changing
the signature of _bt_pagedel().  However, if any third party extension
is actually affected by this, then it must already be completely broken
(since there are numerous assumptions made in _bt_pagedel() that cannot
be met outside of VACUUM).  It seems highly unlikely that such an
extension actually exists, in any case.

Author: Peter Geoghegan
Reviewed-By: Masahiko Sawada
Discussion: https://postgr.es/m/CAH2-WzkrXBcMQWAYUJMFTTvzx_r4q=pYSjDe07JnUXhe+OZnJA@mail.gmail.com
Backpatch: 11-, where the "skip full scan" feature was introduced.
2020-05-01 08:39:52 -07:00
Peter Geoghegan
dd1f645cc8 Fix AddressSanitizer use-after-scope complaint.
XLogRegisterBufData() does not copy data pointed to by caller's pointer
argument.

Oversight in commit 0d861bbb70.

Author: Peter Eisentraut
Reported-By: Peter Eisentraut
Discussion: https://postgr.es/m/21800dbe-a13e-22f7-d423-b81db9d249f5@2ndquadrant.com
2020-04-30 12:31:56 -07:00
Peter Eisentraut
eb892102e0 Make SQL/JSON error code names match SQL standard
see also a00c53b0cb
2020-04-30 09:34:54 +02:00
Peter Geoghegan
ab2343d4cb Remove redundant _bt_killitems() buffer check.
_bt_getbuf() cannot return an invalid buffer.

Oversight in commit 2ed5b87f96.
2020-04-29 18:17:49 -07:00
Michael Paquier
e30b0b5cfa Fix check for conflicting SSL min/max protocol settings
Commit 79dfa8a has introduced a check to catch when the minimum protocol
version was set higher than the maximum version, however an error was
getting generated when both bounds are set even if they are able to
work, causing a backend to not use a new SSL context but keep the old
one.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/14BFD060-8C9D-43B4-897D-D5D9AA6FC92B@yesql.se
2020-04-30 08:14:02 +09:00
Alvaro Herrera
1816a1c6ff
Fix checkpoint signalling
Checkpointer uses its MyLatch to wake up when a checkpoint request is
received.  But before commit c655077639 the latch was not used for
anything else, so the code could just go to sleep after each loop
without rechecking the sleeping condition.  That commit added a separate
ResetLatch in its code path[1], which can cause a checkpoint to go
unnoticed for potentially a long time.

Fix by skipping sleep if any checkpoint flags are set.  Also add a test
to verify this; authored by Kyotaro Horiguchi.

[1] CreateCheckPoint -> InvalidateObsoleteReplicationSlots ->
ConditionVariableTimeSleep

Report and diagnosis by Kyotaro Horiguchi.
Co-authored-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200408.141956.891237856186513376.horikyota.ntt@gmail.com
2020-04-29 18:46:42 -04:00
Alvaro Herrera
d0abe78d84
Check slot->restart_lsn validity in a few more places
Lack of these checks could cause visible misbehavior, including
assertion failures.  This was missed in commit c655077639, whereby
restart_lsn becomes invalid when the size limit is exceeded.

Also reword some existing error messages, and add errdetail(), so that
the reported errors all match in spirit.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200408.093710.447591748588426656.horikyota.ntt@gmail.com
2020-04-28 20:39:04 -04:00
Peter Eisentraut
6baa17fbd1 Add missing gettext triggers
Some translatable strings have been moved to scanner_yyerror(), so we
need to add that, too.
2020-04-28 13:35:40 +02:00
Alexander Korotkov
ef11051bbe Fix definition of pg_statio_all_tables view
pg_statio_all_tables view appears to have a wrong grouping.  As the result
numbers of toast index blocks read and hit were multiplied to the number of
table indexes.  This commit fixes the view definition.

Backpatching this appears difficult.  We don't have a mechanism to patch a
system catalog of existing instances in minor upgrade.  We can write a
release notes instruction to do this manually.  But per discussion this is
probably not so critical bug for doing such an intrusive fix.

Reported-by: Andrei Zubkov
Discussion: https://postgr.es/m/CAPpHfdtMYkkNudLMG9G0dxX_B%3Dn5sfKzOyxxrvWYtSicaGW0Lw%40mail.gmail.com
2020-04-28 11:30:33 +03:00
Tom Lane
e81e5741a6 Fix full text search to handle NOT above a phrase search correctly.
Queries such as '!(foo<->bar)' failed to find matching rows when
implemented as a GiST or GIN index search.  That's because of
failing to handle phrase searches as tri-valued when considering
a query without any position information for the target tsvector.
We can only say that the phrase operator might match, not that it
does match; and therefore its NOT also might match.  The previous
coding incorrectly inverted the approximate phrase result to
decide that there was certainly no match.

To fix, we need to make TS_phrase_execute return a real ternary result,
and then bubble that up accurately in TS_execute.  As long as we have
to do that anyway, we can simplify the baroque things TS_phrase_execute
was doing internally to manage tri-valued searching with only a bool
as explicit result.

For now, I left the externally-visible result of TS_execute as a plain
bool.  There do not appear to be any outside callers that need to
distinguish a three-way result, given that they passed in a flag
saying what to do in the absence of position data.  This might need
to change someday, but we wouldn't want to back-patch such a change.

Although tsginidx.c has its own TS_execute_ternary implementation for
use at upper index levels, that sadly managed to get this case wrong
as well :-(.  Fixing it is a lot easier fortunately.

Per bug #16388 from Charles Offenbacher.  Back-patch to 9.6 where
phrase search was introduced.

Discussion: https://postgr.es/m/16388-98cffba38d0b7e6e@postgresql.org
2020-04-27 12:21:04 -04:00
Michael Paquier
641b76d9d1 Fix some typos
Author: Justin Pryzby
Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
2020-04-27 14:59:36 +09:00
Peter Eisentraut
f057980149 Fix typo
from 303640199d
2020-04-26 13:48:33 +02:00
Peter Geoghegan
7154aa16a6 Fix another minor page deletion buffer lock issue.
Avoid accessing the leaf page's top parent tuple without a buffer lock
held during the second phase of nbtree page deletion.  The old approach
was safe, though only because VACUUM never drops its buffer pin (and
because only VACUUM itself can modify a half-dead page).  Even still, it
seems like a good idea to be strict here.  Tighten things up by copying
the top parent page's block number to a local variable before releasing
the buffer lock on the leaf page -- not after.

This is a follow-up to commit fa7ff642, which fixed a similar issue in
the first phase of nbtree page deletion.

Update some related comments in passing.

Discussion: https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com
2020-04-25 16:45:20 -07:00
Peter Geoghegan
fa7ff642c2 Fix minor nbtree page deletion buffer lock issue.
Avoid accessing the deletion target page's special area during nbtree
page deletion at a point where there is no buffer lock held.  This issue
was detected by a patch that extends Valgrind's memcheck tool to mark
nbtree pages that are unsafe to access (due to not having a buffer lock
or buffer pin) as NOACCESS.

We do hold a buffer pin at this point, and only access the special area,
so the old approach was safe.  Even still, it seems like a good idea to
tighten up the rules in this area.  There is no reason to not simply
insist on always holding a buffer lock (not just a pin) when accessing
nbtree pages.

Update some related comments in passing.

Discussion: https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com
2020-04-25 14:17:02 -07:00
Noah Misch
f246ea3b2a In caught-up logical walsender, sleep only in WalSndWaitForWal().
Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write
< sentPtr.  When the latest physical LSN yields no logical replication
messages (a common case), that keepalive elicits a reply.  Processing
the reply updates pg_stat_replication.replay_lsn.  WalSndLoop() lacks
that; when WalSndLoop() slept, replay_lsn advancement could stall until
wal_receiver_status_interval elapsed.  This sometimes stalled
src/test/subscription/t/001_rep_changes.pl for up to 10s.

Reviewed by Fujii Masao and Michael Paquier.

Discussion: https://postgr.es/m/20200418070142.GA1075445@rfd.leadboat.com
2020-04-25 10:18:12 -07:00
Noah Misch
72a3dc321d Revert "When WalSndCaughtUp, sleep only in WalSndWaitForWal()."
This reverts commit 4216858122.  It caused
idle physical walsenders to busy-wait, as reported by Fujii Masao.

Discussion: https://postgr.es/m/20200417054146.GA1061007@rfd.leadboat.com
2020-04-25 10:17:26 -07:00
Andrew Gierth
d9a4cce29d Fix error case for CREATE ROLE ... IN ROLE.
CreateRole() was passing a Value node, not a RoleSpec node, for the
newly-created role name when adding the role as a member of existing
roles for the IN ROLE syntax.

This mistake went unnoticed because the node in question is used only
for error messages and is not accessed on non-error paths.

In older pg versions (such as 9.5 where this was found), this results
in an "unexpected node type" error in place of the real error. That
node type check was removed at some point, after which the code would
accidentally fail to fail on 64-bit platforms (on which accessing the
Value node as if it were a RoleSpec would be mostly harmless) or give
an "unexpected role type" error on 32-bit platforms.

Fix the code to pass the correct node type, and add an lfirst_node
assertion just in case.

Per report on irc from user m1chelangelo.

Backpatch all the way, because this error has been around for a long
time.
2020-04-25 05:09:30 +01:00
Tom Lane
baf17ad9df Repair performance regression in information_schema.triggers view.
Commit 32ff26911 introduced use of rank() into the triggers view to
calculate the spec-mandated action_order column.  As written, this
prevents query constraints on the table-name column from being pushed
below the window aggregate step.  That's bad for performance of this
typical usage pattern, since the view now has to be evaluated for all
tables not just the one(s) the user wants to see.  It's also the cause
of some recent buildfarm failures, in which trying to evaluate the view
rows for triggers in process of being dropped resulted in "cache lookup
failed for function NNN" errors.  Those rows aren't of interest to the
test script doing the query, but the filter that would eliminate them
is being applied too late.  None of this happened before the rank()
call was there, so it's a regression compared to v10 and before.

We can improve matters by changing the rank() call so that instead of
partitioning by OIDs, it partitions by nspname and relname, casting
those to sql_identifier so that they match the respective view output
columns exactly.  The planner has enough intelligence to know that
constraints on partitioning columns are safe to push down, so this
eliminates the performance problem and the regression test failure
risk.  We could make the other partitioning columns match view outputs
as well, but it'd be more complicated and the performance benefits
are questionable.

Side note: as this stands, the planner will push down constraints on
event_object_table and trigger_schema, but not on event_object_schema,
because it checks for ressortgroupref matches not expression
equivalence.  That might be worth improving someday, but it's not
necessary to fix the immediate concern.

Back-patch to v11 where the rank() call was added.  Ordinarily we'd not
change information_schema in released branches, but the test failure has
been seen in v12 and presumably could happen in v11 as well, so we need
to do this to keep the buildfarm happy.  The change is harmless so far
as users are concerned.  Some might wish to apply it to existing
installations if performance of this type of query is of concern,
but those who don't are no worse off.

I bumped catversion in HEAD as a pro forma matter (there's no
catalog incompatibility that would really require a re-initdb).
Obviously that can't be done in the back branches.

Discussion: https://postgr.es/m/5891.1587594470@sss.pgh.pa.us
2020-04-24 12:02:36 -04:00
Michael Paquier
4e87c4836a Fix handling of WAL segments ready to be archived during crash recovery
78ea8b5 has fixed an issue related to the recycling of WAL segments on
standbys depending on archive_mode.  However, it has introduced a
regression with the handling of WAL segments ready to be archived during
crash recovery, causing those files to be recycled without getting
archived.

This commit fixes the regression by tracking in shared memory if a live
cluster is either in crash recovery or archive recovery as the handling
of WAL segments ready to be archived is different in both cases (those
WAL segments should not be removed during crash recovery), and by using
this new shared memory state to decide if a segment can be recycled or
not.  Previously, it was not possible to know if a cluster was in crash
recovery or archive recovery as the shared state was able to track only
if recovery was happening or not, leading to the problem.

A set of TAP tests is added to close the gap here, making sure that WAL
segments ready to be archived are correctly handled when a cluster is in
archive or crash recovery with archive_mode set to "on" or "always", for
both standby and primary.

Reported-by: Benoît Lobréau
Author: Jehan-Guillaume de Rorthais
Reviewed-by: Kyotaro Horiguchi, Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/20200331172229.40ee00dc@firost
Backpatch-through: 9.5
2020-04-24 08:48:28 +09:00
Tom Lane
3436c5e283 Remove ACLDEBUG #define and associated code.
In the footsteps of aaf069aa3, remove ACLDEBUG, which was the only
other remaining undocumented symbol in pg_config_manual.h.  The fact
that nobody had bothered to document it in seventeen years is a good
clue to its usefulness.  In practice, none of the tracing logic it
enabled would be of any value without additional effort.

Discussion: https://postgr.es/m/6631.1587565046@sss.pgh.pa.us
2020-04-23 15:38:04 -04:00
Tom Lane
ee88ef55db Remove useless (and broken) logging logic in memory context functions.
Nobody really uses this stuff, especially not since we created
valgrind-based infrastructure that does the same thing better.
It is thus unsurprising that the generation.c and slab.c versions
were actually broken.  Rather than fix 'em, let's just remove 'em.

Alexander Lakhin

Discussion: https://postgr.es/m/8936216c-3492-3f6e-634b-d638fddc5f91@gmail.com
2020-04-23 15:27:37 -04:00
Robert Haas
3989dbdf12 Rename exposed identifiers to say "backup manifest".
Function names declared "extern" now use BackupManifest in the name
rather than just Manifest, and data types use backup_manifest
rather than just manifest.

Per note from Michael Paquier.

Discussion: http://postgr.es/m/20200418125713.GG350229@paquier.xyz
2020-04-23 08:44:06 -04:00
Andres Freund
299298bc87 Fix transient memory leak for SRFs in FROM.
In a9c35cf85c I changed ExecMakeTableFunctionResult() to dynamically
allocate the FunctionCallInfo used to call the SRF. Unfortunately I
did not account for the fact that the surrounding memory context has
query lifetime, leading to a leak till the end of the query.

In most cases the leak is fairly inconsequential, but if the
FunctionScan is done many times in the query, the leak can add
up. This happens e.g. if the function scan is on the inner side of a
nested loop, due to a lateral join.
EXPLAIN SELECT sum(f) FROM generate_series(1, 100000000) g(i), generate_series(i, i+1) f;
quickly shows the leak.

Instead of explicitly freeing the FunctionCallInfo it seems better to
make sure all the per-set temporary state in
ExecMakeTableFunctionResult() is cleaned up wholesale. Currently
that's probably just the FunctionCallInfo allocation, but since
there's some initialization work, and since there's already an
appropriate context, this seems like a more robust approach.

Bug: #16112
Reported-By: Ben Cornett
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/16112-4448bbf55a404189%40postgresql.org
Backpatch: 12, a9c35cf85c
2020-04-22 19:53:06 -07:00
Tomas Vondra
de0dc1a847 Fix cost_incremental_sort for expressions with varno 0
When estimating the number of pre-sorted groups in cost_incremental_sort
we must not pass Vars with varno 0 to estimate_num_groups, which would
cause failues in find_base_rel. This may happen when sorting output of
set operations, thanks to generate_append_tlist.

Unlike recurse_set_operations we can't easily access the original target
list, so if we find any Vars with varno 0, we fall back to the default
estimate DEFAULT_NUM_DISTINCT.

Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20200411214639.GK2228%40telsasoft.com
2020-04-23 00:15:24 +02:00
David Rowley
9f2c4edec2 Remove bogus Assert in foreign key cloning code
This Assert was trying to ensure that the number of columns in the foreign
key being cloned was the same number of attributes in the parentRel.  Of
course, it's perfectly valid to have columns in the table which are not
part of the foreign key constraint. It appears that this Assert was
misunderstanding that.

Reported-by: Rajkumar Raghuwanshi
Reviewed-by: amul sul
Discussion: https://postgr.es/m/CAKcux6=z1dtiWw5BOpqDx-U6KTiq+zD0Y2m810zUtWL+giVXWA@mail.gmail.com
2020-04-22 22:12:19 +12:00
Peter Eisentraut
aaf069aa34 Remove HEAPDEBUGALL
This has been broken since PostgreSQL 12 and was probably never really
used.  PostgreSQL 12 added an analogous HEAPAMSLOTDEBUGALL, which
still works right now, but it's also not very useful, so remove that
as well.

Discussion: https://www.postgresql.org/message-id/flat/645c0646-4218-d4c3-409a-a7003a0c108d%402ndquadrant.com
2020-04-22 08:35:33 +02:00
Tom Lane
d12bdba77b Fix possible crash during FATAL exit from reindexing.
index.c supposed that it could just use a PG_TRY block to clean up the
state associated with an active REINDEX operation.  However, that code
doesn't run if we do a FATAL exit --- for example, due to a SIGTERM
shutdown signal --- while the REINDEX is happening.  And that state does
get consulted during catalog accesses, which makes it problematic if we
do any catalog accesses during shutdown --- for example, to clean up any
temp tables created in the session.

If this combination of circumstances occurred, we could find ourselves
trying to access already-freed memory.  In debug builds that'd fairly
reliably cause an assertion failure.  In production we might often
get away with it, but with some bad luck it could cause a core dump.

Another possible bad outcome is an erroneous conclusion that an
index-to-be-accessed is being reindexed; but it looks like that would
be unlikely to have any consequences worse than failing to drop temp
tables right away.  (They'd still get dropped by the next session that
uses that temp schema.)

To fix, get rid of the use of PG_TRY here, and instead hook into
the transaction abort mechanisms to clean up reindex state.

Per bug #16378 from Alexander Lakhin.  This has been wrong for a
very long time, so back-patch to all supported branches.

Discussion: https://postgr.es/m/16378-7a70ca41b3ec2009@postgresql.org
2020-04-21 15:58:42 -04:00
Tom Lane
5836d32655 Fix minor violations of FunctionCallInvoke usage protocol.
Working on commit 1c455078b led me to check through FunctionCallInvoke
call sites to see if every one was being honest about (a) making sure
that fcinfo.isnull is initially false, and (b) checking its state after
the call.  Sure enough, I found some violations.

The main one is that finalize_partialaggregate re-used serialfn_fcinfo
without resetting isnull, even though it clearly intends to cater for
serialfns that return NULL.  There would only be an issue with a
non-strict serialfn, since it's unlikely that a serialfn would return
NULL for non-null input.  We have no non-strict serialfns in core, and
there may be none in the wild either, which would account for the lack
of complaints.  Still, it's clearly wrong, so back-patch that fix to
9.6 where finalize_partialaggregate was introduced.

Also, arrayfuncs.c and rowtypes.c contained various callers that were
not bothering to check for result nulls.  While what's being called is
a comparison or hash function that probably *shouldn't* return null,
that's a lousy excuse for not having any check at all.  There are
existing places that just Assert(!fcinfo->isnull) in comparable
situations, so I added that to the places that were calling btree
comparison or hash support functions.  In the places calling
boolean-returning equality functions, it's quite cheap to have them
treat isnull as FALSE, so make those places do that.  Also remove some
"locfcinfo->isnull = false" assignments that are unnecessary given the
assumption that no previous call returned null.  These changes seem like
mostly neatnik-ism or debugging support, so I didn't back-patch.
2020-04-21 14:23:53 -04:00
Alvaro Herrera
afccd76f1c
Fix detaching partitions with cloned row triggers
When a partition is detached, any triggers that had been cloned from its
parent were not properly disentangled from its parent triggers.
This resulted in triggers that could not be dropped because they
depended on the trigger in the trigger in the no-longer-parent table:
  ALTER TABLE t DETACH PARTITION t1;
  DROP TRIGGER trig ON t1;
    ERROR:  cannot drop trigger trig on table t1 because trigger trig on table t requires it
    HINT:  You can drop trigger trig on table t instead.

Moreover the table can no longer be re-attached to its parent, because
the trigger name is already taken:
  ALTER TABLE t ATTACH PARTITION t1 FOR VALUES FROM (1)TO(2);
    ERROR:  trigger "trig" for relation "t1" already exists

The former is a bug introduced in commit 86f575948c.  (The latter is
not necessarily a bug, but it makes the bug more uncomfortable.)

To avoid the complexity that would be needed to tell whether the trigger
has a local definition that has to be merged with the one coming from
the parent table, establish the behavior that the trigger is removed
when the table is detached.

Backpatch to pg11.

Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/20200408152412.GZ2228@telsasoft.com
2020-04-21 13:57:00 -04:00
Peter Geoghegan
1542e16f2c Consider outliers in split interval calculation.
Commit 0d861bbb, which introduced deduplication to nbtree, added some
logic to take large posting list tuples into account when choosing a
split point.  We subtract firstright posting list overhead from the
projected new high key size when calculating leftfree/rightfree values
for an affected candidate split point.  Posting list tuples aren't
special to nbtsplitloc.c, but taking them into account like this makes a
huge difference in practice.  Posting list tuples are frequently tuple
size outliers.

However, commit 0d861bbb missed a closely related issue: split interval
itself is calculated based on the assumption that tuples on the page
being split are roughly equisized.  That assumption was acceptable back
when commit fab25024 taught the logic for choosing a split point about
suffix truncation, but it's pretty questionable now that very large
tuple sizes are common.  This oversight led to unbalanced page splits in
low cardinality multi-column indexes when deduplication was used: page
splits that don't give sufficient weight to how unbalanced the split is
when the interval happens to include some large posting list tuples (and
when most other tuples on the page are not so large).

Nail this down by calculating an initial split interval in a way that's
attuned to the actual cost that we want to keep under control (not a
fuzzy proxy for the cost): apply a leftfree + rightfree evenness test to
each candidate split point that actually gets included in the split
interval (for the default strategy).  This replaces logic that used a
percentage of all legal split points for the page as the basis of the
initial split interval.

Discussion: https://postgr.es/m/CAH2-WznJt5aT2uUB2Bs+JBLdwe0XTX67+xeLFcaNvCKxO=QBVQ@mail.gmail.com
2020-04-21 09:59:24 -07:00
Tom Lane
1c455078b0 Allow matchingsel() to be used with operators that might return NULL.
Although selfuncs.c will never call a target operator with null inputs,
some functions might return null anyway.  The existing coding will fail
if that happens (since FunctionCall2Coll will punt), which seems
undesirable given that matchingsel() has such a broad range of potential
applicability --- in fact, we already have a problem because we apply it
to jsonb_path_exists_opr, which can return null.  Hence, rejigger the
underlying functions mcv_selectivity and histogram_selectivity to cope,
treating a null result as false.

While we are at it, we can move the InitFunctionCallInfoData overhead
out of the inner loops, which isn't a huge number of cycles but might
save something considering we are likely calling functions as cheap
as int4eq().  Plus, the number of loop cycles to be expected is much
more than it was when this code was written, since typical settings
of default_statistics_target are higher.

In view of that consideration, let's apply the same change to
var_eq_const, eqjoinsel_inner, and eqjoinsel_semi.  We do not expect
equality functions to ever return null for non-null inputs (and
certainly that code has been that way a long time without complaints),
but the cycle savings seem attractive, especially in the eqjoinsel loops
where there's potentially an O(N^2) savings.

Similar code exists in ineq_histogram_selectivity and
get_variable_range, but I forebore from changing those for now.
The performance argument for changing ineq_histogram_selectivity
is really weak anyway, since that will only iterate log2(N) times.

Nikita Glukhov and Tom Lane

Discussion: https://postgr.es/m/9d3b0959-95d6-c37e-2c0b-287bcfe5c705@postgrespro.ru
2020-04-21 12:56:55 -04:00
Tom Lane
9d25e1aa31 Clean up cpluspluscheck violation.
"operator" is a reserved word in C++, so per project conventions,
don't use it as an identifier in header files.

My oversight in commit a80818605.
2020-04-21 11:21:15 -04:00
Robert Haas
079ac29d4d Move the server's backup manifest code to a separate file.
basebackup.c is already a pretty big and complicated file, so it
makes more sense to keep the backup manifest support routines
in a separate file, for clarity and ease of maintenance.

Discussion: http://postgr.es/m/CA+TgmoavRak5OdP76P8eJExDYhPEKWjMb0sxW7dF01dWFgE=uA@mail.gmail.com
2020-04-20 14:38:15 -04:00
Alvaro Herrera
5fc703946b
Add ALTER .. NO DEPENDS ON
Commit f2fcad27d5 (9.6 era) added the ability to mark objects as
dependent an extension, but forgot to add a way for such dependencies to
be removed.  This commit fixes that oversight.

Strictly speaking this should be backpatched to 9.6, but due to lack of
demand we're not doing so at this time.

Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql
Reviewed-by: ahsan hadi <ahsan.hadi@gmail.com>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
2020-04-20 13:42:12 -04:00
Magnus Hagander
7e4e574744 Allow pg_read_all_stats to access all stats views again
The views pg_stat_progress_* had not gotten the memo that
pg_read_all_stats is supposed to be able to read all statistics. Also
make a pass over all text-returning pg_stat_xyz functions that could
return "insufficient privilege" and make sure they also respect
pg_read_all_status.

Reported-by: Andrey M. Borodin
Reviewed-by: Andrey M. Borodin, Kyotaro Horiguchi
Discussion: https://postgr.es/m/13145F2F-8458-4977-9D2D-7B2E862E5722@yandex-team.ru
2020-04-20 12:53:40 +02:00
Jeff Davis
0cacb2b79d Fix missing pfree() in logtape.c, missed by 24d85952. 2020-04-19 10:33:06 -07:00
Tom Lane
f332241a60 Fix race conditions in synchronous standby management.
We have repeatedly seen the buildfarm reach the Assert(false) in
SyncRepGetSyncStandbysPriority.  This apparently is due to failing to
consider the possibility that the sync_standby_priority values in
shared memory might be inconsistent; but they will be whenever only
some of the walsenders have updated their values after a change in
the synchronous_standby_names setting.  That function is vastly too
complex for what it does, anyway, so rewriting it seems better than
trying to apply a band-aid fix.

Furthermore, the API of SyncRepGetSyncStandbys is broken by design:
it returns a list of WalSnd array indexes, but there is nothing
guaranteeing that the contents of the WalSnd array remain stable.
Thus, if some walsender exits and then a new walsender process
takes over that WalSnd array slot, a caller might make use of
WAL position data that it should not, potentially leading to
incorrect decisions about whether to release transactions that
are waiting for synchronous commit.

To fix, replace SyncRepGetSyncStandbys with a new function
SyncRepGetCandidateStandbys that copies all the required data
from shared memory while holding the relevant mutexes.  If the
associated walsender process then exits, this data is still safe to
make release decisions with, since we know that that much WAL *was*
sent to a valid standby server.  This incidentally means that we no
longer need to treat sync_standby_priority as protected by the
SyncRepLock rather than the per-walsender mutex.

SyncRepGetSyncStandbys is no longer used by the core code, so remove
it entirely in HEAD.  However, it seems possible that external code is
relying on that function, so do not remove it from the back branches.
Instead, just remove the known-incorrect Assert.  When the bug occurs,
the function will return a too-short list, which callers should treat
as meaning there are not enough sync standbys, which seems like a
reasonably safe fallback until the inconsistent state is resolved.
Moreover it's bug-compatible with what has been happening in non-assert
builds.  We cannot do anything about the walsender-replacement race
condition without an API/ABI break.

The bogus assertion exists back to 9.6, but 9.6 is sufficiently
different from the later branches that the patch doesn't apply at all.
I chose to just remove the bogus assertion in 9.6, feeling that the
probability of a bad outcome from the walsender-replacement race
condition is too low to justify rewriting the whole patch for 9.6.

Discussion: https://postgr.es/m/21519.1585272409@sss.pgh.pa.us
2020-04-18 14:02:44 -04:00
David Rowley
3cb02e307e Fix possible crash with GENERATED ALWAYS columns
In some corner cases, this could also lead to corrupted values being
included in the tuple.

Users who are concerned that they are affected by this should first
upgrade and then perform a base backup of their database and restore onto
an off-line server. They should then query each table with generated
columns to ensure there are no rows where the generated expression does
not match a newly calculated version of the GENERATED ALWAYS expression.
If no crashes occur and no rows are returned then you're not affected.

Fixes bug #16369.

Reported-by: Cameron Ezell
Discussion: https://postgr.es/m/16369-5845a6f1bef59884@postgresql.org
Backpatch-through: 12 (where GENERATED ALWAYS columns were added.)
2020-04-18 14:10:37 +12:00