Commit Graph

20848 Commits

Author SHA1 Message Date
Thomas Munro 7897e3bb90 Fix buffile.c error handling.
Convert buffile.c error handling to use ereport.  This fixes cases where
I/O errors were indistinguishable from EOF or not reported.  Also remove
"%m" from error messages where errno would be bogus.  While we're
modifying those strings, add block numbers and short read byte counts
where appropriate.

Back-patch to all supported releases.

Reported-by: Amit Khandekar <amitdkhan.pg@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGJE04G%3D8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw%40mail.gmail.com
2020-06-16 16:59:07 +12:00
Tom Lane 5674eb9876 Fix power() for large inputs yet more.
Buildfarm results for commit e532b1d57 reveal the error in my thinking
about the unexpected-EDOM case.  I'd supposed this was no longer really
a live issue, but it seems the fix for glibc's bug #3866 is not all that
old, and we still have at least one buildfarm animal (lapwing) with the
bug.  Hence, resurrect essentially the previous logic (but, I hope, less
opaquely presented), and explain what it is we're really doing here.

Also, blindly try to fix fossa's failure by tweaking the logic that
figures out whether y is an odd integer when x is -inf.  This smells
a whole lot like a compiler bug, but I lack access to icc to try to
pin it down.  Maybe doing division instead of multiplication will
dodge the issue.

Discussion: https://postgr.es/m/E1jkU7H-00024V-NZ@gemulon.postgresql.org
2020-06-15 19:10:33 -04:00
Robert Haas 2961c9711c Assorted cleanup of tar-related code.
Introduce TAR_BLOCK_SIZE and replace many instances of 512 with
the new constant. Introduce function tarPaddingBytesRequired
and use it to replace numerous repetitions of (x + 511) & ~511.

Add preprocessor guards against multiple inclusion to pgtar.h.

Reformat the prototype for tarCreateHeader so it doesn't extend
beyond 80 characters.

Discussion: http://postgr.es/m/CA+TgmobWbfReO9-XFk8urR1K4wTNwqoHx_v56t7=T8KaiEoKNw@mail.gmail.com
2020-06-15 15:28:49 -04:00
Tom Lane e532b1d57d Fix power() for infinity inputs some more.
Buildfarm results for commit decbe2bfb show that AIX and illumos
have non-POSIX-compliant pow() functions, as do ancient NetBSD
and HPUX releases.  While it's dubious how much we should care
about the latter two platforms, the former two are probably enough
reason to put in manual handling of infinite-input cases.  Hence,
do so, and clean up the post-pow() error handling to reflect its
now-more-limited scope.  (Notably, while we no longer expect to
ever see EDOM from pow(), report it as a domain error if we do.
The former coding had the net effect of expensively converting the
error to ERANGE, which seems highly questionable: if pow() wanted
to report ERANGE, it would have done so.)

Patch by me; thanks to Michael Paquier for review.

Discussion: https://postgr.es/m/E1jkU7H-00024V-NZ@gemulon.postgresql.org
2020-06-15 12:15:56 -04:00
Michael Paquier 7a3543c2ea Fix some comments referring to past features
Timestamp can only be an int64 since b9d092c, and support for WITH OIDS
has been removed as of 578b229.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20200612023709.GC14879@telsasoft.com
2020-06-15 21:18:14 +09:00
Tom Lane decbe2bfb1 Fix behavior of exp() and power() for infinity inputs.
Previously, these functions tended to throw underflow errors for
negative-infinity exponents.  The correct thing per POSIX is to
return 0, so let's do that instead.  (Note that the SQL standard
is silent on such issues, as it lacks the concepts of either Inf
or NaN; so our practice is to follow POSIX whenever a corresponding
C-library function exists.)

Also, add a bunch of test cases verifying that exp() and power()
actually do follow POSIX for Inf and NaN inputs.  While this patch
should guarantee that exp() passes the tests, power() will not unless
the platform's pow(3) is fully POSIX-compliant.  I already know that
gaur fails some of the tests, and I am suspicious that the Windows
animals will too; the extent of compliance of other old platforms
remains to be seen.  We might choose to drop failing test cases, or
to work harder at overriding pow(3) for these cases, but first let's
see just how good or bad the situation is.

Discussion: https://postgr.es/m/582552.1591917752@sss.pgh.pa.us
2020-06-14 11:00:07 -04:00
Michael Paquier cc072641d4 Replace superuser check by ACLs for replication origin functions
This patch removes the hardcoded check for superuser privileges when
executing replication origin functions.  Instead, execution is revoked
from public, meaning that those functions can be executed by a superuser
and that access to them can be granted.

Author: Martín Marqués
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Masahiko Sawada
Discussion: https:/postgr.es/m/CAPdiE1xJMZOKQL3dgHMUrPqysZkgwzSMXETfKkHYnBAB7-0VRQ@mail.gmail.com
2020-06-14 12:40:37 +09:00
Tom Lane 23cbeda50b Sync behavior of var_samp and stddev_samp for single NaN inputs.
var_samp(numeric) and stddev_samp(numeric) disagreed with their float
cousins about what to do for a single non-null input value that is NaN.
The float versions return NULL on the grounds that the calculation is
only defined for more than one non-null input, which seems like the
right answer.  But the numeric versions returned NaN, as a result of
dealing with edge cases in the wrong order.  Fix that.  The patch
also gets rid of an insignificant memory leak in such cases.

This inconsistency is of long standing, but on the whole it seems best
not to back-patch the change into stable branches; nobody's complained
and it's such an obscure point that nobody's likely to complain.
(Note that v13 and v12 now contain test cases that will notice if we
accidentally back-patch this behavior change in future.)

Report and patch by me; thanks to Dean Rasheed for review.

Discussion: https://postgr.es/m/353062.1591898766@sss.pgh.pa.us
2020-06-13 14:01:46 -04:00
Tom Lane 03109a5302 Fix behavior of float aggregates for single Inf or NaN inputs.
When there is just one non-null input value, and it is infinity or NaN,
aggregates such as stddev_pop and covar_pop should produce a NaN
result, because the calculation is not well-defined.  They used to do
so, but since we adopted Youngs-Cramer aggregation in commit e954a727f,
they produced zero instead.  That's an oversight, so fix it.  Add tests
exercising these edge cases.

Affected aggregates are

 var_pop(double precision)
 stddev_pop(double precision)
 var_pop(real)
 stddev_pop(real)
 regr_sxx(double precision,double precision)
 regr_syy(double precision,double precision)
 regr_sxy(double precision,double precision)
 regr_r2(double precision,double precision)
 regr_slope(double precision,double precision)
 regr_intercept(double precision,double precision)
 covar_pop(double precision,double precision)
 corr(double precision,double precision)

Back-patch to v12 where the behavior change was accidentally introduced.

Report and patch by me; thanks to Dean Rasheed for review.

Discussion: https://postgr.es/m/353062.1591898766@sss.pgh.pa.us
2020-06-13 13:43:40 -04:00
Peter Geoghegan d64f1cdf2f Silence _bt_check_unique compiler warning.
Reported-By: Tom Lane
Discussion: https://postgr.es/m/841649.1592065060@sss.pgh.pa.us
2020-06-13 09:33:33 -07:00
Peter Eisentraut 8f5b596744 Refactor AlterExtensionContentsStmt grammar
Make use of the general object support already used by COMMENT, DROP,
and SECURITY LABEL.

Discussion: https://www.postgresql.org/message-id/flat/163c00a5-f634-ca52-fc7c-0e53deda8735%402ndquadrant.com
2020-06-13 09:19:30 +02:00
Peter Eisentraut a332b366d4 Grammar object type refactoring
Unify the grammar of COMMENT, DROP, and SECURITY LABEL further.  They
all effectively just take an object address for later processing, so
we can make the grammar more generalized.  Some extra checking about
which object types are supported can be done later in the statement
execution.

Discussion: https://www.postgresql.org/message-id/flat/163c00a5-f634-ca52-fc7c-0e53deda8735%402ndquadrant.com
2020-06-13 09:19:30 +02:00
David Rowley dad75eb4a8 Have pg_itoa, pg_ltoa and pg_lltoa return the length of the string
Core by no means makes excessive use of these functions, but quite a large
number of those usages do require the caller to call strlen() on the
returned string.  This is quite wasteful since these functions do already
have a good idea of the length of the string, so we might as well just
have them return that.

Reviewed-by: Andrew Gierth
Discussion: https://postgr.es/m/CAApHDvrm2A5x2uHYxsqriO2cUaGcFvND%2BksC9e7Tjep0t2RK_A%40mail.gmail.com
2020-06-13 12:32:00 +12:00
David Rowley 9a7fccd9ea Add missing extern keyword for a couple of numutils functions
In passing, also remove a few surplus empty lines from pg_ltoa and
pg_ulltoa_n in numutils.c

Reported-by: Andrew Gierth
Discussion: https://postgr.es/m/87y2ou3xuh.fsf@news-spur.riddles.org.uk
Backpatch-through: 13, where these changes were introduced
2020-06-13 11:27:25 +12:00
Tom Lane 2f48ede080 Avoid using a cursor in plpgsql's RETURN QUERY statement.
plpgsql has always executed the query given in a RETURN QUERY command
by opening it as a cursor and then fetching a few rows at a time,
which it turns around and dumps into the function's result tuplestore.
The point of this was to keep from blowing out memory with an oversized
SPITupleTable result (note that while a tuplestore can spill tuples
to disk, SPITupleTable cannot).  However, it's rather inefficient, both
because of extra data copying and because of executor entry/exit
overhead.  In recent versions, a new performance problem has emerged:
use of a cursor prevents use of a parallel plan for the executed query.

We can improve matters by skipping use of a cursor and having the
executor push result tuples directly into the function's result
tuplestore.  However, a moderate amount of new infrastructure is needed
to make that idea work:

* We can use the existing tstoreReceiver.c DestReceiver code to funnel
executor output to the tuplestore, but it has to be extended to support
plpgsql's requirement for possibly applying a tuple conversion map.

* SPI needs to be extended to allow use of a caller-supplied
DestReceiver instead of its usual receiver that puts tuples into
a SPITupleTable.  Two new API calls are needed to handle both the
RETURN QUERY and RETURN QUERY EXECUTE cases.

I also felt that I didn't want these new API calls to use the legacy
method of specifying query parameter values with "char" null flags
(the old ' '/'n' convention); rather they should accept ParamListInfo
objects containing the parameter type and value info.  This required
a bit of additional new infrastructure since we didn't yet have any
parse analysis callback that would interpret $N parameter symbols
according to type data supplied in a ParamListInfo.  There seems to be
no harm in letting makeParamList install that callback by default,
rather than leaving a new ParamListInfo's parserSetup hook as NULL.
(Indeed, as of HEAD, I couldn't find anyplace that was using the
parserSetup field at all; plpgsql was using parserSetupArg for its
own purposes, but parserSetup seemed to be write-only.)

We can actually get plpgsql out of the business of using legacy null
flags altogether, and using ParamListInfo instead of its ad-hoc
PreparedParamsData structure; but this requires inventing one more
SPI API call that can replace SPI_cursor_open_with_args.  That seems
worth doing, though.

SPI_execute_with_args and SPI_cursor_open_with_args are now unused
anywhere in the core PG distribution.  Perhaps someday we could
deprecate/remove them.  But cleaning up the crufty bits of the SPI
API is a task for a different patch.

Per bug #16040 from Jeremy Smith.  This is unfortunately too invasive to
consider back-patching.  Patch by me; thanks to Hamid Akhtar for review.

Discussion: https://postgr.es/m/16040-eaacad11fecfb198@postgresql.org
2020-06-12 12:14:32 -04:00
Michael Paquier aaf8c99050 Fix typos and some format mistakes in comments
Author: Justin Pryzby
Discussion: https://postgr.es/m/20200612023709.GC14879@telsasoft.com
2020-06-12 21:05:10 +09:00
Peter Eisentraut ffd2582297 Make more use of RELKIND_HAS_STORAGE()
Make use of RELKIND_HAS_STORAGE() where appropriate, instead of
listing out the relkinds individually.  No behavior change intended.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/7a22bf51-2480-d999-1794-191ba67ff47c%402ndquadrant.com
2020-06-12 09:10:26 +02:00
Thomas Munro 7aa4fb5925 Improve comments for [Heap]CheckForSerializableConflictOut().
Rewrite the documentation of these functions, in light of recent bug fix
commit 5940ffb2.

Back-patch to 13 where the check-for-conflict-out code was split up into
AM-specific and generic parts, and new documentation was added that now
looked wrong.

Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443%40jepsen.io
2020-06-12 10:55:38 +12:00
Tom Lane 77a3be32f7 Fix mishandling of NaN counts in numeric_[avg_]combine.
When merging two NumericAggStates, the code missed adding the new
state's NaNcount unless its N was also nonzero; since those counts
are independent, this is wrong.

This would only have visible effect if some partial aggregate scans
found only NaNs while earlier ones found only non-NaNs; then we could
end up falsely deciding that there were no NaNs and fail to return a
NaN final result as expected.  That's pretty improbable, so it's no
surprise this hasn't been reported from the field.  Still, it's a bug.

I didn't try to produce a regression test that would show the bug,
but I did notice that these functions weren't being reached at all
in our regression tests, so I improved the tests to at least
exercise them.  With these additions, I see pretty complete code
coverage on the aggregation-related functions in numeric.c.

Back-patch to 9.6 where this code was introduced.  (I only added
the improved test case as far back as v10, though, since the
relevant part of aggregates.sql isn't there at all in 9.6.)
2020-06-11 17:38:42 -04:00
Jeff Davis 92c58fd948 Rework HashAgg GUCs.
Eliminate enable_groupingsets_hash_disk, which was primarily useful
for testing grouping sets that use HashAgg and spill. Instead, hack
the table stats to convince the planner to choose hashed aggregation
for grouping sets that will spill to disk. Suggested by Melanie
Plageman.

Rename enable_hashagg_disk to hashagg_avoid_disk_plan, and invert the
meaning of on/off. The new name indicates more strongly that it only
affects the planner. Also, the word "avoid" is less definite, which
should avoid surprises when HashAgg still needs to use the
disk. Change suggested by Justin Pryzby, though I chose a different
GUC name.

Discussion: https://postgr.es/m/CAAKRu_aisiENMsPM2gC4oUY1hHG3yrCwY-fXUg22C6_MJUwQdA%40mail.gmail.com
Discussion: https://postgr.es/m/20200610021544.GA14879@telsasoft.com
Backpatch-through: 13
2020-06-11 12:57:43 -07:00
Peter Geoghegan 5940ffb221 Avoid update conflict out serialization anomalies.
SSI's HeapCheckForSerializableConflictOut() test failed to correctly
handle conditions involving a concurrently inserted tuple which is later
concurrently updated by a separate transaction .  A SELECT statement
that called HeapCheckForSerializableConflictOut() could end up using the
same XID (updater's XID) for both the original tuple, and the successor
tuple, missing the XID of the xact that created the original tuple
entirely.  This only happened when neither tuple from the chain was
visible to the transaction's MVCC snapshot.

The observable symptoms of this bug were subtle.  A pair of transactions
could commit, with the later transaction failing to observe the effects
of the earlier transaction (because of the confusion created by the
update to the non-visible row).  This bug dates all the way back to
commit dafaa3ef, which added SSI.

To fix, make sure that we check the xmin of concurrently inserted tuples
that happen to also have been updated concurrently.

Author: Peter Geoghegan
Reported-By: Kyle Kingsbury
Reviewed-By: Thomas Munro
Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443@jepsen.io
Backpatch: All supported versions
2020-06-11 10:09:47 -07:00
Peter Eisentraut 3fbd4bb6f4 Refactor DROP LANGUAGE grammar
Fold it into the generic DropStmt.

Discussion: https://www.postgresql.org/message-id/flat/163c00a5-f634-ca52-fc7c-0e53deda8735%402ndquadrant.com
2020-06-11 11:18:15 +02:00
Peter Eisentraut 5333e014ab Remove deprecated syntax from CREATE/DROP LANGUAGE
Remove the option to specify the language name as a single-quoted
string.  This has been obsolete since ee8ed85da3.  Removing it allows
better grammar refactoring.

The syntax of the CREATE FUNCTION LANGUAGE clause is not changed.

Discussion: https://www.postgresql.org/message-id/flat/163c00a5-f634-ca52-fc7c-0e53deda8735%402ndquadrant.com
2020-06-11 10:26:12 +02:00
Peter Eisentraut c4325cefba Fold AlterForeignTableStmt into AlterTableStmt
All other relation types are handled by AlterTableStmt, so it's
unnecessary to make a different statement for foreign tables.

Discussion: https://www.postgresql.org/message-id/flat/163c00a5-f634-ca52-fc7c-0e53deda8735%402ndquadrant.com
2020-06-11 08:21:24 +02:00
Peter Eisentraut c2bd1fec32 Remove redundant grammar symbols
access_method, database_name, and index_name are all just name, and
they are not used consistently for their alleged purpose, so remove
them.  They have been around since ancient times but have no current
reason for existing.  Removing them can simplify future grammar
refactoring.

Discussion: https://www.postgresql.org/message-id/flat/163c00a5-f634-ca52-fc7c-0e53deda8735%402ndquadrant.com
2020-06-10 22:58:46 +02:00
Peter Eisentraut c7eab0e97e Change default of password_encryption to scram-sha-256
Also, the legacy values on/true/yes/1 for password_encryption that
mapped to md5 are removed.  The only valid values are now
scram-sha-256 and md5.

Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Discussion: https://www.postgresql.org/message-id/flat/d5b0ad33-7d94-bdd1-caac-43a1c782cab2%402ndquadrant.com
2020-06-10 16:42:55 +02:00
Peter Eisentraut 5a4ada71a8 Update description of parameter password_encryption
The previous description string still described the pre-PostgreSQL
10 (pre eb61136dc7) behavior of
selecting between encrypted and unencrypted, but it is now choosing
between encryption algorithms.
2020-06-10 11:57:41 +02:00
Amit Kapila c5c000b103 Fix ReorderBuffer memory overflow check.
Commit cec2edfa78 introduced logical_decoding_work_mem to limit
ReorderBuffer memory usage. We spill the changes once the memory occupied
by changes exceeds logical_decoding_work_mem.  There was an assumption
in the code that by evicting the largest (sub)transaction we will come
under the memory limit as the selected transaction will be at least as
large as the most recent change (which caused us to go over the memory
limit).  However, that is not true because a user can reduce the
logical_decoding_work_mem to a smaller value before the most recent
change.

We fix it by allowing to evict the transactions until we reach under the
memory limit.

Reported-by: Fujii Masao
Author: Amit Kapila
Reviewed-by: Fujii Masao
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/2b7ba291-22e0-a187-d167-9e5309a3458d@oss.nttdata.com
2020-06-10 10:20:10 +05:30
Peter Eisentraut 350f47786c Spelling adjustments
similar to 0fd2a79a63
2020-06-09 10:41:41 +02:00
Peter Eisentraut b1d32d3e32 Unify drop-by-OID functions
There are a number of Remove${Something}ById() functions that are
essentially identical in structure and only different in which catalog
they are working on.  Refactor this to be one generic function.  The
information about which oid column, index, etc. to use was already
available in ObjectProperty for most catalogs, in a few cases it was
easily added.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/331d9661-1743-857f-1cbb-d5728bcd62cb%402ndquadrant.com
2020-06-09 09:39:46 +02:00
David Rowley b27c90bbe4 Fix invalid function references in a few comments
These appear to have been forgotten when the functions were renamed in
1fd687a03.

Backpatch-through: 13, where the functions were renamed
2020-06-09 18:43:15 +12:00
Jeff Davis 1b2c29469a Fix HashAgg regression from choosing too many initial buckets.
Diagnosis by Andres.

Reported-by: Pavel Stehule
Discussion: https://postgr.es/m/CAFj8pRDLVakD5Aagt3yZeEQeTeEWaS3YE5h8XC3Q3qJ6TYkc2Q%40mail.gmail.com
Backpatch-through: 13
2020-06-08 21:04:16 -07:00
Peter Eisentraut cbcc8726bb Update snowball
Update to snowball tag v2.0.0.  Major changes are new stemmers for
Basque, Catalan, and Hindi.

Discussion: https://www.postgresql.org/message-id/flat/a8eeabd6-2be1-43fe-401e-a97594c38478%402ndquadrant.com
2020-06-08 08:07:15 +02:00
Thomas Munro 57cb806308 Fix locking bugs that could corrupt pg_control.
The redo routines for XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} must acquire
ControlFileLock before modifying ControlFile->checkPointCopy, or the
checkpointer could write out a control file with a bad checksum.

Likewise, XLogReportParameters() must acquire ControlFileLock before
modifying ControlFile and calling UpdateControlFile().

Back-patch to all supported releases.

Author: Nathan Bossart <bossartn@amazon.com>
Author: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/70BF24D6-DC51-443F-B55A-95735803842A%40amazon.com
2020-06-08 13:57:24 +12:00
Michael Paquier 879ad9f90e Fix crash in WAL sender when starting physical replication
Since database connections can be used with WAL senders in 9.4, it is
possible to use physical replication.  This commit fixes a crash when
starting physical replication with a WAL sender using a database
connection, caused by the refactoring done in 850196b.

There have been discussions about forbidding the use of physical
replication in a database connection, but this is left for later,
taking care only of the crash new to 13.

While on it, add a test to check for a failure when attempting logical
replication if the WAL sender does not have a database connection.  This
part is extracted from a larger patch by Kyotaro Horiguchi.

Reported-by: Vladimir Sitnikov
Author: Michael Paquier, Kyotaro Horiguchi
Reviewed-by: Kyotaro Horiguchi, Álvaro Herrera
Discussion: https://postgr.es/m/CAB=Je-GOWMj1PTPkeUhjqQp-4W3=nW-pXe2Hjax6rJFffB5_Aw@mail.gmail.com
Backpatch-through: 13
2020-06-08 10:12:24 +09:00
Tom Lane b5d69b7c22 pgindent run prior to branching v13.
pgperltidy and reformat-dat-files too, though those didn't
find anything to change.
2020-06-07 16:57:08 -04:00
Jeff Davis 1fbb6c93df Fix platform-specific performance regression in logtape.c.
Commit 24d85952 made a change that indirectly caused a performance
regression by triggering a change in the way GCC optimizes memcpy() on
some platforms.

The behavior seemed to contradict a GCC document, so I filed a report:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95556

This patch implements a narrow workaround which eliminates the
regression I observed. The workaround is benign enough that it seems
unlikely to cause a different regression on another platform.

Discussion: https://postgr.es/m/99b2eab335c1592c925d8143979c8e9e81e1575f.camel@j-davis.com
2020-06-07 09:25:55 -07:00
Peter Eisentraut 0fd2a79a63 Spelling adjustments 2020-06-07 15:06:51 +02:00
Peter Eisentraut f4c88ce1a2 Formatting and punctuation improvements in postgresql.conf.sample 2020-06-07 14:35:12 +02:00
Tom Lane 0c882e52a8 Improve ineq_histogram_selectivity's behavior for non-default orderings.
ineq_histogram_selectivity() can be invoked in situations where the
ordering we care about is not that of the column's histogram.  We could
be considering some other collation, or even more drastically, the
query operator might not agree at all with what was used to construct
the histogram.  (We'll get here for anything using scalarineqsel-based
estimators, so that's quite likely to happen for extension operators.)

Up to now we just ignored this issue and assumed we were dealing with
an operator/collation whose sort order exactly matches the histogram,
possibly resulting in junk estimates if the binary search gets confused.
It's past time to improve that, since the use of nondefault collations
is increasing.  What we can do is verify that the given operator and
collation match what's recorded in pg_statistic, and use the existing
code only if so.  When they don't match, instead execute the operator
against each histogram entry, and take the fraction of successes as our
selectivity estimate.  This gives an estimate that is probably good to
about 1/histogram_size, with no assumptions about ordering.  (The quality
of the estimate is likely to degrade near the ends of the value range,
since the two orderings probably don't agree on what is an extremal value;
but this is surely going to be more reliable than what we did before.)

At some point we might further improve matters by storing more than one
histogram calculated according to different orderings.  But this code
would still be good fallback logic when no matches exist, so that is
not an argument for not doing this.

While here, also improve get_variable_range() to deal more honestly
with non-default collations.

This isn't back-patchable, because it requires adding another argument
to ineq_histogram_selectivity, and because it might have significant
impact on the estimation results for extension operators relying on
scalarineqsel --- mostly for the better, one hopes, but in any case
destabilizing plan choices in back branches is best avoided.

Per investigation of a report from James Lucas.

Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
2020-06-05 16:55:27 -04:00
Tom Lane 044c99bc56 Use query collation, not column's collation, while examining statistics.
Commit 5e0928005 changed the planner so that, instead of blindly using
DEFAULT_COLLATION_OID when invoking operators for selectivity estimation,
it would use the collation of the column whose statistics we're
considering.  This was recognized as still being not quite the right
thing, but it seemed like a good incremental improvement.  However,
shortly thereafter we introduced nondeterministic collations, and that
creates cases where operators can fail if they're passed the wrong
collation.  We don't want planning to fail in cases where the query itself
would work, so this means that we *must* use the query's collation when
invoking operators for estimation purposes.

The only real problem this creates is in ineq_histogram_selectivity, where
the binary search might produce a garbage answer if we perform comparisons
using a different collation than the column's histogram is ordered with.
However, when the query's collation is significantly different from the
column's default collation, the estimate we previously generated would be
pretty irrelevant anyway; so it's not clear that this will result in
noticeably worse estimates in practice.  (A follow-on patch will improve
this situation in HEAD, but it seems too invasive for back-patch.)

The patch requires changing the signatures of mcv_selectivity and allied
functions, which are exported and very possibly are used by extensions.
In HEAD, I just did that, but an API/ABI break of this sort isn't
acceptable in stable branches.  Therefore, in v12 the patch introduces
"mcv_selectivity_ext" and so on, with signatures matching HEAD, and makes
the old functions into wrappers that assume DEFAULT_COLLATION_OID should
be used.  That does not match the prior behavior, but it should avoid risk
of failure in most cases.  (In practice, I think most extension datatypes
aren't collation-aware, so the change probably doesn't matter to them.)

Per report from James Lucas.  Back-patch to v12 where the problem was
introduced.

Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
2020-06-05 16:18:50 -04:00
Michael Paquier 1127f0e392 Preserve pg_index.indisreplident across REINDEX CONCURRENTLY
If the flag value is lost, logical decoding would work the same way as
REPLICA IDENTITY NOTHING, meaning that no old tuple values would be
included in the changes anymore produced by logical decoding.

Author: Michael Paquier
Reviewed-by: Euler Taveira
Discussion: https://postgr.es/m/20200603065340.GK89559@paquier.xyz
Backpatch-through: 12
2020-06-05 10:26:02 +09:00
Tom Lane a9632830bb Reject "23:59:60.nnn" in datetime input.
It's intentional that we don't allow values greater than 24 hours,
while we do allow "24:00:00" as well as "23:59:60" as inputs.
However, the range check was miscoded in such a way that it would
accept "23:59:60.nnn" with a nonzero fraction.  For time or timetz,
the stored result would then be greater than "24:00:00" which would
fail dump/reload, not to mention possibly confusing other operations.

Fix by explicitly calculating the result and making sure it does not
exceed 24 hours.  (This calculation is redundant with what will happen
later in tm2time or tm2timetz.  Maybe someday somebody will find that
annoying enough to justify refactoring to avoid the duplication; but
that seems too invasive for a back-patched bug fix, and the cost is
probably unmeasurable anyway.)

Note that this change also rejects such input as the time portion
of a timestamp(tz) value.

Back-patch to v10.  The bug is far older, but to change this pre-v10
we'd need to ensure that the logic behaves sanely with float timestamps,
which is possibly nontrivial due to roundoff considerations.
Doesn't really seem worth troubling with.

Per report from Christoph Berg.

Discussion: https://postgr.es/m/20200520125807.GB296739@msg.df7cb.de
2020-06-04 16:42:23 -04:00
Michael Paquier 3fa44a3004 Fix comment in be-secure-openssl.c
Since 573bd08, hardcoded DH parameters have been moved to a different
file, making the comment on top of load_dh_buffer() incorrect.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/D9492CCB-9A91-4181-A847-1779630BE2A7@yesql.se
2020-06-04 13:02:59 +09:00
Michael Paquier c1669fd581 Fix instance of elog() called while holding a spinlock
This broke the project rule to not call any complex code while a
spinlock is held.  Issue introduced by b89e151.

Discussion: https://postgr.es/m/20200602.161518.1399689010416646074.horikyota.ntt@gmail.com
Backpatch-through: 9.5
2020-06-04 10:17:49 +09:00
Tom Lane f88bd3139f Don't call palloc() while holding a spinlock, either.
Fix some more violations of the "only straight-line code inside a
spinlock" rule.  These are hazardous not only because they risk
holding the lock for an excessively long time, but because it's
possible for palloc to throw elog(ERROR), leaving a stuck spinlock
behind.

copy_replication_slot() had two separate places that did pallocs
while holding a spinlock.  We can make the code simpler and safer
by copying the whole ReplicationSlot struct into a local variable
while holding the spinlock, and then referencing that copy.
(While that's arguably more cycles than we really need to spend
holding the lock, the struct isn't all that big, and this way seems
far more maintainable than copying fields piecemeal.  Anyway this
is surely much cheaper than a palloc.)  That bug goes back to v12.

InvalidateObsoleteReplicationSlots() not only did a palloc while
holding a spinlock, but for extra sloppiness then leaked the memory
--- probably for the lifetime of the checkpointer process, though
I didn't try to verify that.  Fortunately that silliness is new
in HEAD.

pg_get_replication_slots() had a cosmetic violation of the rule,
in that it only assumed it's safe to call namecpy() while holding
a spinlock.  Still, that's a hazard waiting to bite somebody, and
there were some other cosmetic coding-rule violations in the same
function, so clean it up.  I back-patched this as far as v10; the
code exists before that but it looks different, and this didn't
seem important enough to adapt the patch further back.

Discussion: https://postgr.es/m/20200602.161518.1399689010416646074.horikyota.ntt@gmail.com
2020-06-03 12:36:23 -04:00
Fujii Masao caa3c4242c Don't call elog() while holding spinlock.
Previously UpdateSpillStats() called elog(DEBUG2) while holding
the spinlock even though the local variables that the elog() accesses
don't need to be protected by the lock. Since spinlocks are intended
for very short-term locks, they should not be used when calling
elog(DEBUG2). So this commit moves that elog() out of spinlock period.

Author: Kyotaro Horiguchi
Reviewed-by: Amit Kapila and Fujii Masao
Discussion: https://postgr.es/m/20200602.161518.1399689010416646074.horikyota.ntt@gmail.com
2020-06-02 19:21:04 +09:00
Peter Eisentraut 42181b1015 Use correct and consistent unit abbreviation 2020-06-01 21:18:36 +02:00
Michael Paquier ce1c5b9ae8 Fix use-after-release mistake in currtid() and currtid2() for views
This issue has been present since the introduction of this code as of
a3519a2 from 2002, and has been found by buildfarm member prion that
uses RELCACHE_FORCE_RELEASE via the tests introduced recently in
e786be5.

Discussion: https://postgr.es/m/20200601022055.GB4121@paquier.xyz
Backpatch-through: 9.5
2020-06-01 14:41:18 +09:00
Michael Paquier e786be5fcb Fix crashes with currtid() and currtid2()
A relation that has no storage initializes rd_tableam to NULL, which
caused those two functions to crash because of a pointer dereference.
Note that in 11 and older versions, this has always failed with a
confusing error "could not open file".

These two functions are used by the Postgres ODBC driver, which requires
them only when connecting to a backend strictly older than 8.1.  When
connected to 8.2 or a newer version, the driver uses a RETURNING clause
instead whose support has been added in 8.2, so it should be possible to
just remove both functions in the future.  This is left as an issue to
address later.

While on it, add more regression tests for those functions as we never
really had coverage for them, and for aggregates of TIDs.

Reported-by: Jaime Casanova, via sqlsmith
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/CAJGNTeO93u-5APMga6WH41eTZ3Uee9f3s8dCpA-GSSqNs1b=Ug@mail.gmail.com
Backpatch-through: 12
2020-06-01 10:32:06 +09:00
Tomas Vondra 4cad2534da Use CP_SMALL_TLIST for hash aggregate
Commit 1f39bce021 added disk-based hash aggregation, which may spill
incoming tuples to disk. It however did not request projection to make
the tuples as narrow as possible, which may mean having to spill much
more data than necessary (increasing I/O, pushing other stuff from page
cache, etc.).

This adds CP_SMALL_TLIST in places that may use hash aggregation - we do
that only for AGG_HASHED. It's unnecessary for AGG_SORTED, because that
either uses explicit Sort (which already does projection) or pre-sorted
input (which does not need spilling to disk).

Author: Tomas Vondra
Reviewed-by: Jeff Davis
Discussion: https://postgr.es/m/20200519151202.u2p2gpiawoaznsv2%40development
2020-05-31 14:43:13 +02:00
Andres Freund 6a4a335b84 llvmjit: Fix building against LLVM 11 by removing unnecessary include.
LLVM has removed this header, in the branch that will become llvm
11. But as it turns out we didn't actually need it, so just remove it.

Author: Jesse Zhang <sbjesse@gmail.com>
Discussion: https://postgr.es/m/CAGf+fX7bvtP0YXMu7pOsu_NwhxW6dArTkxb=jt7M2-UJkyJ_3g@mail.gmail.com
Backpatch: 11, where JIT support using llvm was introduced.
2020-05-28 15:24:28 -07:00
Joe Conway 887cdff4dc Add CHECK_FOR_INTERRUPTS() to the repeat() function
The repeat() function loops for potentially a long time without
ever checking for interrupts. This prevents, for example, a query
cancel from interrupting until the work is all done. Fix by
inserting a CHECK_FOR_INTERRUPTS() into the loop.

Backpatch to all supported versions.

Discussion: https://www.postgresql.org/message-id/flat/8692553c-7fe8-17d9-cbc1-7cddb758f4c6%40joeconway.com
2020-05-28 13:19:00 -04:00
Heikki Linnakangas 5b1c61e8b8 Add missing error code to "cannot attach index ..." error.
ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE was used in an ereport with the
same message but different errdetail a few lines earlier, so use that
here as well.

Backpatch-through: 11
2020-05-28 12:37:00 +03:00
Michael Paquier 55ca50deb8 Fix some mentions to memory units in postgresql.conf.sample
The default unit for max_slot_wal_keep_size is megabytes.  While on it,
also change temp_file_limit to use a more consistent wording.

Reported-by: Jeff Janes, Fujii Masao
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAMkU=1wWZhhjpwRFKJ9waQGxxROeC0P6UqPvb90fAaGz7dhoHA@mail.gmail.com
2020-05-28 15:39:05 +09:00
Jeff Davis 896ddf9b3c Avoid fragmentation of logical tapes when writing concurrently.
Disk-based HashAgg relies on writing to multiple tapes
concurrently. Avoid fragmentation of the tapes' blocks by
preallocating many blocks for a tape at once. No file operations are
performed during preallocation; only the block numbers are reserved.

Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20200519151202.u2p2gpiawoaznsv2%40development
2020-05-26 16:49:43 -07:00
Peter Eisentraut add4211600 Add lcov exclusion markers to jsonpath scanner
This was done for all scanners in
4211673622 but not added to the new one.
2020-05-26 14:09:36 +02:00
Bruce Momjian ac5852fb30 gss: add missing references to hostgssenc and hostnogssenc
These were missed when these were added to pg_hba.conf in PG 12;
updates docs and pg_hba.conf.sample.

Reported-by: Arthur Nascimento

Bug: 16380

Discussion: https://postgr.es/m/20200421182736.GG19613@momjian.us

Backpatch-through: 12
2020-05-25 20:19:28 -04:00
Noah Misch 587322de36 Reconcile nodes/*funcs.c.
The stmt_len changes do not affect behavior.  LimitPath has no other
support functions, so that part changes only debugging output.
2020-05-25 16:23:48 -07:00
Michael Paquier a995b371ae Add missing invocations to object access hooks
The following commands have been missing calls to object access hooks
InvokeObjectPost{Create|Alter}Hook normally applied to all commands:
- ALTER RULE RENAME TO
- ALTER USER MAPPING
- CREATE ACCESS METHOD
- CREATE STATISTICS

Thanks also to Robert Haas for the discussion.

Author: Mark Dilger
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/435CD295-F409-44E0-91EC-DF32C7AFCD76@enterprisedb.com
2020-05-23 14:03:04 +09:00
Alvaro Herrera c99cec96b8
Fix two typos in a comment
They were introduced in 898e5e3290a7; backpatch to 12.
2020-05-22 17:39:16 -04:00
Peter Eisentraut 574925bfd0 Remove unnecessary cast
Probably copied from nearby calls where it is necessary.  But this one
also casts away constness, so it was doubly annoying.
2020-05-22 10:36:49 +02:00
Etsuro Fujita bb2ae6fa47 Adjust indentation in src/backend/optimizer/README.
The previous indentation of optimizer functions was unclear; adjust the
indentation dashes so that a deeper level of indentation indicates that
the outer optimizer function calls the inner one.

Author: Richard Guo, with additional change by me
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAMbWs4-U-ogzpchGsP2BBMufCss1hktm%2B%2BeTJK_dUC196pw0cQ%40mail.gmail.com
2020-05-22 15:45:00 +09:00
Noah Misch 3350fb5d1f Clear some style deviations. 2020-05-21 08:31:16 -07:00
Tom Lane c7d65a252c part_strategy does not need its very own keyword classification.
This should be plain old ColId.  Making it so makes the grammar less
complicated, and makes the compiled tables a kilobyte or so smaller
(likely because they don't have to deal with a keyword classification
that's not used anyplace else).
2020-05-19 20:09:59 -04:00
Peter Geoghegan 67b0b2dbf9 Reconsider nbtree page deletion assertion.
Commit 624686abcf added an assertion that verified that _bt_search
successfully relocated the leaf page undergoing deletion.  Page deletion
cannot deal with the case where the descent stack is to the right of the
page, so this seemed critical (deletion can only handle the case where
the descent stack is to the left of the leaf/target page).  However, the
assertion went a bit too far.

Since only a buffer pin is held on the leaf page throughout the call to
_bt_search, nothing guarantees that it can't have split during this
small window.  And if does actually split, _bt_search may end up
"relocating" a page to the right of the original target leaf page.  This
scenario seems extremely unlikely, but it must still be considered.
Remove the assertion, and document how we cope in this scenario.
2020-05-19 15:04:34 -07:00
Alvaro Herrera c301c2e739
WITH TIES: number of rows is optional and defaults to one
FETCH FIRST .. ONLY implements this correctly, but we missed to include
it for FETCH FIRST .. WITH TIES in commit 357889eb17.

Author: Vik Fearing
Discussion: https://postgr.es/m/6aa690ef-551d-e24f-2690-c38c2442947c@postgresfriends.org
2020-05-18 19:28:46 -04:00
Peter Eisentraut ac449d8801 Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 031ca65d7825c3e539a3e62ea9d6630af12e6b6b
2020-05-18 12:49:30 +02:00
Magnus Hagander a01debe3db Fix typos in README
Author: Daniel Gustafsson
2020-05-18 11:55:35 +02:00
Amit Kapila 7e041b0c1d Fix comment in slot.c.
Reported-by: Sawada Masahiko
Author: Sawada Masahiko
Reviewed-by: Amit Kapila
Backpatch-through: 9.5
Discussion: https://postgr.es/m/CA+fd4k4Ws7M7YQ8PqSym5WB1y75dZeBTd1sZJUQdfe0KJQ-iSA@mail.gmail.com
2020-05-18 07:53:26 +05:30
Tom Lane 3048898e73 Mop-up for wait event naming issues.
Synchronize the event names for parallel hash join waits with other
event names, by getting rid of the slashes and dropping "-ing"
suffixes.  Rename ClogGroupUpdate to XactGroupUpdate, to match the
new SLRU name.  Move the ProcSignalBarrier event to the IPC category;
it doesn't belong under IO.

Also a bit more wordsmithing in the wait event documentation tables.

Discussion: https://postgr.es/m/4505.1589640417@sss.pgh.pa.us
2020-05-16 21:00:11 -04:00
Michael Paquier 2c8dd05d6c Make pg_stat_wal_receiver consistent with the WAL receiver's shmem info
d140f2f3 has renamed receivedUpto to flushedUpto, and has added
writtenUpto to the WAL receiver's shared memory information, but
pg_stat_wal_receiver was not consistent with that.  This commit renames
received_lsn to flushed_lsn, and adds a new column called written_lsn.

Bump catalog version.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20200515090817.GA212736@paquier.xyz
2020-05-17 09:22:07 +09:00
Tom Lane fa27dd40d5 Run pgindent with new pg_bsd_indent version 2.1.1.
Thomas Munro fixed a longstanding annoyance in pg_bsd_indent, that
it would misformat lines containing IsA() macros on the assumption
that the IsA() call should be treated like a cast.  This improves
some other cases involving field/variable names that match typedefs,
too.  The only places that get worse are a couple of uses of the
OpenSSL macro STACK_OF(); we'll gladly take that trade-off.

Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
2020-05-16 11:54:51 -04:00
Tom Lane e02ad575d8 Final pgindent run with pg_bsd_indent version 2.1.
This is just to provide a clean basis for comparison of the results
of the new version.  I did fix a typo that crept into 242dfcbaf.

Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
2020-05-16 11:49:14 -04:00
Michael Paquier 7ccb2f54d9 Fix assertion with relation using REPLICA IDENTITY FULL in subscriber
In a logical replication subscriber, a table using REPLICA IDENTITY FULL
which has a primary key would try to use the primary key's index
available to scan for a tuple, but an assertion only assumed as correct
the case of an index associated to REPLICA IDENTITY USING INDEX.  This
commit corrects the assertion so as the use of a primary key index is a
valid case.

Reported-by: Dilip Kumar
Analyzed-by: Dilip Kumar
Author: Euler Taveira
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/CAFiTN-u64S5bUiPL1q5kwpHNd0hRnf1OE-bzxNiOs5zo84i51w@mail.gmail.com
Backpatch-through: 10
2020-05-16 18:15:18 +09:00
Tom Lane 474e7da648 Change locktype "speculative token" to "spectoken".
It's just weird that this name wasn't chosen to look like an
identifier.  The suspicion that it wasn't thought about too
hard is reinforced by the fact that it wasn't documented in
the pg_locks view (until I did so, a day or two back).

Update, and add a comment reminding future adjusters of this
array to fix the docs too.

Do some desultory wordsmithing on various entries in the wait
events tables.

Discussion: https://postgr.es/m/24595.1589326879@sss.pgh.pa.us
2020-05-15 21:47:34 -04:00
Alvaro Herrera 1d3743023e
Fix walsender error cleanup code
In commit 850196b610 I (Álvaro) failed to handle the case of walsender
shutting down on an error before setting up its 'xlogreader' pointer;
the error handling code dereferences the pointer, causing a crash.
Fix by testing the pointer before trying to dereference it.

Kyotaro authored the code fix; I adopted Nathan's test case to be used
by the TAP tests and added the necessary PostgresNode change.

Reported-by: Nathan Bossart <bossartn@amazon.com>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/C04FC24E-903D-4423-B312-6910E4D846E5@amazon.com
2020-05-15 20:00:52 -04:00
Tom Lane 14a9101091 Drop the redundant "Lock" suffix from LWLock wait event names.
This was mostly confusing, especially since some wait events in
this class had the suffix and some did not.

While at it, stop exposing MainLWLockNames[] as a globally visible
name; any code using that directly is almost certainly wrong, as
its name has been misleading for some time.
(GetLWLockIdentifier() is what to use instead.)

Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 19:55:56 -04:00
Tom Lane 8048404939 Fix bogus initialization of replication origin shared memory state.
The previous coding zeroed out offsetof(ReplicationStateCtl, states)
more bytes than it was entitled to, as a consequence of starting the
zeroing from the wrong pointer (or, if you prefer, using the wrong
calculation of how much to zero).

It's unsurprising that this has not caused any reported problems,
since it can be expected that the newly-allocated block is at the end
of what we've used in shared memory, and we always make the shmem
block substantially bigger than minimally necessary.  Nonetheless,
this is wrong and it could bite us someday; plus it's a dangerous
model for somebody to copy.

This dates back to the introduction of this code (commit 5aa235042),
so back-patch to all supported branches.
2020-05-15 19:05:39 -04:00
Tom Lane 36ac359d36 Rename assorted LWLock tranches.
Choose names that fit into the conventions for wait event names
(particularly, that multi-word names are in the style MultiWordName)
and hopefully convey more information to non-hacker users than the
previous names did.

Also rename SerializablePredicateLockListLock to
SerializablePredicateListLock; the old name was long enough to cause
table formatting problems, plus the double occurrence of "Lock" seems
confusing/error-prone.

Also change a couple of particularly opaque LWLock field names.

Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 18:11:07 -04:00
Alvaro Herrera a0ab4f4909
Add comments linking pg_strftime to timestamptz_to_str 2020-05-15 18:05:34 -04:00
Alvaro Herrera 242dfcbafa
Avoid killing btree items that are already dead
_bt_killitems marks btree items dead when a scan leaves the page where
they live, but it does so with only share lock (to improve concurrency).
This was historicall okay, since killing a dead item has no
consequences.  However, with the advent of data checksums and
wal_log_hints, this action incurs a WAL full-page-image record of the
page.  Multiple concurrent processes would write the same page several
times, leading to WAL bloat.  The probability of this happening can be
reduced by only killing items if they're not already dead, so change the
code to do that.

The problem could eliminated completely by having _bt_killitems upgrade
to exclusive lock upon seeing a killable item, but that would reduce
concurrency so it's considered a cure worse than the disease.

Backpatch all the way back to 9.5, since wal_log_hints was introduced in
9.4.

Author: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: https://postgr.es/m/CA+fd4k6PeRj2CkzapWNrERkja5G0-6D-YQiKfbukJV+qZGFZ_Q@mail.gmail.com
2020-05-15 16:50:34 -04:00
Tom Lane 5da14938f7 Rename SLRU structures and associated LWLocks.
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care.  Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing).  Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.

For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects.  This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.

(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)

Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.

Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 14:28:25 -04:00
Amit Kapila a9cf48a4cf Make COPY TO keep locks until the transaction end.
COPY TO released the ACCESS SHARE lock immediately when it was done rather
than holding on to it until the end of the transaction.

This breaks the case where a REPEATABLE READ transaction could see an
empty table if it repeats a COPY statement and somebody truncated the
table in the meantime.

Before 4dded12faa the lock was also released after COPY FROM, but the
commit failed to notice the irregularity in COPY TO.

This is old behavior but doesn't seem important enough to backpatch.

Author: Laurenz Albe, based on suggestion by Robert Haas and Tom Lane
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/7bcfc39d4176faf85ab317d0c26786953646a411.camel@cybertec.at
2020-05-15 08:10:00 +05:30
Michael Paquier ff87fabef2 Remove duplicated comment block in event_trigger.c
The reasons why event triggers are disabled in standalone mode are
documented in the code path of ddl_command_start, and other places
checking if standalone mode is enabled or not mention to refer to the
comment for ddl_command_start, except for table_rewrite that duplicated
the same explanation.

Reported-by: David G. Johnston
Discussion: https://postgr.es/m/CAKFQuwYqHtXpvr2mBJRwH9f+Y5y1GXw3rhbaAu0Dk2MoNevsmA@mail.gmail.com
2020-05-15 08:19:30 +09:00
Tom Lane 5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Tom Lane 29c3e2dd5a Collect built-in LWLock tranche names statically, not dynamically.
There is little point in using the LWLockRegisterTranche mechanism for
built-in tranche names.  It wastes cycles, it creates opportunities for
bugs (since failing to register a tranche name is a very hard-to-detect
problem), and the lack of any centralized list of names encourages
sloppy nonconformity in name choices.  Moreover, since we have a
centralized list of the tranches anyway in enum BuiltinTrancheIds, we're
certainly not buying any flexibility in return for these disadvantages.

Hence, nuke all the backend-internal LWLockRegisterTranche calls,
and instead provide a const array of the builtin tranche names.

(I have in mind to change a bunch of these names shortly, but this
patch is just about getting them into one place.)

Discussion: https://postgr.es/m/9056.1589419765@sss.pgh.pa.us
2020-05-14 11:10:31 -04:00
Heikki Linnakangas e8abf585ab Move check for fsync=off so that pendingOps still gets cleared.
Commit 3eb77eba5a moved the loop and refactored it, and inadvertently
changed the effect of fsync=off so that it also skipped removing entries
from the pendingOps table. That was not intentional, and leads to an
assertion failure if you turn fsync on while the server is running and
reload the config.

Backpatch-through: 12-
Reviewed-By: Thomas Munro
Discussion: https://www.postgresql.org/message-id/3cbc7f4b-a5fa-56e9-9591-c886deb07513%40iki.fi
2020-05-14 08:39:26 +03:00
Amit Kapila a169155453 Fix the MSVC build for versions 2015 and later.
Visual Studio 2015 and later versions should still be able to do the same
as Visual Studio 2012, but the declaration of locale_name is missing in
_locale_t, causing the code compilation to fail, hence this falls back
instead on to enumerating all system locales by using EnumSystemLocalesEx
to find the required locale name.  If the input argument is in Unix-style
then we can get ISO Locale name directly by using GetLocaleInfoEx() with
LCType as LOCALE_SNAME.

In passing, change the documentation references of the now obsolete links.

Note that this problem occurs only with NLS enabled builds.

Author: Juan José Santamaría Flecha, Davinder Singh and Amit Kapila
Reviewed-by: Ranier Vilela and Amit Kapila
Backpatch-through: 9.5
Discussion: https://postgr.es/m/CAHzhFSFoJEWezR96um4-rg5W6m2Rj9Ud2CNZvV4NWc9tXV7aXQ@mail.gmail.com
2020-05-14 09:24:33 +05:30
Tom Lane 7fd89f4d7a Fix async.c to not register any SLRU stats counts in the postmaster.
Previously, AsyncShmemInit forcibly initialized the first page of the
async SLRU, to save dealing with that case in asyncQueueAddEntries.
But this is a poor tradeoff, since many installations do not ever use
NOTIFY; for them, expending those cycles in AsyncShmemInit is a
complete waste.  Besides, this only saves a couple of instructions
in asyncQueueAddEntries, which hardly seems likely to be measurable.

The real reason to change this now, though, is that now that we track
SLRU access stats, the existing code is causing the postmaster to
accumulate some access counts, which then get inherited into child
processes by fork(), messing up the statistics.  Delaying the
initialization into the first child that does a NOTIFY fixes that.

Hence, we can revert f3d23d83e, which was an incorrect attempt at
fixing that issue.  Also, add an Assert to pgstat.c that should
catch any future errors of the same sort.

Discussion: https://postgr.es/m/8367.1589391884@sss.pgh.pa.us
2020-05-13 22:48:26 -04:00
Alvaro Herrera 17cc133f01
Dial back -Wimplicit-fallthrough to level 3
The additional pain from level 4 is excessive for the gain.

Also revert all the source annotation changes to their original
wordings, to avoid back-patching pain.

Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us
2020-05-13 15:31:14 -04:00
Tom Lane 81ca868630 Improve management of SLRU statistics collection.
Instead of re-identifying which statistics bucket to use for a given
SLRU on every counter increment, do it once during shmem initialization.
This saves a fair number of cycles, and there's no real cost because
we could not have a bucket assignment that varies over time or across
backends anyway.

Also, get rid of the ill-considered decision to let pgstat.c pry
directly into SLRU's shared state; it's cleaner just to have slru.c
pass the stats bucket number.

In consequence of these changes, there's no longer any need to store
an SLRU's LWLock tranche info in shared memory, so get rid of that,
making this a net reduction in shmem consumption.  (That partly
reverts fe702a7b3.)

This is basically code review for 28cac71bd, so I also cleaned up
some comments, removed a dangling extern declaration, fixed some
things that should be static and/or const, etc.

Discussion: https://postgr.es/m/3618.1589313035@sss.pgh.pa.us
2020-05-13 13:08:23 -04:00
Alvaro Herrera 850196b610
Adjust walsender usage of xlogreader, simplify APIs
* Have both physical and logical walsender share a 'xlogreader' state
  struct for tracking state.  This replaces the existing globals sendSeg
  and sendCxt.

* Change WALRead not to receive XLogReaderState->seg and ->segcxt as
  separate arguments anymore; just use the ones from 'state'.  This is
  made possible by the above change.

* have the XLogReader segment_open contract require the callbacks to
  install the file descriptor in the state struct themselves instead of
  returning it.  xlogreader was already ignoring any possible failed
  return from the callbacks, relying solely on them never returning.

  (This point is not altogether excellent, as it means the callbacks
  have to know more of XLogReaderState; but to really improve on that
  we would have to pass back error info from the callbacks to
  xlogreader.  And the complexity would not be saved but instead just
  transferred to the callers of WALRead, which would have to learn how
  to throw errors from the open_segment callback in addition of, as
  currently, from pg_pread.)

* segment_open no longer receives the 'segcxt' as a separate argument,
  since it's part of the XLogReaderState argument.

Per comments from Kyotaro Horiguchi.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200511203336.GA9913@alvherre.pgsql
2020-05-13 12:17:08 -04:00
Fujii Masao 043e3e0401 Use proper GetDatum function in pg_stat_get_slru().
This commit changes pg_stat_get_slru() so that it uses
TimestampTzGetDatum() for stats_reset field because that field
stores the timestamp with time zone value. Previously
Int64GetDatum() was used.

Author: Fujii Masao
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/b8784fe6-1401-ab35-aa14-d57b5bb8e312@oss.nttdata.com
2020-05-13 22:20:37 +09:00
Fujii Masao f3d23d83ef Initialize SLRU stats entries to zero.
Previously since SLRUStats was not initialized, SLRU stats counters
could begin with non-zero value. Which could lead to incorrect results
in pg_stat_slru view.

Author: Fujii Masao
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/976bbb73-a112-de3c-c488-b34b64609793@oss.nttdata.com
2020-05-13 22:19:25 +09:00
Alvaro Herrera 3e9744465d
Add -Wimplicit-fallthrough to CFLAGS and CXXFLAGS
Use it at level 4, a bit more restrictive than the default level, and
tweak our commanding comments to FALLTHROUGH.

(However, leave zic.c alone, since it's external code; to avoid the
warnings that would appear there, change CFLAGS for that file in the
Makefile.)

Author: Julien Rouhaud <rjuju123@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20200412081825.qyo5vwwco3fv4gdo@nol
Discussion: https://postgr.es/m/flat/E1fDenm-0000C8-IJ@gemulon.postgresql.org
2020-05-12 16:07:30 -04:00
Tomas Vondra 6a918c3ac8 Rework EXPLAIN format for incremental sort
The explain format used by incremental sort was somewhat inconsistent
with other nodes, making it harder to parse and understand. This commit
addresses that by

 - adding an extra space to better separate groups of values

 - using colons instead of equal signs to separate key/value

 - properly capitalizing first letter of a key

 - using separate lines for full and pre-sorted groups

These changes were proposed by Justin Pryzby and mostly copy the final
explain format used to report WAL usage.

Author: Justin Pryzby
Reviewed-by: James Coleman
Discussion: https://postgr.es/m/20200419023625.GP26953@telsasoft.com
2020-05-12 20:04:39 +02:00
Tomas Vondra 1a40d37a9f Fix typos and improve incremental sort comments
Author: Justin Pryzby, James Coleman
Discussion: https://postgr.es/m/20200419023625.GP26953@telsasoft.com
2020-05-12 19:37:13 +02:00
Etsuro Fujita 2793bbe75e Remove unnecessary #include.
My oversight in commit c8434d64c.
2020-05-12 19:55:55 +09:00
Michael Paquier 078c9cd258 Fix comment in xlogutils.c
The existing callers of XLogReadDetermineTimeline() performing recovery
need to check a replay LSN position when determining on which timeline
to read a WAL page.  A portion of the comment describing this function
said exactly that, while referring to a routine for fetching a write
LSN, something not available in recovery.

Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20200511.101619.2043820539323292957.horikyota.ntt@gmail.com
2020-05-12 14:43:57 +09:00
Peter Geoghegan 624686abcf Adjust "root of to-be-deleted subtree" function.
Restructure the function that locates the root of the to-be-deleted
subtree during nbtree page deletion.  Handle the conditions that make
page deletion unsafe in a slightly more uniform way, and acknowledge the
fact that the behavior with incomplete splits on internal pages is
different (as pointed out in the nbtree README as of commit 35bc0ec7).
Also invent new terminology that avoids ambiguity around which pages are
about to be deleted.  Consistently use the term "to-be-deleted subtree",
not the ambiguous term "branch".

We were calling the subtree parent page the "top parent page", but that
was quite misleading.  The top parent page usually refers to a page
unlinked from its siblings and marked deleted (during the second stage
of page deletion).  There was one kind of top parent page that we merely
removed a downlink from, and another kind of top parent page that we
actually marked deleted.  Eliminate the ambiguity by inventing a new
term ("subtree parent page") that refers to the former kind of page
only.
2020-05-11 11:01:07 -07:00
Alvaro Herrera a8be5364ac
Fix obsolete references to "XLogRead"
The one in xlogreader.h was pointed out by Antonin Houska; I (Álvaro) noticed the
others by grepping.

Author: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/28250.1589186654@antos
2020-05-11 12:46:41 -04:00
Peter Eisentraut 7a9c9ce641 Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 80d8f54b3c5533ec036404bd3c3b24ff4825d037
2020-05-11 13:14:32 +02:00
Michael Paquier e111c9f90a Remove smgrdounlink() in smgr.c from the code tree
The last caller of this routine was removed in b416691, and as a wise
man said one day, dead code tends to silently break.

Per discussion between Fujii Masao, Peter Geoghegan, Vignesh C and me.

Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wz=sg5H8-vG4d5UmAofdcRMpeTDt2K-NUWp4GSfhenRGAQ@mail.gmail.com
2020-05-10 10:58:54 +09:00
Tomas Vondra ebeb3dea77 Simplify show_incremental_sort_info a bit
Incremental sort always processes at least one full group group before
switching to prefix groups, so it's enough to check just the number of
full groups. There was no risk of division by zero due to the extra
condition, but it made the code harder to understand.

Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAp+7qoS92-4V1vLChpdY3vEkLCbf+gye6P-4cirE-0z0A@mail.gmail.com
2020-05-09 19:41:42 +02:00
Tomas Vondra 9155b4be9a Do no reset bounded before incremental sort rescan
ExecReScanIncrementalSort was resetting bounded=false, which means the
optimization would be disabled on all rescans. This happens because
ExecSetTupleBound is called before the rescan, not after it.

Author: James Coleman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20200414065336.GI1492@paquier.xyz
2020-05-09 19:41:36 +02:00
Tomas Vondra c442722648 Fix handling of REWIND/MARK/BACKWARD in incremental sort
The executor flags were not handled entirely correctly, although the
bugs were mostly harmless and it was mostly comment inaccuracy. We don't
need to strip any of the flags for child nodes.

Incremental sort does not support backward scans of mark/restore, so
MARK/BACKWARDS flags should not be possible. So we simply ensure this
using an assert, and we don't bother removing them when initializing
the child node.

With REWIND it's a bit less clear - incremental sort does not support
REWIND, but there is no way to signal this - it's legal to just ignore
the flag. We however continue passing the flag to child nodes, because
they might be useful to leverage that.

Reported-by: Michael Paquier
Author: James Coleman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20200414065336.GI1492@paquier.xyz
2020-05-09 19:41:18 +02:00
Alvaro Herrera b060dbe000
Rework XLogReader callback system
Code review for 0dc8ead463, prompted by a bug closed by 91c40548d5.

XLogReader's system for opening and closing segments had gotten too
complicated, with callbacks being passed at both the XLogReaderAllocate
level (read_page) as well as at the WALRead level (segment_open).  This
was confusing and hard to follow, so restructure things so that these
callbacks are passed together at XLogReaderAllocate time, and add
another callback to the set (segment_close) to make it a coherent whole.
Also, ensure XLogReaderState is an argument to all the callbacks, so
that they can grab at the ->private data if necessary.

Document the whole arrangement more clearly.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20200422175754.GA19858@alvherre.pgsql
2020-05-08 15:40:11 -04:00
Peter Eisentraut 086ffddf36 Fix several DDL issues of generated columns versus inheritance
Several combinations of generated columns and inheritance in CREATE
TABLE were not handled correctly.  Specifically:

- Disallow a child column specifying a generation expression if the
  parent column is a generated column.  The child column definition
  must be unadorned and the parent column's generation expression will
  be copied.

- Prohibit a child column of a generated parent column specifying
  default values or identity.

- Allow a child column of a not-generated parent column specifying
  itself as a generated column.  This previously did not work, but it
  was possible to arrive at the state via other means (involving ALTER
  TABLE), so it seems sensible to support it.

Add tests for each case.  Also add documentation about the rules
involving generated columns and inheritance.

Discussion:
    https://www.postgresql.org/message-id/flat/15830.1575468847%40sss.pgh.pa.us
    https://www.postgresql.org/message-id/flat/2678bad1-048f-519a-ef24-b12962f41807%40enterprisedb.com
    https://www.postgresql.org/message-id/flat/CAJvUf_u4h0DxkCMCeEKAWCuzGUTnDP-G5iVmSwxLQSXn0_FWNQ%40mail.gmail.com
2020-05-08 11:31:57 +02:00
Peter Eisentraut 501e41dd3c Propagate ALTER TABLE ... SET STORAGE to indexes
When creating a new index, the attstorage setting of the table column
is copied to regular (non-expression) index columns.  But a later
ALTER TABLE ... SET STORAGE is not propagated to indexes, thus
creating an inconsistent and undumpable state.

Discussion: https://www.postgresql.org/message-id/flat/9765d72b-37c0-06f5-e349-2a580aafd989%402ndquadrant.com
2020-05-08 08:39:17 +02:00
Fujii Masao f2ff203596 Report missing wait event for timeline history file.
TimelineHistoryRead and TimelineHistoryWrite wait events are reported
during waiting for a read and write of a timeline history file, respectively.
However, previously, TimelineHistoryRead wait event was not reported
while readTimeLineHistory() was reading a timeline history file. Also
TimelineHistoryWrite was not reported while writeTimeLineHistory() was
writing one line with the details of the timeline split, at the end.
This commit fixes these issues.

Back-patch to v10 where wait events for a timeline history file was added.

Author: Masahiro Ikeda
Reviewed-by: Michael Paquier, Fujii Masao
Discussion: https://postgr.es/m/d11b0c910b63684424e06772eb844ab5@oss.nttdata.com
2020-05-08 10:36:40 +09:00
Peter Geoghegan cd8c73a38a Refactor nbtree deletion INCOMPLETE_SPLIT check.
Factor out code common to _bt_lock_branch_parent() and _bt_pagedel()
into a new utility function.  This new function is used to check that
the left sibling of a deletion target page does not have the
INCOMPLETE_SPLIT page flag set.  If it is set then deletion is unsafe;
there won't be a usable pivot tuple (with a downlink) in the parent page
that points to the deletion target page.  The page deletion algorithm is
not prepared to deal with that.  Also restructure an existing, related
utility function that checks if the right sibling of the target page has
the ISHALFDEAD page flag set.

This organization highlights the symmetry between the two cases.  The
goal is to make the design of page deletion clearer.  Both functions
involve a sibling page with a flag that indicates that there was an
interrupted operation (a page split or a page deletion) that resulted in
a page pointed to by sibling pages, but not pointed to in the parent.
And, both functions indicate if page deletion is unsafe due to the
absence of a particular downlink in the parent page.
2020-05-07 16:08:54 -07:00
Tom Lane db89f0e3a4 Fix YA text phrase search bug.
checkcondition_str() failed to report multiple matches for a prefix
pattern correctly: it would dutifully merge the match positions, but
then after exiting that loop, if the last prefix-matching word had
had no suitable positions, it would report there were no matches.
The upshot would be failing to recognize a match that the query
should match.

It looks like you need all of these conditions to see the bug:
* a phrase search (else we don't ask for match position details)
* a prefix search item (else we don't get to this code)
* a weight restriction (else checkclass_str won't fail)

Noted while investigating a problem report from Pavel Borisov,
though this is distinct from the issue he was on about.

Back-patch to 9.6 where phrase search was added.
2020-05-07 15:59:51 -04:00
Alvaro Herrera 5be594caf8
Heed lock protocol in DROP OWNED BY
We were acquiring object locks then deleting objects one by one, instead
of acquiring all object locks first, ignoring those that did not exist,
and then deleting all objects together.   The latter is the correct
protocol to use, and what this commits changes to code to do.  Failing
to follow that leads to "cache lookup failed for relation XYZ" error
reports when DROP OWNED runs concurrently with other DDL -- for example,
a session termination that removes some temp tables.

Author: Álvaro Herrera
Reported-by: Mithun Chicklore Yogendra (Mithun CY)
Reviewed-by: Ahsan Hadi, Tom Lane
Discussion: https://postgr.es/m/CADq3xVZTbzK4ZLKq+dn_vB4QafXXbmMgDP3trY-GuLnib2Ai1w@mail.gmail.com
2020-05-06 12:29:41 -04:00
Tom Lane 46da7bf671 Fix severe memory leaks in GSSAPI encryption support.
Both the backend and libpq leaked buffers containing encrypted data
to be transmitted, so that the process size would grow roughly as
the total amount of data sent.

There were also far-less-critical leaks of the same sort in GSSAPI
session establishment.

Oversight in commit b0b39f72b, which I failed to notice while
reviewing the code in 2c0cdc818.

Per complaint from pmc@citylink.
Back-patch to v12 where this code was introduced.

Discussion: https://postgr.es/m/20200504115649.GA77072@gate.oper.dinoex.org
2020-05-05 13:10:17 -04:00
Amit Kapila 69bfaf2e1d Change the display of WAL usage statistics in Explain.
In commit 33e05f89c5, we have added the option to display WAL usage
statistics in Explain and auto_explain.  The display format used two spaces
between each field which is inconsistent with Buffer usage statistics which
is using one space between each field.  Change the format to make WAL usage
statistics consistent with Buffer usage statistics.

This commit also changed the usage of "full page writes" to
"full page images" for WAL usage statistics to make it consistent with
other parts of code and docs.

Author: Julien Rouhaud, Amit Kapila
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-05-05 08:00:53 +05:30
Alexander Korotkov 9f87ae38ea Fix typo in comment
Reported-by: Oleg Bartunov
2020-05-03 12:19:31 +03:00
Peter Geoghegan 9dc7251417 Refactor btvacuumpage().
Remove one of the arguments to btvacuumpage(), and give up on the idea
that it's a recursive function.  We now use the term "backtracking" to
refer to the case where an earlier block must be visited to make sure no
tuples that need to be removed were missed.

Advertising btvacuumpage() as a recursive function was unhelpful.  In
reality the function always simulates recursion with a loop (it doesn't
actually call itself).  This wasn't just necessary as a precaution (per
the comments mentioning tail recursion), though.  There is no reliable
natural limit on the number of times we can backtrack.

There are important behavioral difference when "recursing"/backtracking,
mostly related to page deletion.  We don't perform page deletion when
backtracking due to the extra complexity.  And when we recurse, we're
not performing a physical order scan anymore, so we expect fairly
different conditions to hold for the page.  Structuring the code like
this makes it clearer how _bt_pagedel() cooperates with btvacuumpage()
and btvacuumscan() (as established in commit b0229f26 and commit
73a076b0).

Author: Peter Geoghegan
Reviewed-By: Masahiko Sawada
Discussion: https://postgr.es/m/CAH2-WzmRGMDWiLMcb+zagG9652PboNN4Gfcq1Gc_wJL6A716MA@mail.gmail.com
2020-05-02 14:04:33 -07:00
Stephen Frost b68a560f8e Fix GSS client to non-GSS server connection
If the client is compiled with GSSAPI support and tries to start up GSS
with the server, but the server is not compiled with GSSAPI support, we
would mistakenly end up falling through to call ProcessStartupPacket
with secure_done = true, but the client might then try to perform SSL,
which the backend wouldn't understand and we'd end up failing the
connection with:

FATAL:  unsupported frontend protocol 1234.5679: server supports 2.0 to 3.0

Fix by arranging to track ssl_done independently from gss_done, instead
of trying to use the same boolean for both.

Author: Andrew Gierth
Discussion: https://postgr.es/m/87h82kzwqn.fsf@news-spur.riddles.org.uk
Backpatch: 12-, where GSSAPI encryption was added.
2020-05-02 11:39:26 -04:00
Tomas Vondra d5d09692ea Remove superfluous memset from pgstat_recv_resetslrucounter
The extra memset meant pg_stat_reset_slru() always reset all the entries
even when reset of a single entry was requested, but the timestamp was
left uninitialized.

Reported-by: Atsushi Torikoshi
Discussion: https://postgr.es/m/CACZ0uYFe16pjZxQYaTn53mspyM7dgMPYL3DJLjjPw69GMCC2Ow%40mail.gmail.com
2020-05-02 15:30:10 +02:00
Tomas Vondra 60fbb4d762 Simplify cost_incremental_sort a bit
Commit de0dc1a847 added code to cost_incremental_sort to handle varno 0.
Explicitly removing the RelabelType is not really necessary, because the
pull_varnos handles that just fine, which simplifies the code a bit.

Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_3_D2J5XxOuw68hvn0-gJsw9FXNSGcZka9aTymn9UJ8A%40mail.gmail.com
Discussion: https://postgr.es/m/20200411214639.GK2228%40telsasoft.com
2020-05-02 01:33:51 +02:00
Tomas Vondra 2e08d314ed Remove pg_xact entry from SLRU stats
The "pg_xact" entry was duplicate with "clog" and was added by mistake.

Reported-by: Fujii Masao
Discussion: https://postgr.es/m/20200119143707.gyinppnigokesjok@development
2020-05-02 00:36:25 +02:00
Tom Lane 0da06d9faf Get rid of trailing semicolons in C macro definitions.
Writing a trailing semicolon in a macro is almost never the right thing,
because you almost always want to write a semicolon after each macro
call instead.  (Even if there was some reason to prefer not to, pgindent
would probably make a hash of code formatted that way; so within PG the
rule should basically be "don't do it".)  Thus, if we have a semi inside
the macro, the compiler sees "something;;".  Much of the time the extra
empty statement is harmless, but it could lead to mysterious syntax
errors at call sites.  In perhaps an overabundance of neatnik-ism, let's
run around and get rid of the excess semicolons whereever possible.

The only thing worse than a mysterious syntax error is a mysterious
syntax error that only happens in the back branches; therefore,
backpatch these changes where relevant, which is most of them because
most of these mistakes are old.  (The lack of reported problems shows
that this is largely a hypothetical issue, but still, it could bite
us in some future patch.)

John Naylor and Tom Lane

Discussion: https://postgr.es/m/CACPNZCs0qWTqJ2QUSGJ07B7uvAvzMb-KbG2q+oo+J3tsWN5cqw@mail.gmail.com
2020-05-01 17:28:00 -04:00
Peter Geoghegan 69cf853fe7 Clear up issue with FSM and oldest bpto.xact.
On further reflection, code comments added by commit b0229f26 slightly
misrepresented how we determine the oldest bpto.xact for the index.
btvacuumpage() does not treat the bpto.xact of a page that it put in the
FSM as a candidate to be the oldest deleted page (the delete-marked page
that has the oldest bpto.xact XID among all pages encountered).

The definition of a deleted page for the purposes of the bpto.xact
calculation is different from the definition used by the bulk delete
statistics.  The bulk delete statistics don't distinguish between pages
that were deleted by the current VACUUM, pages deleted by a previous
VACUUM operation but not yet recyclable/reusable, and pages that are
reusable (though reusable pages are counted separately).

Backpatch: 11-, just like commit b0229f26.
2020-05-01 12:19:44 -07:00
Peter Geoghegan 4e21f8b633 Reorder function prototypes for consistency. 2020-05-01 10:03:38 -07:00
Peter Geoghegan 73a076b03f Fix undercounting in VACUUM VERBOSE output.
The logic for determining how many nbtree pages in an index are deleted
pages sometimes undercounted pages.  Pages that were deleted by the
current VACUUM operation (as opposed to some previous VACUUM operation
whose deleted pages have yet to be reused) were sometimes overlooked.
The final count is exposed to users through VACUUM VERBOSE's "%u index
pages have been deleted" output.

btvacuumpage() avoided double-counting when _bt_pagedel() deleted more
than one page by assuming that only one page was deleted, and that the
additional deleted pages would get picked up during a future call to
btvacuumpage() by the same VACUUM operation.  _bt_pagedel() can
legitimately delete pages that the btvacuumscan() scan will not visit
again, though, so that assumption was slightly faulty.

Fix the accounting by teaching _bt_pagedel() about its caller's
requirements.  It now only reports on pages that it knows btvacuumscan()
won't visit again (including the current btvacuumpage() page), so
everything works out in the end.

This bug has been around forever.  Only backpatch to v11, though, to
keep _bt_pagedel() is sync on the branches that have today's bugfix
commit b0229f26da.  Note that this commit changes the signature of
_bt_pagedel(), just like commit b0229f26da.

Author: Peter Geoghegan
Reviewed-By: Masahiko Sawada
Discussion: https://postgr.es/m/CAH2-WzkrXBcMQWAYUJMFTTvzx_r4q=pYSjDe07JnUXhe+OZnJA@mail.gmail.com
Backpatch: 11-
2020-05-01 09:51:09 -07:00
Peter Geoghegan b0229f26da Fix bug in nbtree VACUUM "skip full scan" feature.
Commit 857f9c36cd (which taught nbtree VACUUM to skip a scan of the
index from btcleanup in situations where it doesn't seem worth it) made
VACUUM maintain the oldest btpo.xact among all deleted pages for the
index as a whole.  It failed to handle all the details surrounding pages
that are deleted by the current VACUUM operation correctly (though pages
deleted by some previous VACUUM operation were processed correctly).

The most immediate problem was that the special area of the page was
examined without a buffer pin at one point.  More fundamentally, the
handling failed to account for the full range of _bt_pagedel()
behaviors.  For example, _bt_pagedel() sometimes deletes internal pages
in passing, as part of deleting an entire subtree with btvacuumpage()
caller's page as the leaf level page.  The original leaf page passed to
_bt_pagedel() might not be the page that it deletes first in cases where
deletion can take place.

It's unclear how disruptive this bug may have been, or what symptoms
users might want to look out for.  The issue was spotted during
unrelated code review.

To fix, push down the logic for maintaining the oldest btpo.xact to
_bt_pagedel().  btvacuumpage() is now responsible for pages that were
fully deleted by a previous VACUUM operation, while _bt_pagedel() is now
responsible for pages that were deleted by the current VACUUM operation
(this includes half-dead pages from a previous interrupted VACUUM
operation that become fully deleted in _bt_pagedel()).  Note that
_bt_pagedel() should never encounter an existing deleted page.

This commit theoretically breaks the ABI of a stable release by changing
the signature of _bt_pagedel().  However, if any third party extension
is actually affected by this, then it must already be completely broken
(since there are numerous assumptions made in _bt_pagedel() that cannot
be met outside of VACUUM).  It seems highly unlikely that such an
extension actually exists, in any case.

Author: Peter Geoghegan
Reviewed-By: Masahiko Sawada
Discussion: https://postgr.es/m/CAH2-WzkrXBcMQWAYUJMFTTvzx_r4q=pYSjDe07JnUXhe+OZnJA@mail.gmail.com
Backpatch: 11-, where the "skip full scan" feature was introduced.
2020-05-01 08:39:52 -07:00
Peter Geoghegan dd1f645cc8 Fix AddressSanitizer use-after-scope complaint.
XLogRegisterBufData() does not copy data pointed to by caller's pointer
argument.

Oversight in commit 0d861bbb70.

Author: Peter Eisentraut
Reported-By: Peter Eisentraut
Discussion: https://postgr.es/m/21800dbe-a13e-22f7-d423-b81db9d249f5@2ndquadrant.com
2020-04-30 12:31:56 -07:00
Peter Eisentraut eb892102e0 Make SQL/JSON error code names match SQL standard
see also a00c53b0cb
2020-04-30 09:34:54 +02:00
Peter Geoghegan ab2343d4cb Remove redundant _bt_killitems() buffer check.
_bt_getbuf() cannot return an invalid buffer.

Oversight in commit 2ed5b87f96.
2020-04-29 18:17:49 -07:00
Michael Paquier e30b0b5cfa Fix check for conflicting SSL min/max protocol settings
Commit 79dfa8a has introduced a check to catch when the minimum protocol
version was set higher than the maximum version, however an error was
getting generated when both bounds are set even if they are able to
work, causing a backend to not use a new SSL context but keep the old
one.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/14BFD060-8C9D-43B4-897D-D5D9AA6FC92B@yesql.se
2020-04-30 08:14:02 +09:00
Alvaro Herrera 1816a1c6ff
Fix checkpoint signalling
Checkpointer uses its MyLatch to wake up when a checkpoint request is
received.  But before commit c655077639 the latch was not used for
anything else, so the code could just go to sleep after each loop
without rechecking the sleeping condition.  That commit added a separate
ResetLatch in its code path[1], which can cause a checkpoint to go
unnoticed for potentially a long time.

Fix by skipping sleep if any checkpoint flags are set.  Also add a test
to verify this; authored by Kyotaro Horiguchi.

[1] CreateCheckPoint -> InvalidateObsoleteReplicationSlots ->
ConditionVariableTimeSleep

Report and diagnosis by Kyotaro Horiguchi.
Co-authored-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200408.141956.891237856186513376.horikyota.ntt@gmail.com
2020-04-29 18:46:42 -04:00
Alvaro Herrera d0abe78d84
Check slot->restart_lsn validity in a few more places
Lack of these checks could cause visible misbehavior, including
assertion failures.  This was missed in commit c655077639, whereby
restart_lsn becomes invalid when the size limit is exceeded.

Also reword some existing error messages, and add errdetail(), so that
the reported errors all match in spirit.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200408.093710.447591748588426656.horikyota.ntt@gmail.com
2020-04-28 20:39:04 -04:00
Peter Eisentraut 6baa17fbd1 Add missing gettext triggers
Some translatable strings have been moved to scanner_yyerror(), so we
need to add that, too.
2020-04-28 13:35:40 +02:00
Alexander Korotkov ef11051bbe Fix definition of pg_statio_all_tables view
pg_statio_all_tables view appears to have a wrong grouping.  As the result
numbers of toast index blocks read and hit were multiplied to the number of
table indexes.  This commit fixes the view definition.

Backpatching this appears difficult.  We don't have a mechanism to patch a
system catalog of existing instances in minor upgrade.  We can write a
release notes instruction to do this manually.  But per discussion this is
probably not so critical bug for doing such an intrusive fix.

Reported-by: Andrei Zubkov
Discussion: https://postgr.es/m/CAPpHfdtMYkkNudLMG9G0dxX_B%3Dn5sfKzOyxxrvWYtSicaGW0Lw%40mail.gmail.com
2020-04-28 11:30:33 +03:00
Tom Lane e81e5741a6 Fix full text search to handle NOT above a phrase search correctly.
Queries such as '!(foo<->bar)' failed to find matching rows when
implemented as a GiST or GIN index search.  That's because of
failing to handle phrase searches as tri-valued when considering
a query without any position information for the target tsvector.
We can only say that the phrase operator might match, not that it
does match; and therefore its NOT also might match.  The previous
coding incorrectly inverted the approximate phrase result to
decide that there was certainly no match.

To fix, we need to make TS_phrase_execute return a real ternary result,
and then bubble that up accurately in TS_execute.  As long as we have
to do that anyway, we can simplify the baroque things TS_phrase_execute
was doing internally to manage tri-valued searching with only a bool
as explicit result.

For now, I left the externally-visible result of TS_execute as a plain
bool.  There do not appear to be any outside callers that need to
distinguish a three-way result, given that they passed in a flag
saying what to do in the absence of position data.  This might need
to change someday, but we wouldn't want to back-patch such a change.

Although tsginidx.c has its own TS_execute_ternary implementation for
use at upper index levels, that sadly managed to get this case wrong
as well :-(.  Fixing it is a lot easier fortunately.

Per bug #16388 from Charles Offenbacher.  Back-patch to 9.6 where
phrase search was introduced.

Discussion: https://postgr.es/m/16388-98cffba38d0b7e6e@postgresql.org
2020-04-27 12:21:04 -04:00
Michael Paquier 641b76d9d1 Fix some typos
Author: Justin Pryzby
Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
2020-04-27 14:59:36 +09:00
Peter Eisentraut f057980149 Fix typo
from 303640199d
2020-04-26 13:48:33 +02:00
Peter Geoghegan 7154aa16a6 Fix another minor page deletion buffer lock issue.
Avoid accessing the leaf page's top parent tuple without a buffer lock
held during the second phase of nbtree page deletion.  The old approach
was safe, though only because VACUUM never drops its buffer pin (and
because only VACUUM itself can modify a half-dead page).  Even still, it
seems like a good idea to be strict here.  Tighten things up by copying
the top parent page's block number to a local variable before releasing
the buffer lock on the leaf page -- not after.

This is a follow-up to commit fa7ff642, which fixed a similar issue in
the first phase of nbtree page deletion.

Update some related comments in passing.

Discussion: https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com
2020-04-25 16:45:20 -07:00
Peter Geoghegan fa7ff642c2 Fix minor nbtree page deletion buffer lock issue.
Avoid accessing the deletion target page's special area during nbtree
page deletion at a point where there is no buffer lock held.  This issue
was detected by a patch that extends Valgrind's memcheck tool to mark
nbtree pages that are unsafe to access (due to not having a buffer lock
or buffer pin) as NOACCESS.

We do hold a buffer pin at this point, and only access the special area,
so the old approach was safe.  Even still, it seems like a good idea to
tighten up the rules in this area.  There is no reason to not simply
insist on always holding a buffer lock (not just a pin) when accessing
nbtree pages.

Update some related comments in passing.

Discussion: https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com
2020-04-25 14:17:02 -07:00
Noah Misch f246ea3b2a In caught-up logical walsender, sleep only in WalSndWaitForWal().
Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write
< sentPtr.  When the latest physical LSN yields no logical replication
messages (a common case), that keepalive elicits a reply.  Processing
the reply updates pg_stat_replication.replay_lsn.  WalSndLoop() lacks
that; when WalSndLoop() slept, replay_lsn advancement could stall until
wal_receiver_status_interval elapsed.  This sometimes stalled
src/test/subscription/t/001_rep_changes.pl for up to 10s.

Reviewed by Fujii Masao and Michael Paquier.

Discussion: https://postgr.es/m/20200418070142.GA1075445@rfd.leadboat.com
2020-04-25 10:18:12 -07:00
Noah Misch 72a3dc321d Revert "When WalSndCaughtUp, sleep only in WalSndWaitForWal()."
This reverts commit 4216858122.  It caused
idle physical walsenders to busy-wait, as reported by Fujii Masao.

Discussion: https://postgr.es/m/20200417054146.GA1061007@rfd.leadboat.com
2020-04-25 10:17:26 -07:00
Andrew Gierth d9a4cce29d Fix error case for CREATE ROLE ... IN ROLE.
CreateRole() was passing a Value node, not a RoleSpec node, for the
newly-created role name when adding the role as a member of existing
roles for the IN ROLE syntax.

This mistake went unnoticed because the node in question is used only
for error messages and is not accessed on non-error paths.

In older pg versions (such as 9.5 where this was found), this results
in an "unexpected node type" error in place of the real error. That
node type check was removed at some point, after which the code would
accidentally fail to fail on 64-bit platforms (on which accessing the
Value node as if it were a RoleSpec would be mostly harmless) or give
an "unexpected role type" error on 32-bit platforms.

Fix the code to pass the correct node type, and add an lfirst_node
assertion just in case.

Per report on irc from user m1chelangelo.

Backpatch all the way, because this error has been around for a long
time.
2020-04-25 05:09:30 +01:00
Tom Lane baf17ad9df Repair performance regression in information_schema.triggers view.
Commit 32ff26911 introduced use of rank() into the triggers view to
calculate the spec-mandated action_order column.  As written, this
prevents query constraints on the table-name column from being pushed
below the window aggregate step.  That's bad for performance of this
typical usage pattern, since the view now has to be evaluated for all
tables not just the one(s) the user wants to see.  It's also the cause
of some recent buildfarm failures, in which trying to evaluate the view
rows for triggers in process of being dropped resulted in "cache lookup
failed for function NNN" errors.  Those rows aren't of interest to the
test script doing the query, but the filter that would eliminate them
is being applied too late.  None of this happened before the rank()
call was there, so it's a regression compared to v10 and before.

We can improve matters by changing the rank() call so that instead of
partitioning by OIDs, it partitions by nspname and relname, casting
those to sql_identifier so that they match the respective view output
columns exactly.  The planner has enough intelligence to know that
constraints on partitioning columns are safe to push down, so this
eliminates the performance problem and the regression test failure
risk.  We could make the other partitioning columns match view outputs
as well, but it'd be more complicated and the performance benefits
are questionable.

Side note: as this stands, the planner will push down constraints on
event_object_table and trigger_schema, but not on event_object_schema,
because it checks for ressortgroupref matches not expression
equivalence.  That might be worth improving someday, but it's not
necessary to fix the immediate concern.

Back-patch to v11 where the rank() call was added.  Ordinarily we'd not
change information_schema in released branches, but the test failure has
been seen in v12 and presumably could happen in v11 as well, so we need
to do this to keep the buildfarm happy.  The change is harmless so far
as users are concerned.  Some might wish to apply it to existing
installations if performance of this type of query is of concern,
but those who don't are no worse off.

I bumped catversion in HEAD as a pro forma matter (there's no
catalog incompatibility that would really require a re-initdb).
Obviously that can't be done in the back branches.

Discussion: https://postgr.es/m/5891.1587594470@sss.pgh.pa.us
2020-04-24 12:02:36 -04:00
Michael Paquier 4e87c4836a Fix handling of WAL segments ready to be archived during crash recovery
78ea8b5 has fixed an issue related to the recycling of WAL segments on
standbys depending on archive_mode.  However, it has introduced a
regression with the handling of WAL segments ready to be archived during
crash recovery, causing those files to be recycled without getting
archived.

This commit fixes the regression by tracking in shared memory if a live
cluster is either in crash recovery or archive recovery as the handling
of WAL segments ready to be archived is different in both cases (those
WAL segments should not be removed during crash recovery), and by using
this new shared memory state to decide if a segment can be recycled or
not.  Previously, it was not possible to know if a cluster was in crash
recovery or archive recovery as the shared state was able to track only
if recovery was happening or not, leading to the problem.

A set of TAP tests is added to close the gap here, making sure that WAL
segments ready to be archived are correctly handled when a cluster is in
archive or crash recovery with archive_mode set to "on" or "always", for
both standby and primary.

Reported-by: Benoît Lobréau
Author: Jehan-Guillaume de Rorthais
Reviewed-by: Kyotaro Horiguchi, Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/20200331172229.40ee00dc@firost
Backpatch-through: 9.5
2020-04-24 08:48:28 +09:00
Tom Lane 3436c5e283 Remove ACLDEBUG #define and associated code.
In the footsteps of aaf069aa3, remove ACLDEBUG, which was the only
other remaining undocumented symbol in pg_config_manual.h.  The fact
that nobody had bothered to document it in seventeen years is a good
clue to its usefulness.  In practice, none of the tracing logic it
enabled would be of any value without additional effort.

Discussion: https://postgr.es/m/6631.1587565046@sss.pgh.pa.us
2020-04-23 15:38:04 -04:00
Tom Lane ee88ef55db Remove useless (and broken) logging logic in memory context functions.
Nobody really uses this stuff, especially not since we created
valgrind-based infrastructure that does the same thing better.
It is thus unsurprising that the generation.c and slab.c versions
were actually broken.  Rather than fix 'em, let's just remove 'em.

Alexander Lakhin

Discussion: https://postgr.es/m/8936216c-3492-3f6e-634b-d638fddc5f91@gmail.com
2020-04-23 15:27:37 -04:00
Robert Haas 3989dbdf12 Rename exposed identifiers to say "backup manifest".
Function names declared "extern" now use BackupManifest in the name
rather than just Manifest, and data types use backup_manifest
rather than just manifest.

Per note from Michael Paquier.

Discussion: http://postgr.es/m/20200418125713.GG350229@paquier.xyz
2020-04-23 08:44:06 -04:00
Andres Freund 299298bc87 Fix transient memory leak for SRFs in FROM.
In a9c35cf85c I changed ExecMakeTableFunctionResult() to dynamically
allocate the FunctionCallInfo used to call the SRF. Unfortunately I
did not account for the fact that the surrounding memory context has
query lifetime, leading to a leak till the end of the query.

In most cases the leak is fairly inconsequential, but if the
FunctionScan is done many times in the query, the leak can add
up. This happens e.g. if the function scan is on the inner side of a
nested loop, due to a lateral join.
EXPLAIN SELECT sum(f) FROM generate_series(1, 100000000) g(i), generate_series(i, i+1) f;
quickly shows the leak.

Instead of explicitly freeing the FunctionCallInfo it seems better to
make sure all the per-set temporary state in
ExecMakeTableFunctionResult() is cleaned up wholesale. Currently
that's probably just the FunctionCallInfo allocation, but since
there's some initialization work, and since there's already an
appropriate context, this seems like a more robust approach.

Bug: #16112
Reported-By: Ben Cornett
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/16112-4448bbf55a404189%40postgresql.org
Backpatch: 12, a9c35cf85c
2020-04-22 19:53:06 -07:00
Tomas Vondra de0dc1a847 Fix cost_incremental_sort for expressions with varno 0
When estimating the number of pre-sorted groups in cost_incremental_sort
we must not pass Vars with varno 0 to estimate_num_groups, which would
cause failues in find_base_rel. This may happen when sorting output of
set operations, thanks to generate_append_tlist.

Unlike recurse_set_operations we can't easily access the original target
list, so if we find any Vars with varno 0, we fall back to the default
estimate DEFAULT_NUM_DISTINCT.

Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20200411214639.GK2228%40telsasoft.com
2020-04-23 00:15:24 +02:00
David Rowley 9f2c4edec2 Remove bogus Assert in foreign key cloning code
This Assert was trying to ensure that the number of columns in the foreign
key being cloned was the same number of attributes in the parentRel.  Of
course, it's perfectly valid to have columns in the table which are not
part of the foreign key constraint. It appears that this Assert was
misunderstanding that.

Reported-by: Rajkumar Raghuwanshi
Reviewed-by: amul sul
Discussion: https://postgr.es/m/CAKcux6=z1dtiWw5BOpqDx-U6KTiq+zD0Y2m810zUtWL+giVXWA@mail.gmail.com
2020-04-22 22:12:19 +12:00
Peter Eisentraut aaf069aa34 Remove HEAPDEBUGALL
This has been broken since PostgreSQL 12 and was probably never really
used.  PostgreSQL 12 added an analogous HEAPAMSLOTDEBUGALL, which
still works right now, but it's also not very useful, so remove that
as well.

Discussion: https://www.postgresql.org/message-id/flat/645c0646-4218-d4c3-409a-a7003a0c108d%402ndquadrant.com
2020-04-22 08:35:33 +02:00
Tom Lane d12bdba77b Fix possible crash during FATAL exit from reindexing.
index.c supposed that it could just use a PG_TRY block to clean up the
state associated with an active REINDEX operation.  However, that code
doesn't run if we do a FATAL exit --- for example, due to a SIGTERM
shutdown signal --- while the REINDEX is happening.  And that state does
get consulted during catalog accesses, which makes it problematic if we
do any catalog accesses during shutdown --- for example, to clean up any
temp tables created in the session.

If this combination of circumstances occurred, we could find ourselves
trying to access already-freed memory.  In debug builds that'd fairly
reliably cause an assertion failure.  In production we might often
get away with it, but with some bad luck it could cause a core dump.

Another possible bad outcome is an erroneous conclusion that an
index-to-be-accessed is being reindexed; but it looks like that would
be unlikely to have any consequences worse than failing to drop temp
tables right away.  (They'd still get dropped by the next session that
uses that temp schema.)

To fix, get rid of the use of PG_TRY here, and instead hook into
the transaction abort mechanisms to clean up reindex state.

Per bug #16378 from Alexander Lakhin.  This has been wrong for a
very long time, so back-patch to all supported branches.

Discussion: https://postgr.es/m/16378-7a70ca41b3ec2009@postgresql.org
2020-04-21 15:58:42 -04:00
Tom Lane 5836d32655 Fix minor violations of FunctionCallInvoke usage protocol.
Working on commit 1c455078b led me to check through FunctionCallInvoke
call sites to see if every one was being honest about (a) making sure
that fcinfo.isnull is initially false, and (b) checking its state after
the call.  Sure enough, I found some violations.

The main one is that finalize_partialaggregate re-used serialfn_fcinfo
without resetting isnull, even though it clearly intends to cater for
serialfns that return NULL.  There would only be an issue with a
non-strict serialfn, since it's unlikely that a serialfn would return
NULL for non-null input.  We have no non-strict serialfns in core, and
there may be none in the wild either, which would account for the lack
of complaints.  Still, it's clearly wrong, so back-patch that fix to
9.6 where finalize_partialaggregate was introduced.

Also, arrayfuncs.c and rowtypes.c contained various callers that were
not bothering to check for result nulls.  While what's being called is
a comparison or hash function that probably *shouldn't* return null,
that's a lousy excuse for not having any check at all.  There are
existing places that just Assert(!fcinfo->isnull) in comparable
situations, so I added that to the places that were calling btree
comparison or hash support functions.  In the places calling
boolean-returning equality functions, it's quite cheap to have them
treat isnull as FALSE, so make those places do that.  Also remove some
"locfcinfo->isnull = false" assignments that are unnecessary given the
assumption that no previous call returned null.  These changes seem like
mostly neatnik-ism or debugging support, so I didn't back-patch.
2020-04-21 14:23:53 -04:00
Alvaro Herrera afccd76f1c
Fix detaching partitions with cloned row triggers
When a partition is detached, any triggers that had been cloned from its
parent were not properly disentangled from its parent triggers.
This resulted in triggers that could not be dropped because they
depended on the trigger in the trigger in the no-longer-parent table:
  ALTER TABLE t DETACH PARTITION t1;
  DROP TRIGGER trig ON t1;
    ERROR:  cannot drop trigger trig on table t1 because trigger trig on table t requires it
    HINT:  You can drop trigger trig on table t instead.

Moreover the table can no longer be re-attached to its parent, because
the trigger name is already taken:
  ALTER TABLE t ATTACH PARTITION t1 FOR VALUES FROM (1)TO(2);
    ERROR:  trigger "trig" for relation "t1" already exists

The former is a bug introduced in commit 86f575948c.  (The latter is
not necessarily a bug, but it makes the bug more uncomfortable.)

To avoid the complexity that would be needed to tell whether the trigger
has a local definition that has to be merged with the one coming from
the parent table, establish the behavior that the trigger is removed
when the table is detached.

Backpatch to pg11.

Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/20200408152412.GZ2228@telsasoft.com
2020-04-21 13:57:00 -04:00
Peter Geoghegan 1542e16f2c Consider outliers in split interval calculation.
Commit 0d861bbb, which introduced deduplication to nbtree, added some
logic to take large posting list tuples into account when choosing a
split point.  We subtract firstright posting list overhead from the
projected new high key size when calculating leftfree/rightfree values
for an affected candidate split point.  Posting list tuples aren't
special to nbtsplitloc.c, but taking them into account like this makes a
huge difference in practice.  Posting list tuples are frequently tuple
size outliers.

However, commit 0d861bbb missed a closely related issue: split interval
itself is calculated based on the assumption that tuples on the page
being split are roughly equisized.  That assumption was acceptable back
when commit fab25024 taught the logic for choosing a split point about
suffix truncation, but it's pretty questionable now that very large
tuple sizes are common.  This oversight led to unbalanced page splits in
low cardinality multi-column indexes when deduplication was used: page
splits that don't give sufficient weight to how unbalanced the split is
when the interval happens to include some large posting list tuples (and
when most other tuples on the page are not so large).

Nail this down by calculating an initial split interval in a way that's
attuned to the actual cost that we want to keep under control (not a
fuzzy proxy for the cost): apply a leftfree + rightfree evenness test to
each candidate split point that actually gets included in the split
interval (for the default strategy).  This replaces logic that used a
percentage of all legal split points for the page as the basis of the
initial split interval.

Discussion: https://postgr.es/m/CAH2-WznJt5aT2uUB2Bs+JBLdwe0XTX67+xeLFcaNvCKxO=QBVQ@mail.gmail.com
2020-04-21 09:59:24 -07:00
Tom Lane 1c455078b0 Allow matchingsel() to be used with operators that might return NULL.
Although selfuncs.c will never call a target operator with null inputs,
some functions might return null anyway.  The existing coding will fail
if that happens (since FunctionCall2Coll will punt), which seems
undesirable given that matchingsel() has such a broad range of potential
applicability --- in fact, we already have a problem because we apply it
to jsonb_path_exists_opr, which can return null.  Hence, rejigger the
underlying functions mcv_selectivity and histogram_selectivity to cope,
treating a null result as false.

While we are at it, we can move the InitFunctionCallInfoData overhead
out of the inner loops, which isn't a huge number of cycles but might
save something considering we are likely calling functions as cheap
as int4eq().  Plus, the number of loop cycles to be expected is much
more than it was when this code was written, since typical settings
of default_statistics_target are higher.

In view of that consideration, let's apply the same change to
var_eq_const, eqjoinsel_inner, and eqjoinsel_semi.  We do not expect
equality functions to ever return null for non-null inputs (and
certainly that code has been that way a long time without complaints),
but the cycle savings seem attractive, especially in the eqjoinsel loops
where there's potentially an O(N^2) savings.

Similar code exists in ineq_histogram_selectivity and
get_variable_range, but I forebore from changing those for now.
The performance argument for changing ineq_histogram_selectivity
is really weak anyway, since that will only iterate log2(N) times.

Nikita Glukhov and Tom Lane

Discussion: https://postgr.es/m/9d3b0959-95d6-c37e-2c0b-287bcfe5c705@postgrespro.ru
2020-04-21 12:56:55 -04:00
Tom Lane 9d25e1aa31 Clean up cpluspluscheck violation.
"operator" is a reserved word in C++, so per project conventions,
don't use it as an identifier in header files.

My oversight in commit a80818605.
2020-04-21 11:21:15 -04:00
Robert Haas 079ac29d4d Move the server's backup manifest code to a separate file.
basebackup.c is already a pretty big and complicated file, so it
makes more sense to keep the backup manifest support routines
in a separate file, for clarity and ease of maintenance.

Discussion: http://postgr.es/m/CA+TgmoavRak5OdP76P8eJExDYhPEKWjMb0sxW7dF01dWFgE=uA@mail.gmail.com
2020-04-20 14:38:15 -04:00
Alvaro Herrera 5fc703946b
Add ALTER .. NO DEPENDS ON
Commit f2fcad27d5 (9.6 era) added the ability to mark objects as
dependent an extension, but forgot to add a way for such dependencies to
be removed.  This commit fixes that oversight.

Strictly speaking this should be backpatched to 9.6, but due to lack of
demand we're not doing so at this time.

Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql
Reviewed-by: ahsan hadi <ahsan.hadi@gmail.com>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
2020-04-20 13:42:12 -04:00
Magnus Hagander 7e4e574744 Allow pg_read_all_stats to access all stats views again
The views pg_stat_progress_* had not gotten the memo that
pg_read_all_stats is supposed to be able to read all statistics. Also
make a pass over all text-returning pg_stat_xyz functions that could
return "insufficient privilege" and make sure they also respect
pg_read_all_status.

Reported-by: Andrey M. Borodin
Reviewed-by: Andrey M. Borodin, Kyotaro Horiguchi
Discussion: https://postgr.es/m/13145F2F-8458-4977-9D2D-7B2E862E5722@yandex-team.ru
2020-04-20 12:53:40 +02:00
Jeff Davis 0cacb2b79d Fix missing pfree() in logtape.c, missed by 24d85952. 2020-04-19 10:33:06 -07:00
Tom Lane f332241a60 Fix race conditions in synchronous standby management.
We have repeatedly seen the buildfarm reach the Assert(false) in
SyncRepGetSyncStandbysPriority.  This apparently is due to failing to
consider the possibility that the sync_standby_priority values in
shared memory might be inconsistent; but they will be whenever only
some of the walsenders have updated their values after a change in
the synchronous_standby_names setting.  That function is vastly too
complex for what it does, anyway, so rewriting it seems better than
trying to apply a band-aid fix.

Furthermore, the API of SyncRepGetSyncStandbys is broken by design:
it returns a list of WalSnd array indexes, but there is nothing
guaranteeing that the contents of the WalSnd array remain stable.
Thus, if some walsender exits and then a new walsender process
takes over that WalSnd array slot, a caller might make use of
WAL position data that it should not, potentially leading to
incorrect decisions about whether to release transactions that
are waiting for synchronous commit.

To fix, replace SyncRepGetSyncStandbys with a new function
SyncRepGetCandidateStandbys that copies all the required data
from shared memory while holding the relevant mutexes.  If the
associated walsender process then exits, this data is still safe to
make release decisions with, since we know that that much WAL *was*
sent to a valid standby server.  This incidentally means that we no
longer need to treat sync_standby_priority as protected by the
SyncRepLock rather than the per-walsender mutex.

SyncRepGetSyncStandbys is no longer used by the core code, so remove
it entirely in HEAD.  However, it seems possible that external code is
relying on that function, so do not remove it from the back branches.
Instead, just remove the known-incorrect Assert.  When the bug occurs,
the function will return a too-short list, which callers should treat
as meaning there are not enough sync standbys, which seems like a
reasonably safe fallback until the inconsistent state is resolved.
Moreover it's bug-compatible with what has been happening in non-assert
builds.  We cannot do anything about the walsender-replacement race
condition without an API/ABI break.

The bogus assertion exists back to 9.6, but 9.6 is sufficiently
different from the later branches that the patch doesn't apply at all.
I chose to just remove the bogus assertion in 9.6, feeling that the
probability of a bad outcome from the walsender-replacement race
condition is too low to justify rewriting the whole patch for 9.6.

Discussion: https://postgr.es/m/21519.1585272409@sss.pgh.pa.us
2020-04-18 14:02:44 -04:00
David Rowley 3cb02e307e Fix possible crash with GENERATED ALWAYS columns
In some corner cases, this could also lead to corrupted values being
included in the tuple.

Users who are concerned that they are affected by this should first
upgrade and then perform a base backup of their database and restore onto
an off-line server. They should then query each table with generated
columns to ensure there are no rows where the generated expression does
not match a newly calculated version of the GENERATED ALWAYS expression.
If no crashes occur and no rows are returned then you're not affected.

Fixes bug #16369.

Reported-by: Cameron Ezell
Discussion: https://postgr.es/m/16369-5845a6f1bef59884@postgresql.org
Backpatch-through: 12 (where GENERATED ALWAYS columns were added.)
2020-04-18 14:10:37 +12:00
Tom Lane 3125a5baec Fix possible future cache reference leak in ALTER EXTENSION ADD/DROP.
recordExtObjInitPriv and removeExtObjInitPriv were sloppy about
calling ReleaseSysCache.  The cases cannot occur given current usage
in ALTER EXTENSION ADD/DROP, since we wouldn't get here for these
relkinds; but it seems wise to clean up better.

In passing, extend test logic in test_pg_dump to exercise the
dropped-column code paths here.

Since the case is unreachable at present, there seems no great
need to back-patch; hence fix HEAD only.

Kyotaro Horiguchi, with test case and comment adjustments by me

Discussion: https://postgr.es/m/20200417.151831.1153577605111650154.horikyota.ntt@gmail.com
2020-04-17 13:41:59 -04:00
David Rowley 5b736e9cf9 Remove unneeded constraint dependency tracking
It was previously thought that remove_useless_groupby_columns() needed to
keep track of which constraints the generated plan depended upon, however,
this is unnecessary. The confusion likely arose regarding this because of
check_functional_grouping(), which does need to track the dependency to
ensure VIEWs with columns which are functionally dependant on the GROUP BY
remain so. For remove_useless_groupby_columns(), cached plans will just
become invalidated when the primary key's underlying index is removed
through the normal relcache invalidation code.

Here we just remove the unneeded code which records the dependency and
updates the comments. The previous comments claimed that we could not use
UNIQUE constraints for the same optimization due to lack of a
pg_constraint record for NOT NULL constraints (which are required because
NULLs can be duplicated in a unique index). Since we don't actually need a
pg_constraint record to handle the invalidation, it looks like we could
add code to do this in the future. But not today.

We're not really fixing any bug in the code here, this fix is just to set
the record straight on UNIQUE constraints. This code was added back in
9.6, but due to lack of any bug, we'll not be backpatching this.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAApHDvrdYa=VhOoMe4ZZjZ-G4ALnD-xuAeUNCRTL+PYMVN8OnQ@mail.gmail.com
2020-04-17 10:29:49 +12:00
Amit Kapila 24d2d38b1e Fix the usage of parallel and full options of vacuum command.
Earlier we were inconsistent in allowing the usage of parallel and
full options.  Change it such that we disallow them only when they are
combined in a way that we don't support.

In passing, improve the comments in some of the existing tests of parallel
vacuum.

Reported-by: Tushar Ahuja
Author: Justin Pryzby, Amit Kapila
Reviewed-by: Sawada Masahiko, Michael Paquier, Mahendra Singh Thalor and
Amit Kapila
Discussion: https://postgr.es/m/58c8d171-e665-6fa3-a9d3-d9423b694dae%40enterprisedb.com
2020-04-16 10:55:02 +05:30
Peter Geoghegan f0ca378d4c Slightly simplify nbtree split point choice loop.
Spotted during post-commit review of the nbtree deduplication commit
(commit 0d861bbb).
2020-04-15 15:47:26 -07:00
Peter Geoghegan 4a05a64095 Remove obsolete "hole in center of page" comment.
A comment from the Berkeley days incorrectly claimed that the page
management code cares about the contents of the hole in the center of
the page (at least in the case of the left half of an nbtree page
split).  Commit 8fa30f906b added an addendum that stated that the
original comment was "probably obsolete".  It's definitely obsolete,
though, so remove the original comment plus the addendum.
2020-04-14 14:38:28 -07:00
Tom Lane 2d59643dbc Account for collation when coercing the output of a SQL function.
Commit 913bbd88d overlooked that the result of coerce_to_target_type
might need collation fixups.  Per report from Andreas Joseph Krogh.

Discussion: https://postgr.es/m/VisenaEmail.72.37d08ec2b8cb8fb5.17179940cd3@tc7-visena
2020-04-14 17:30:36 -04:00
Andrew Dunstan e60c6f6ea1 Set Perl search path more idiomatically
Back in commits 1df92eeafe, f884a96819, and 592123efbb I used some
hackish code to set the script search path, unaware despite decades of
perl that there was a completely standard way to do this. This patch
changes those cases to use the standard perl FindBin package.
2020-04-14 16:47:07 -04:00
Peter Geoghegan 80634e3b18 Rearrange _bt_insertonpg() "update metapage" code.
Nest the "update metapage as part of insert into root-like page" branch
inside the broader "insert into internal page" branch.  This improves
readability.
2020-04-14 09:33:18 -07:00
Michael Paquier 8128b0c152 Fix collection of typos and grammar mistakes in the tree, volume 2
This fixes some comments and documentation new as of Postgres 13, and is
a follow-up of the work done in dd0f37e.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
2020-04-14 14:45:43 +09:00
Peter Geoghegan f762b2feba Add defensive "split_only_page" nbtree assertion.
Clearly it's not okay for nbtree to split a page that is the only page
on its level, and then find that it has to split the parent one level up
in turn.  There is simply no code to handle the split_only_page case in
the _bt_insertonpg() "newitem won't fit" branch (only the "newitem fits"
branch handles split_only_page).  Add a defensive assertion that will
fail if a split_only_page call to _bt_insertonpg() somehow ends up
splitting the target/parent page.

I (pgeoghegan) believe that we don't need split_only_page handling for
the "newitem won't fit" branch because anybody calling _bt_insertonpg()
like this would have to hold a lock on the same one and only child page.
2020-04-13 21:11:03 -07:00
Amit Kapila a6fea120a7 Comments and doc fixes for commit 40d964ec99.
Reported-by: Justin Pryzby
Author: Justin Pryzby, with few changes by me
Reviewed-by: Amit Kapila and Sawada Masahiko
Discussion: https://postgr.es/m/20200322021801.GB2563@telsasoft.com
2020-04-14 08:10:27 +05:30
Peter Geoghegan 826ee1a019 Make _bt_insertonpg() more like _bt_split().
It seems like a good idea for nbtree's retail insert code to be
absolutely consistent with nbtree's page split code for anything that
naturally requires equivalent handling.  Anything that concerns
inserting newitem (which is handled as part of the page split atomic
action when a page split is required) should work in exactly the same
way.  With that in mind, make _bt_insertonpg() handle 'cbuf' in a way
that matches _bt_split().
2020-04-13 19:26:41 -07:00
Peter Geoghegan bc3087b626 Harmonize nbtree page split point code.
An nbtree split point can be thought of as a point between two adjoining
tuples from an imaginary version of the page being split that includes
the incoming/new item (in addition to the items that really are on the
page).  These adjoining tuples are called the lastleft and firstright
tuples.

The variables that represent split points contained a field called
firstright, which is an offset number of the first data item from the
original page that goes on the new right page.  The corresponding tuple
from origpage was usually the same thing as the actual firstright tuple,
but not always: the firstright tuple is sometimes the new/incoming item
instead.  This situation seems unnecessarily confusing.

Make things clearer by renaming the origpage offset returned by
_bt_findsplitloc() to "firstrightoff".  We now have a firstright tuple
and a firstrightoff offset number which are comparable to the
newitem/lastleft tuples and the newitemoff/lastleftoff offset numbers
respectively.  Also make sure that we are consistent about how we
describe nbtree page split point state.

Push the responsibility for dealing with pg_upgrade'd !heapkeyspace
indexes down to lower level code, relieving _bt_split() from dealing
with it directly.  This means that we always have a palloc'd left page
high key on the leaf level, no matter what.  This enables simplifying
some of the code (and code comments) within _bt_split().

Finally, restructure the page split code to make it clearer why suffix
truncation (which only takes place during leaf page splits) is
completely different to the first data item truncation that takes place
during internal page splits.  Tuples are marked as having fewer
attributes stored in both cases, and the firstright tuple is truncated
in both cases, so it's easy to imagine somebody missing the distinction.
2020-04-13 16:39:55 -07:00
Andrew Dunstan 7be5d8df1f Use perl warnings pragma consistently
We've had a mixture of the warnings pragma, the -w switch on the shebang
line, and no warnings at all. This patch removes the -w swicth and add
the warnings pragma to all perl sources missing it. It raises the
severity of the TestingAndDebugging::RequireUseWarnings  perlcritic
policy to level 5, so that we catch any future violations.

Discussion: https://postgr.es/m/20200412074245.GB623763@rfd.leadboat.com
2020-04-13 11:55:45 -04:00
Amit Kapila ef08ca113f Cosmetic fixups for WAL usage work.
Reported-by: Justin Pryzby and Euler Taveira
Author: Justin Pryzby and Julien Rouhaud
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-13 15:31:16 +05:30
Peter Eisentraut 0c620a5803 Improve error messages after LoadLibrary()
Move the file name to a format parameter to ease translatability.  Add
error code where missing.  Make the wording consistent.
2020-04-13 10:24:46 +02:00
Tom Lane 35cb574aa8 Suppress -Wimplicit-fallthrough warning in new LIMIT WITH TIES code.
The placement of the fall-through comment in this code appears not to
work to suppress the warning in recent gcc.  Move it to the bottom of
the case group, and add an assertion that we didn't get there through
some other code path.  Also improve wording of nearby comments.

Julien Rouhaud, comment hacking by me

Discussion: https://postgr.es/m/CAOBaU_aLdPGU5wCpaowNLF-Q8328iR7mj1yJAhMOVsdLwY+sdg@mail.gmail.com
2020-04-11 15:02:44 -04:00
Noah Misch 328c70997b Optimize RelationFindReplTupleSeq() for CLOBBER_CACHE_ALWAYS.
Specifically, remember lookup_type_cache() results instead of retrieving
them once per comparison.  Under CLOBBER_CACHE_ALWAYS, this reduced
src/test/subscription/t/001_rep_changes.pl elapsed time by an order of
magnitude, which reduced check-world elapsed time by 9%.

Discussion: https://postgr.es/m/20200406085420.GC162712@rfd.leadboat.com
2020-04-11 10:30:12 -07:00
Noah Misch 4216858122 When WalSndCaughtUp, sleep only in WalSndWaitForWal().
Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write
< sentPtr.  That is important in logical replication.  When the latest
physical LSN yields no logical replication messages (a common case),
that keepalive elicits a reply, and processing the reply updates
pg_stat_replication.replay_lsn.  WalSndLoop() lacks that; when
WalSndLoop() slept, replay_lsn advancement could stall until
wal_receiver_status_interval elapsed.  This sometimes stalled
src/test/subscription/t/001_rep_changes.pl for up to 10s.

Discussion: https://postgr.es/m/20200406063649.GA3738151@rfd.leadboat.com
2020-04-11 10:30:00 -07:00
Tom Lane 969f9d0b4b Make EXPLAIN report maximum hashtable usage across multiple rescans.
Before discarding the old hash table in ExecReScanHashJoin, capture
its statistics, ensuring that we report the maximum hashtable size
across repeated rescans of the hash input relation.  We can repurpose
the existing code for reporting hashtable size in parallel workers
to help with this, making the patch pretty small.  This also ensures
that if rescans happen within parallel workers, we get the correct
maximums across all instances.

Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro
of a trouble report from Alvaro Herrera.

Discussion: https://postgr.es/m/20200323165059.GA24950@alvherre.pgsql
2020-04-11 12:39:19 -04:00
Tom Lane 5c27bce7f3 Clear dangling pointer to avoid bogus EXPLAIN printout in a corner case.
ExecReScanHashJoin will destroy the join's hash table if it expects
that the inner relation will produce different rows on rescan.
Up to now it's not bothered to clear the additional pointer to that
hash table that exists in the child HashState node.  However, it's
possible for the query to terminate without building a fresh hash
table (this happens if the outer relation is found to be empty
during the final rescan).  So we can end with a dangling pointer
to a deleted hash table.  That was harmless originally, but since
9.0 EXPLAIN ANALYZE has used that pointer to print hash table
statistics.  In debug builds this reproducibly results in garbage
statistics.  In non-debug builds there's frequently no ill effects,
but in principle one could get wrong EXPLAIN ANALYZE output, or
perhaps even a crash if free() has released the hashtable memory
back to the OS.

To fix, just make sure we clear the additional pointer when destroying
the hash table.  In problematic cases, EXPLAIN ANALYZE will then print
no hashtable statistics (reverting to its pre-9.0 behavior).  This isn't
ideal, but since the problem manifests only in unusual corner cases,
it's hard to justify taking any risks to do better in the back
branches.  A follow-on patch will improve matters in HEAD.

Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro
of a trouble report from Alvaro Herrera.

Discussion: https://postgr.es/m/20200323165059.GA24950@alvherre.pgsql
2020-04-11 12:29:06 -04:00
Peter Eisentraut 12fb189bfe Fix RELCACHE_FORCE_RELEASE issue
Introduced by 83fd4532a7.  To fix, the
tuple descriptors need to be copied into the current memory context.

Discussion: https://www.postgresql.org/message-id/04d78603-edae-9243-9dde-fe3037176a7d@2ndquadrant.com
2020-04-11 15:07:25 +02:00
Peter Eisentraut 5a1d0c9925 Fix relcache reference leak
Introduced by 83fd4532a7
2020-04-11 09:44:14 +02:00
Tom Lane 401418ca6a Suppress unused-variable warning.
Ashutosh Bapat

Discussion: https://postgr.es/m/CAG-ACPWPB8Lc_aFj25eiPFqi31YB5vmaZnb39mbHSf5Yej=miA@mail.gmail.com
2020-04-10 12:00:28 -04:00
Michael Paquier dd0f37ecce Fix collection of typos and grammar mistakes in the tree
This fixes some comments and documentation new as of Postgres 13.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
2020-04-10 11:18:39 +09:00
Tom Lane 2e0e409e3c Further cleanup of ts_headline code.
Suppress a probably-meaningless uninitialized-variable warning
(induced by my previous patch, I'm sorry to say).

Improve mark_hl_fragments()'s test for overlapping cover strings:
it failed to consider the possibility that the current string is
strictly within another one.  That's unlikely given the preceding
splitting into MaxWords fragments, but I don't think it's impossible.

Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org
2020-04-09 15:38:43 -04:00
Tom Lane c9b0c678d3 Fix default text search parser's ts_headline code for phrase queries.
This code could produce very poor results when asked to highlight a
string based on a query using phrase-match operators.  The root cause
is that hlCover(), which is supposed to find a minimal substring that
matches the query, was written assuming that word position is not
significant.  I'm only 95% convinced that its algorithm was correct even
for plain AND/OR queries; but it definitely fails completely for phrase
matches, causing it to possibly not identify a cover string at all.

Hence, rewrite hlCover() with a less-tense algorithm that just tries
all the possible substrings, earlier and shorter ones first.  (This is
not as bad as it sounds performance-wise, because all of the string
matching has been done already: the repeated tsquery match checks
boil down to pointer comparisons.)

Unfortunately, since that approach produces more candidate cover
strings than before, it also exposes that there were bugs in the
heuristics in mark_hl_words() for selecting a best cover string.
Fixes there include:
* Do not apply the ShortWord filter to words that appear in the query.
* Remove a misguided optimization for quickly rejecting a cover.
* Fix order-of-operation bug that could cause computation of a
wrong figure of merit (poslen) when shortening a cover.
* Change the preference rule so that candidate headlines that do not
include their whole cover string (after MaxWords trimming) are lowest
priority, since they may not actually satisfy the user's query.

This results in some changes in existing regression test cases,
but they all seem reasonable.  Note in particular that the tests
involving strings like "1 2 3" were previously being affected by
the ShortWord filter, masking the normal matching behavior.

Per bug #16345 from Augustinas Jokubauskas; the new test cases are
based on that example.  Back-patch to 9.6 where phrase search was
added to tsquery.

Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org
2020-04-09 13:19:23 -04:00
Tom Lane b10f8bb9fd Cosmetic improvements for default text search parser's ts_headline code.
This code was woefully unreadable and under-commented.  Try to improve
matters by adding comments, using some macros to make complicated
if-tests more readable, using boolean type where appropriate, etc.
There are a couple of tiny coding improvements too, but this commit
includes (I hope) no behavioral change.

Nonetheless, back-patch as far as 9.6, because a followup bug-fixing
commit depends on this.

Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org
2020-04-09 12:36:59 -04:00
Peter Eisentraut e92e4a2b68 Fix CREATE TABLE LIKE INCLUDING GENERATED column order issue
CREATE TABLE LIKE INCLUDING GENERATED would fail if a generated column
referred to a column with a higher attribute number.  This is because
the column mapping mechanism created the mapping incrementally as
columns are added.  This was sufficient for previous uses of that
mechanism (omitting dropped columns), and it also happened to work if
generated columns only referred to columns with lower attribute
numbers, but here it failed.

This fix is to build the attribute mapping in a separate loop before
processing the columns in detail.

Bug: #16342
Reported-by: Ethan Waldo <ewaldo@healthetechs.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
2020-04-09 16:36:45 +02:00
Fujii Masao 1ec50a81ec Exclude backup_manifest file that existed in database, from BASE_BACKUP.
If there is already a backup_manifest file in the database cluster,
it belongs to the past backup that was used to start this server.
It is not correct for the backup being taken now. So this commit
changes pg_basebackup so that it always skips such backup_manifest
file. The backup_manifest file for the current backup will be injected
separately if users want it.

Author: Fujii Masao
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/78f76a3d-1a28-a97d-0394-5c96985dd1c0@oss.nttdata.com
2020-04-09 22:37:11 +09:00
Amit Kapila 5c71362174 Allow parallel create index to accumulate buffer usage stats.
Currently, we don't account for buffer usage incurred by parallel workers
for parallel create index.  This commit allows each worker to record the
buffer usage stats and leader backend to accumulate that stats at the
end of the operation.  This will allow pg_stat_statements to display
correct buffer usage stats for (parallel) create index command.

Reported-by: Julien Rouhaud
Author: Sawada Masahiko
Reviewed-by: Dilip Kumar, Julien Rouhaud and Amit Kapila
Backpatch-through: 11, where this was introduced
Discussion: https://postgr.es/m/20200328151721.GB12854@nol
2020-04-09 09:49:30 +05:30
Thomas Munro d140f2f3e2 Rationalize GetWalRcv{Write,Flush}RecPtr().
GetWalRcvWriteRecPtr() previously reported the latest *flushed*
location.  Adopt the conventional terminology used elsewhere in the tree
by renaming it to GetWalRcvFlushRecPtr(), and likewise for some related
variables that used the term "received".

Add a new definition of GetWalRcvWriteRecPtr(), which returns the latest
*written* value.  This will allow later patches to use the value for
non-data-integrity purposes, without having to wait for the flush
pointer to advance.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
2020-04-08 23:45:09 +12:00
Peter Eisentraut 83fd4532a7 Allow publishing partition changes via ancestors
To control whether partition changes are replicated using their own
identity and schema or an ancestor's, add a new parameter that can be
set per publication named 'publish_via_partition_root'.

This allows replicating a partitioned table into a different partition
structure on the subscriber.

Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
2020-04-08 11:19:23 +02:00
Alexander Korotkov 1aac32df89 Revert 0f5ca02f53
0f5ca02f53 introduces 3 new keywords.  It appears to be too much for relatively
small feature.  Given now we past feature freeze, it's already late for
discussion of the new syntax.  So, revert.

Discussion: https://postgr.es/m/28209.1586294824%40sss.pgh.pa.us
2020-04-08 11:37:27 +03:00
David Rowley 02a2e8b442 Modify additional power 2 calculations to use new helper functions
2nd pass of modifying various places which obtain the next power
of 2 of a number and make them use the new functions added in
f0705bb62.

In passing, also modify num_combinations(). This can be implemented
using simple bitshifting rather than looping.

Reviewed-by: John Naylor
Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
2020-04-08 18:29:51 +12:00
Michael Paquier c0187869a0 Fix crash when using COLLATE in partition bound expressions
Attempting to use a COLLATE clause with a type that it not collatable in
a partition bound expression could crash the server.  This commit fixes
the code by adding more checks similar to what is done when computing
index or partition attributes by making sure that there is a collation
iff the type is collatable.

Backpatch down to 12, as 7c079d7 introduced this problem.

Reported-by: Alexander Lakhin
Author: Dmitry Dolgov
Discussion: https://postgr.es/m/16325-809194cf742313ab@postgresql.org
Backpatch-through: 12
2020-04-08 15:04:51 +09:00
David Rowley d025cf88ba Modify various power 2 calculations to use new helper functions
First pass of modifying various places that obtain the next power of 2 of
a number and make them use the new functions added in pg_bitutils.h
instead.

This also removes the _hash_log2() function. There are no longer any
callers in core. Other users can swap their _hash_log2(n) call to make use
of pg_ceil_log2_32(n).

Author: David Fetter, with some minor adjustments by me
Reviewed-by: John Naylor, Jesse Zhang
Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
2020-04-08 16:55:03 +12:00
Jeff Davis 50a38f6517 Create memory context for HashAgg with a reasonable maxBlockSize.
If the memory context's maxBlockSize is too big, a single block
allocation can suddenly exceed work_mem. For Hash Aggregation, this
can mean spilling to disk too early or reporting a confusing memory
usage number for EXPLAN ANALYZE.

Introduce CreateWorkExprContext(), which is like CreateExprContext(),
except that it creates the AllocSet with a maxBlockSize that is
reasonable in proportion to work_mem.

Right now, CreateWorkExprContext() is only used by Hash Aggregation,
but it may be generally useful in the future.

Discussion: https://postgr.es/m/412a3fbf306f84d8d78c4009e11791867e62b87c.camel@j-davis.com
2020-04-07 21:25:28 -07:00
Tom Lane 7a5d74b7dd Put back mistakenly removed #include.
In commit 4dbcb3f84 I removed some code from parse_coerce.c, and also
removed some apparently-no-longer-needed #includes.  But removing
datum.h broke some not-compiled-by-default code.

Discussion: https://postgr.es/m/20200407205436.pyjhddw5bn5upvsu@development
2020-04-08 00:10:16 -04:00
Thomas Munro 3985b600f5 Support PrefetchBuffer() in recovery.
Provide PrefetchSharedBuffer(), a variant that takes SMgrRelation, for
use in recovery.  Rename LocalPrefetchBuffer() to PrefetchLocalBuffer()
for consistency.

Add a return value to all of these.  In recovery, tolerate and report
missing files, so we can handle relations unlinked before crash recovery
began.  Also report cache hits and misses, so that callers can do faster
buffer lookups and better I/O accounting.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
2020-04-08 14:56:57 +12:00
Tom Lane 981643dcdb Allow partitionwise join to handle nested FULL JOIN USING cases.
This case didn't work because columns merged by FULL JOIN USING are
represented in the parse tree by COALESCE expressions, and the logic
for recognizing a partitionable join failed to match upper-level join
clauses to such expressions.  To fix, synthesize suitable COALESCE
expressions and add them to the nullable_partexprs lists.  This is
pretty ugly and brute-force, but it gets the job done.  (I have
ambitions of rethinking the way outer-join output Vars are
represented, so maybe that will provide a cleaner solution someday.
For now, do this.)

Amit Langote, reviewed by Justin Pryzby, Richard Guo, and myself

Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com
2020-04-07 22:12:14 -04:00
Etsuro Fujita c8434d64ce Allow partitionwise joins in more cases.
Previously, the partitionwise join technique only allowed partitionwise
join when input partitioned tables had exactly the same partition
bounds.  This commit extends the technique to some cases when the tables
have different partition bounds, by using an advanced partition-matching
algorithm introduced by this commit.  For both the input partitioned
tables, the algorithm checks whether every partition of one input
partitioned table only matches one partition of the other input
partitioned table at most, and vice versa.  In such a case the join
between the tables can be broken down into joins between the matching
partitions, so the algorithm produces the pairs of the matching
partitions, plus the partition bounds for the join relation, to allow
partitionwise join for computing the join.  Currently, the algorithm
works for list-partitioned and range-partitioned tables, but not
hash-partitioned tables.  See comments in partition_bounds_merge().

Ashutosh Bapat and Etsuro Fujita, most of regression tests by Rajkumar
Raghuwanshi, some of the tests by Mark Dilger and Amul Sul, reviewed by
Dmitry Dolgov and Amul Sul, with additional review at various points by
Ashutosh Bapat, Mark Dilger, Robert Haas, Antonin Houska, Amit Langote,
Justin Pryzby, and Tomas Vondra

Discussion: https://postgr.es/m/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com
2020-04-08 10:25:00 +09:00
Tom Lane 41a194f491 Fix circle_in to accept "(x,y),r" as it's advertised to do.
Our documentation describes four allowed input syntaxes for circles,
but the regression tests tried only three ... with predictable
consequences.  Remarkably, this has been wrong since the circle
datatype was added in 1997, but nobody noticed till now.

David Zhang, with some help from me

Discussion: https://postgr.es/m/332c47fa-d951-7574-b5cc-a8f7f7201202@highgo.ca
2020-04-07 20:50:28 -04:00
Andres Freund 75848bc744 snapshot scalability: Move delayChkpt from PGXACT to PGPROC.
The goal of separating hotly accessed per-backend data from PGPROC
into PGXACT is to make accesses fast (GetSnapshotData() in
particular). But delayChkpt is not actually accessed frequently; only
when starting a checkpoint. As it is frequently modified (multiple
times in the course of a single transaction), storing it in the same
cacheline as hotly accessed data unnecessarily dirties a contended
cacheline.

Therefore move delayChkpt to PGPROC.

This is part of a larger series of patches intending to improve
GetSnapshotData() scalability. It is committed and pushed separately,
as it is independently beneficial (small but measurable win, limited
by the other frequent modifications of PGXACT).

Author: Andres Freund
Reviewed-By: Robert Haas, Thomas Munro, David Rowley
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-04-07 17:36:23 -07:00
Tomas Vondra 2b88fdde30 Track SLRU page hits in SimpleLruReadPage_ReadOnly
SLRU page hits were tracked only in SimpleLruReadPage, but that's not
enough because we may hit the page in SimpleLruReadPage_ReadOnly in
which case we don't call SimpleLruReadPage at all.

Reported-by: Kuntal Ghosh
Discussion: https://postgr.es/m/20200119143707.gyinppnigokesjok@development
2020-04-08 02:15:47 +02:00
Andres Freund 91c40548d5 Fix XLogReader FD leak that makes backends unusable after 2PC usage.
Before the fix every 2PC commit/abort leaked a file descriptor. As the
files are opened using BasicOpenFile(), that quickly leads to the
backend running out of file descriptors.

Once enough 2PC abort/commit have caused enough FDs to leak, any IO
in the backend will fail with "Too many open files", as
BasicOpenFilePerm() will have triggered all open files known to fd.c
to be closed.

The leak causing the problem at hand is a consequence of 0dc8ead46,
but is only exascerbated by it. Previously most XLogPageReadCB
callbacks used static variables to cache one open file, but after the
commit the cache is private to each XLogReader instance. There never
was infrastructure to close FDs at the time of XLogReaderFree, but the
way XLogReader was used limited the leak to one FD.

This commit just closes the during XLogReaderFree() if the FD is
stored in XLogReaderState.seg.ws_segno. This may not be the way to
solve this medium/long term, but at least unbreaks 2PC.

Discussion: https://postgr.es/m/20200406025651.fpzdb5yyb7qyhqko@alap3.anarazel.de
2020-04-07 17:03:04 -07:00
Peter Geoghegan 60cbd7751c Remove nbtree BTreeTupleSetAltHeapTID() function.
Since heap TID is supposed to be just another key attribute to the
implementation, it doesn't make much sense to have separate
BTreeTupleSetNAtts() and BTreeTupleSetAltHeapTID() functions.  Merge the
two functions together.  This slightly simplifies _bt_truncate().
2020-04-07 15:56:52 -07:00
Alvaro Herrera c655077639
Allow users to limit storage reserved by replication slots
Replication slots are useful to retain data that may be needed by a
replication system.  But experience has shown that allowing them to
retain excessive data can lead to the primary failing because of running
out of space.  This new feature allows the user to configure a maximum
amount of space to be reserved using the new option
max_slot_wal_keep_size.  Slots that overrun that space are invalidated
at checkpoint time, enabling the storage to be released.

Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp
2020-04-07 18:35:00 -04:00
Alexander Korotkov 0f5ca02f53 Implement waiting for given lsn at transaction start
This commit adds following optional clause to BEGIN and START TRANSACTION
commands.

  WAIT FOR LSN lsn [ TIMEOUT timeout ]

New clause pospones transaction start till given lsn is applied on standby.
This clause allows user be sure, that changes previously made on primary would
be visible on standby.

New shared memory struct is used to track awaited lsn per backend.  Recovery
process wakes up backend once required lsn is applied.

Author: Ivan Kartyshov, Anna Akenteva
Reviewed-by: Craig Ringer, Thomas Munro, Robert Haas, Kyotaro Horiguchi
Reviewed-by: Masahiko Sawada, Ants Aasma, Dmitry Ivanov, Simon Riggs
Reviewed-by: Amit Kapila, Alexander Korotkov
Discussion: https://postgr.es/m/0240c26c-9f84-30ea-fca9-93ab2df5f305%40postgrespro.ru
2020-04-07 23:51:10 +03:00
Alvaro Herrera 357889eb17
Support FETCH FIRST WITH TIES
WITH TIES is an option to the FETCH FIRST N ROWS clause (the SQL
standard's spelling of LIMIT), where you additionally get rows that
compare equal to the last of those N rows by the columns in the
mandatory ORDER BY clause.

There was a proposal by Andrew Gierth to implement this functionality in
a more powerful way that would yield more features, but the other patch
had not been finished at this time, so we decided to use this one for
now in the spirit of incremental development.

Author: Surafel Temesgen <surafel3000@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Discussion: https://postgr.es/m/CALAY4q9ky7rD_A4vf=FVQvCGngm3LOes-ky0J6euMrg=_Se+ag@mail.gmail.com
Discussion: https://postgr.es/m/87o8wvz253.fsf@news-spur.riddles.org.uk
2020-04-07 16:22:13 -04:00
Tom Lane 26a944cf29 Adjust bytea get_bit/set_bit to use int8 not int4 for bit numbering.
Since the existing bit number argument can't exceed INT32_MAX, it's
not possible for these functions to manipulate bits beyond the first
256MB of a bytea value.  Lift that restriction by redeclaring the
bit number arguments as int8 (which requires a catversion bump,
hence is not back-patchable).

The similarly-named functions for bit/varbit don't really have a
problem because we restrict those types to at most VARBITMAXLEN bits;
hence leave them alone.

While here, extend the encode/decode functions in utils/adt/encode.c
to allow dealing with values wider than 1GB.  This is not a live bug
or restriction in current usage, because no input could be more than
1GB, and since none of the encoders can expand a string more than 4X,
the result size couldn't overflow uint32.  But it might be desirable
to support more in future, so make the input length values size_t
and the potential-output-length values uint64.

Also add some test cases to improve the miserable code coverage
of these functions.

Movead Li, editorialized some by me; also reviewed by Ashutosh Bapat

Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
2020-04-07 15:57:58 -04:00
Tomas Vondra 9c74ceb20b Remove debugging elog from pgstat_recv_resetslrucounter
Reported-by: Thomas Munro
2020-04-07 19:20:20 +02:00
Tomas Vondra d22782a539 Minor improvements in Incremental Sort explain
Some places still used "Maximum" instead of "Peak" when displaying info
about sort space, so fix that. Also, add a comment clarifying why it's
correct to check the number of full/prefix sort groups.

Author: James Coleman
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-07 18:25:13 +02:00
Fujii Masao 4bd0ad9e44 Prevent archive recovery from scanning non-existent WAL files.
Previously when there were multiple timelines listed in the history file
of the recovery target timeline, archive recovery searched all of them,
starting from the newest timeline to the oldest one, to find the segment
to read. That is, archive recovery had to continuously fail scanning
the segment until it reached the timeline that the segment belonged to.
These scans for non-existent segment could be harmful on the recovery
performance especially when archival area was located on the remote
storage and each scan could take a long time.

To address the issue, this commit changes archive recovery so that
it skips scanning the timeline that the segment to read doesn't belong to.

Author: Kyotaro Horiguchi, tweaked a bit by Fujii Masao
Reviewed-by: David Steele, Pavel Suderevsky, Grigory Smolkin
Discussion: https://postgr.es/m/16159-f5a34a3a04dc67e0@postgresql.org
Discussion: https://postgr.es/m/20200129.120222.1476610231001551715.horikyota.ntt@gmail.com
2020-04-08 00:49:29 +09:00
Tomas Vondra ba3e76cc57 Consider Incremental Sort paths at additional places
Commit d2d8a229bc introduced Incremental Sort, but it was considered
only in create_ordered_paths() as an alternative to regular Sort. There
are many other places that require sorted input and might benefit from
considering Incremental Sort too.

This patch modifies a number of those places, but not all. The concern
is that just adding Incremental Sort to any place that already adds
Sort may increase the number of paths considered, negatively affecting
planning time, without any benefit. So we've taken a more conservative
approach, based on analysis of which places do affect a set of queries
that did seem practical. This means some less common queries may not
benefit from Incremental Sort yet.

Author: Tomas Vondra
Reviewed-by: James Coleman
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-07 16:43:22 +02:00
Tom Lane c7654f6a37 Fix representation of SORT_TYPE_STILL_IN_PROGRESS.
It turns out that the code did indeed rely on a zeroed
TuplesortInstrumentation.sortMethod field to indicate
"this worker never did anything", although it seems the
issue only comes up during certain race-condition-y cases.

Hence, rearrange the TuplesortMethod enum to restore
SORT_TYPE_STILL_IN_PROGRESS to having the value zero,
and add some comments reinforcing that that isn't optional.

Also future-proof a loop over the possible values of the enum.
sizeof(bits32) happened to be the correct limit value,
but only by purest coincidence.

Per buildfarm and local investigation.

Discussion: https://postgr.es/m/12222.1586223974@sss.pgh.pa.us
2020-04-06 22:22:13 -04:00
Thomas Munro 4c04be9b05 Introduce xid8-based functions to replace txid_XXX.
The txid_XXX family of fmgr functions exposes 64 bit transaction IDs to
users as int8.  Now that we have an SQL type xid8 for FullTransactionId,
define a new set of functions including pg_current_xact_id() and
pg_current_snapshot() based on that.  Keep the old functions around too,
for now.

It's a bit sneaky to use the same C functions for both, but since the
binary representation is identical except for the signedness of the
type, and since older functions are the ones using the wrong signedness,
and since we'll presumably drop the older ones after a reasonable period
of time, it seems reasonable to switch to FullTransactionId internally
and share the code for both.

Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com>
Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com>
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
2020-04-07 12:04:32 +12:00
Thomas Munro aeec457de8 Add SQL type xid8 to expose FullTransactionId to users.
Similar to xid, but 64 bits wide.  This new type is suitable for use in
various system views and administration functions.

Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com>
Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com>
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
2020-04-07 12:03:59 +12:00
Tomas Vondra 4bea576b03 Use INT64_FORMAT when formatting int64 values in explain
Per report from lapwing.
2020-04-07 01:16:57 +02:00
Peter Geoghegan ce2cee0ade Fix nbtree kill_prior_tuple posting list assert.
An assertion added by commit 0d861bbb checked that _bt_killitems() only
processes a BTScanPosItem whose heap TID is contained in a posting list
tuple when its page offset number still matches what is on the page
(i.e. when it matches the posting list tuple's current offset number).
This was only correct in the common case where the page can't have
changed since we first read it.  It was not correct in cases where we
don't drop the buffer pin (and don't need to verify the page hasn't
changed using its LSN).  The latter category includes scans involving
unlogged tables, and scans that use a non-MVCC snapshot, per the logic
originally introduced by commit 2ed5b87f.

The assertion still seems helpful.  Fix it by taking cases where the
page may have been concurrently modified into account.

Reported-By: Anastasia Lubennikova, Alexander Lakhin
Discussion: https://postgr.es/m/c4e38e9a-0f9c-8e53-e639-adf343f94472@postgrespro.ru
2020-04-06 14:46:33 -07:00
Tomas Vondra 7d6d82a524 Fix show_incremental_sort_info with force_parallel_mode
When executed with force_parallel_mode=regress, the function was exiting
too early and thus failed to print the worker stats. Fixed by making it
more like show_sort_info.

Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-06 23:19:13 +02:00
Tomas Vondra d2d8a229bc Implement Incremental Sort
Incremental Sort is an optimized variant of multikey sort for cases when
the input is already sorted by a prefix of the requested sort keys. For
example when the relation is already sorted by (key1, key2) and we need
to sort it by (key1, key2, key3) we can simply split the input rows into
groups having equal values in (key1, key2), and only sort/compare the
remaining column key3.

This has a number of benefits:

- Reduced memory consumption, because only a single group (determined by
  values in the sorted prefix) needs to be kept in memory. This may also
  eliminate the need to spill to disk.

- Lower startup cost, because Incremental Sort produce results after each
  prefix group, which is beneficial for plans where startup cost matters
  (like for example queries with LIMIT clause).

We consider both Sort and Incremental Sort, and decide based on costing.

The implemented algorithm operates in two different modes:

- Fetching a minimum number of tuples without check of equality on the
  prefix keys, and sorting on all columns when safe.

- Fetching all tuples for a single prefix group and then sorting by
  comparing only the remaining (non-prefix) keys.

We always start in the first mode, and employ a heuristic to switch into
the second mode if we believe it's beneficial - the goal is to minimize
the number of unnecessary comparions while keeping memory consumption
below work_mem.

This is a very old patch series. The idea was originally proposed by
Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the
patch was taken over by James Coleman, who wrote and rewrote most of the
current code.

There were many reviewers/contributors since 2013 - I've done my best to
pick the most active ones, and listed them in this commit message.

Author: James Coleman, Alexander Korotkov
Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov
Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-06 21:35:10 +02:00
Peter Eisentraut f1ac27bfda Add logical replication support to replicate into partitioned tables
Mainly, this adds support code in logical/worker.c for applying
replicated operations whose target is a partitioned table to its
relevant partitions.

Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
2020-04-06 15:15:52 +02:00
Amit Kapila b7ce6de93b Allow autovacuum to log WAL usage statistics.
This commit allows autovacuum to log WAL usage statistics added by commit
df3b181499.

Author: Julien Rouhaud
Reviewed-by: Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-06 16:24:51 +05:30
Michael Paquier 8ef9451f58 Refactor cluster.c to use new routine get_index_isclustered()
This new cache lookup routine has been introduced in a40caf5, and more
code paths can directly use it.

Note that in cluster_rel(), the code was returning immediately if the
tuple's entry in pg_index for the clustered index was not valid.  This
commit changes the code so as a lookup error is raised instead,
something that could not happen from the start as we check for the
existence of the index beforehand, while holding an exclusive lock on
the parent table.

Author: Justin Pryzby
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com
2020-04-06 11:44:23 +09:00
Amit Kapila 33e05f89c5 Add the option to report WAL usage in EXPLAIN and auto_explain.
This commit adds a new option WAL similar to existing option BUFFERS in the
EXPLAIN command.  This option allows to include information on WAL record
generation added by commit df3b181499 in EXPLAIN output.

This also allows the WAL usage information to be displayed via
the auto_explain module.  A new parameter auto_explain.log_wal controls
whether WAL usage statistics are printed when an execution plan is logged.
This parameter has no effect unless auto_explain.log_analyze is enabled.

Author: Julien Rouhaud
Reviewed-by: Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-06 08:02:15 +05:30
Michael Paquier a40caf5f86 Preserve clustered index after rewrites with ALTER TABLE
A table rewritten by ALTER TABLE would lose tracking of an index usable
for CLUSTER.  This setting is tracked by pg_index.indisclustered and is
controlled by ALTER TABLE, so some extra work was needed to restore it
properly.  Note that ALTER TABLE only marks the index that can be used
for clustering, and does not do the actual operation.

Author: Amit Langote, Justin Pryzby
Reviewed-by: Ibrar Ahmed, Michael Paquier
Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com
Backpatch-through: 9.5
2020-04-06 11:03:49 +09:00
Andres Freund fc3f4453a2 Recompute stack base in forked postmaster children.
This is for the benefit of running postgres under the rr
debugger. When using rr signal handlers running while a syscall is
active use an alternative stack. As e.g. bgworkers are started from
within signal handlers, the forked backend then has a different stack
base than postmaster. Previously that subsequently lead to those
processes triggering spurious "stack depth limit exceeded" errors.

Discussion: https://postgr.es/m/20200327182217.ubrrl32lyfhxfwk5@alap3.anarazel.de
2020-04-05 18:23:30 -07:00
Andres Freund f946069e68 Use TransactionXmin instead of RecentGlobalXmin in heap_abort_speculative().
There's a very low risk that RecentGlobalXmin could be far enough in
the past to be older than relfrozenxid, or even wrapped
around. Luckily the consequences of that having happened wouldn't be
too bad - the page wouldn't be pruned for a while.

Avoid that risk by using TransactionXmin instead. As that's announced
via MyPgXact->xmin, it is protected against wrapping around (see code
comments for details around relfrozenxid).

Author: Andres Freund
Discussion: https://postgr.es/m/20200328213023.s4eyijhdosuc4vcj@alap3.anarazel.de
Backpatch: 9.5-
2020-04-05 17:47:30 -07:00
Andres Freund 549a3e23c3 Fix recently introduced typo.
Reported-By: David Rowley
2020-04-05 12:03:09 -07:00
Peter Eisentraut a9d9bdd3ad Save errno across LWLockRelease() calls
Fixup for "Drop slot's LWLock before returning from SaveSlotToPath()"

Reported-by: Michael Paquier <michael@paquier.xyz>
2020-04-05 10:02:00 +02:00
Tom Lane 07871d40c7 Remove bogus Assert, add some regression test cases showing why.
Commit 77ec5affb added an assertion to enforce_generic_type_consistency
that boils down to "if the function result is polymorphic, there must be
at least one polymorphic argument".  This should be true for user-created
functions, but there are built-in functions for which it's not true, as
pointed out by Jaime Casanova.  Hence, go back to the old behavior of
leaving the return type alone.  There's only a limited amount of stuff
you can do with such a function result, but it does work to some extent;
add some regression test cases to ensure we don't break that again.

Discussion: https://postgr.es/m/CAJGNTeMbhtsCUZgJJ8h8XxAJbK7U2ipsX8wkHRtZRz-NieT8RA@mail.gmail.com
2020-04-04 18:03:30 -04:00
Noah Misch c6b92041d3 Skip WAL for new relfilenodes, under wal_level=minimal.
Until now, only selected bulk operations (e.g. COPY) did this.  If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY.  See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules.  Maintainers of table access
methods should examine that section.

To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL.  A new GUC,
wal_skip_threshold, guides that choice.  If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold.  Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.

Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode.  Introduce rd_firstRelfilenodeSubid.  Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node.  Make relcache.c retain entries for certain
dropped relations until end of transaction.

Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN.
Future servers accept older WAL, so this bump is discretionary.

Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas.  Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem.  Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs.  Reported by Martijn van Oosterhout.

Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
2020-04-04 12:25:34 -07:00
Peter Eisentraut 552fcebff0 Revert "Improve handling of parameter differences in physical replication"
This reverts commit 246f136e76.

That patch wasn't quite complete enough.

Discussion: https://www.postgresql.org/message-id/flat/E1jIpJu-0007Ql-CL%40gemulon.postgresql.org
2020-04-04 09:08:12 +02:00
Amit Kapila df3b181499 Add infrastructure to track WAL usage.
This allows gathering the WAL generation statistics for each statement
execution.  The three statistics that we collect are the number of WAL
records, the number of full page writes and the amount of WAL bytes
generated.

This helps the users who have write-intensive workload to see the impact
of I/O due to WAL.  This further enables us to see approximately what
percentage of overall WAL is due to full page writes.

In the future, we can extend this functionality to allow us to compute the
the exact amount of WAL data due to full page writes.

This patch in itself is just an infrastructure to compute WAL usage data.
The upcoming patches will expose this data via explain, auto_explain,
pg_stat_statements and verbose (auto)vacuum output.

Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-04 10:02:08 +05:30
Jeff Davis 0588ee63aa Include chunk overhead in hash table entry size estimate.
Don't try to be precise about it, just use a constant 16 bytes of
chunk overhead. Being smarter would require knowing the memory context
where the chunk will be allocated, which is not known by all callers.

Discussion: https://postgr.es/m/20200325220936.il3ni2fj2j2b45y5@alap3.anarazel.de
2020-04-03 20:07:58 -07:00
Robert Haas 3e0d80fd8d Fix resource management bug with replication=database.
Commit 0d8c9c1210 allowed BASE_BACKUP to
acquire a ResourceOwner without a transaction so that the backup
manifest functionality could use a BufFile, but it overlooked the fact
that when a walsender is used with replication=database, it might have
a transaction in progress, because in that mode, SQL and replication
commands can be mixed.  Try to fix things up so that the two cleanup
mechanisms don't conflict.

Per buildfarm member serinus, which triggered the problem when
CREATE_REPLICATION_SLOT failed from inside a transaction.  It passed
on the subsequent run, so evidently the failure doesn't happen every
time.
2020-04-03 22:28:37 -04:00
Robert Haas db1531cae0 Be more careful about time_t vs. pg_time_t in basebackup.c.
lapwing is complaining that about a call to pg_gmtime, saying that
it "expected 'const pg_time_t *' but argument is of type 'time_t *'".
I at first thought that the problem had someting to do with const,
but Thomas Munro suggested that it might be just because time_t
and pg_time_t are different identifers. lapwing is i686 rather than
x86_64, and pg_time_t is always int64, so that seems like a good
guess.

There is other code that just casts time_t to pg_time_t without
any conversion function, so try that approach here.

Introduced in commit 0d8c9c1210.
2020-04-03 20:18:47 -04:00
Tom Lane 0568e7a2a4 Cosmetic improvements for code related to partitionwise join.
Move have_partkey_equi_join and match_expr_to_partition_keys to
relnode.c, since they're used only there.  Refactor
build_joinrel_partition_info to split out the code that fills the
joinrel's partition key lists; this doesn't have any non-cosmetic
impact, but it seems like a useful separation of concerns.
Improve assorted nearby comments.

Amit Langote, with a little further editorialization by me

Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com
2020-04-03 17:00:35 -04:00
Robert Haas 0d8c9c1210 Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.

A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.

Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.

Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 15:05:59 -04:00
Fujii Masao ce77abe63c Include information on buffer usage during planning phase, in EXPLAIN output, take two.
When BUFFERS option is enabled, EXPLAIN command includes the information
on buffer usage during each plan node, in its output. In addition to that,
this commit makes EXPLAIN command include also the information on
buffer usage during planning phase, in its output. This feature makes it
easier to discern the cases where lots of buffer access happen during
planning.

This commit revives the original commit ed7a509571 that was reverted by
commit 19db23bcbd. The original commit had to be reverted because
it caused the regression test failure on the buildfarm members prion and
dory. But since commit c0885c4c30 got rid of the caues of the test failure,
the original commit can be safely introduced again.

Author: Julien Rouhaud, slightly revised by Fujii Masao
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/16109-26a1a88651e90608@postgresql.org
2020-04-04 03:13:17 +09:00
Tom Lane e41955faf0 Fix bugs in gin_fuzzy_search_limit processing.
entryGetItem()'s three code paths each contained bugs associated
with filtering the entries for gin_fuzzy_search_limit.

The posting-tree path failed to advance "advancePast" after having
decided to filter an item.  If we ran out of items on the current
page and needed to advance to the next, what would actually happen
is that entryLoadMoreItems() would re-load the same page.  Eventually,
the random dropItem() test would accept one of the same items it'd
previously rejected, and we'd move on --- but it could take awhile
with small gin_fuzzy_search_limit.  To add insult to injury, this
case would inevitably cause entryLoadMoreItems() to decide it needed
to re-descend from the root, making things even slower.

The posting-list path failed to implement gin_fuzzy_search_limit
filtering at all, so that all entries in the posting list would
be returned.

The bitmap-result path used a "gotitem" variable that it failed to
update in the one place where it'd actually make a difference, ie
at the one "continue" statement.  I think this was unreachable in
practice, because if we'd looped around then it shouldn't be the
case that the entries on the new page are before advancePast.
Still, the "gotitem" variable was contributing nothing to either
clarity or correctness, so get rid of it.

Refactor all three loops so that the termination conditions are
more alike and less unreadable.

The code coverage report showed that we had no coverage at all for
the re-descend-from-root code path in entryLoadMoreItems(), which
seems like a very bad thing, so add a test case that exercises it.
We also had exactly no coverage for gin_fuzzy_search_limit, so add a
simplistic test case that at least hits those code paths a little bit.

Back-patch to all supported branches.

Adé Heyward and Tom Lane

Discussion: https://postgr.es/m/CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com
2020-04-03 13:15:45 -04:00
Tom Lane 6dd9f35779 Fix bogus CALLED_AS_TRIGGER() defenses.
contrib/lo's lo_manage() thought it could use
trigdata->tg_trigger->tgname in its error message about
not being called as a trigger.  That naturally led to a core dump.

unique_key_recheck() figured it could Assert that fcinfo->context
is a TriggerData node in advance of having checked that it's
being called as a trigger.  That's harmless in production builds,
and perhaps not that easy to reach in any case, but it's logically
wrong.

The first of these per bug #16340 from William Crowell;
the second from manual inspection of other CALLED_AS_TRIGGER
call sites.

Back-patch the lo.c change to all supported branches, the
other to v10 where the thinko crept in.

Discussion: https://postgr.es/m/16340-591c7449dc7c8c47@postgresql.org
2020-04-03 11:24:56 -04:00
Fujii Masao 19db23bcbd Revert "Include information on buffer usage during planning phase, in EXPLAIN output."
This reverts commit ed7a509571.

Per buildfarm member prion.
2020-04-03 12:20:42 +09:00
Fujii Masao 18808f8c89 Add wait events for recovery conflicts.
This commit introduces new wait events RecoveryConflictSnapshot and
RecoveryConflictTablespace. The former is reported while waiting for
recovery conflict resolution on a vacuum cleanup. The latter is reported
while waiting for recovery conflict resolution on dropping tablespace.

Also this commit changes the code so that the wait event Lock is reported
while waiting in ResolveRecoveryConflictWithVirtualXIDs() for recovery
conflict resolution on a lock. Basically the wait event Lock is reported
during that wait, but previously was not reported only when that wait
happened in ResolveRecoveryConflictWithVirtualXIDs().

Author: Masahiko Sawada
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com
2020-04-03 12:15:56 +09:00
Fujii Masao ed7a509571 Include information on buffer usage during planning phase, in EXPLAIN output.
When BUFFERS option is enabled, EXPLAIN command includes the information
on buffer usage during each plan node, in its output. In addition to that,
this commit makes EXPLAIN command include also the information on
buffer usage during planning phase, in its output. This feature makes it
easier to discern the cases where lots of buffer access happen during
planning.

Author: Julien Rouhaud, slightly revised by Fujii Masao
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/16109-26a1a88651e90608@postgresql.org
2020-04-03 11:27:09 +09:00