Commit Graph

21621 Commits

Tom Lane
ed29089633 Propagate CTE property flags when copying a CTE list into a rule.
rewriteRuleAction() neglected this step, although it was careful to
propagate other similar flags such as hasSubLinks or hasRowSecurity.
Omitting to transfer hasRecursive is just cosmetic at the moment,
but omitting hasModifyingCTE is a live bug, since the executor
certainly looks at that.

The proposed test case only fails back to v10, but since the executor
examines hasModifyingCTE in 9.x as well, I suspect that a test case
could be devised that fails in older branches.  Given the nearness
of the release deadline, though, I'm not going to spend time looking
for a better test.

Report and patch by Greg Nancarrow, cosmetic changes by me

Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
2021-02-06 19:28:39 -05:00
Tom Lane
dd705a039f Disallow converting an inheritance child table to a view.
Generally, members of inheritance trees must be plain tables (or,
in more recent versions, foreign tables).  ALTER TABLE INHERIT
rejects creating an inheritance relationship that has a view at
either end.  When DefineQueryRewrite attempts to convert a relation
to a view, it already had checks prohibiting doing so for partitioning
parents or children as well as traditional-inheritance parents ...
but it neglected to check that a traditional-inheritance child wasn't
being converted.  Since the planner assumes that any inheritance
child is a table, this led to making plans that tried to do a physical
scan on a view, causing failures (or even crashes, in recent versions).
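
As an illustrative sketch (table names are hypothetical), the now-forbidden
conversion uses the old table-to-view trick of creating an ON SELECT rule
named "_RETURN" on an empty table:

    CREATE TABLE inh_parent (a int);
    CREATE TABLE inh_child () INHERITS (inh_parent);
    -- previously accepted, now rejected with an error:
    CREATE RULE "_RETURN" AS ON SELECT TO inh_child
      DO INSTEAD SELECT * FROM inh_parent;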

One could imagine trying to support such a case by expanding the view
normally, but since the rewriter runs before the planner does
inheritance expansion, it would take some very fundamental refactoring
to make that possible.  There are probably a lot of other parts of the
system that don't cope well with such a situation, too.  For now,
just forbid it.

Per bug #16856 from Yang Lin.  Back-patch to all supported branches.
(In versions before v10, this includes back-patching the portion of
commit 501ed02cf that added has_superclass().  Perhaps the lack of
that infrastructure partially explains the missing check.)

Discussion: https://postgr.es/m/16856-0363e05c6e1612fd@postgresql.org
2021-02-06 15:17:01 -05:00
Michael Paquier
f7400823c3 Clarify some comments around SharedRecoveryState in xlog.c
SharedRecoveryState has been switched from a boolean to an enum as of
commit 4e87c48, but some comments still referred to it as a boolean.

Author: Amul Sul
Reviewed-by: Dilip Kumar, Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAAJ_b97Hf+1SXnm8jySpO+Fhm+-VKFAAce1T_cupUYtnE3Nxig
2021-02-06 10:27:55 +09:00
Heikki Linnakangas
c444472af5 Fix backslash-escaping multibyte chars in COPY FROM.
If a multi-byte character is escaped with a backslash in TEXT mode input,
and the encoding is one of the client-only encodings where the bytes after
the first one can have an ASCII byte "embedded" in the char, we didn't
skip the character correctly. After a backslash, we only skipped the first
byte of the next character, so if it was a multi-byte character, we would
try to process its second byte as if it was a separate character. If it
was one of the characters with special meaning, like '\n', '\r', or
another '\\', that would cause trouble.

One such example is the byte sequence '\x5ca45c2e666f6f' in Big5 encoding.
That's supposed to be [backslash][two-byte character][.][f][o][o], but
because the second byte of the two-byte character is 0x5c, we incorrectly
treat it as another backslash. And because the next character is a dot, we
parse it as end-of-copy marker, and throw an "end-of-copy marker corrupt"
error.

Backpatch to all supported versions.

Reviewed-by: John Naylor, Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/a897f84f-8dca-8798-3139-07da5bb38728%40iki.fi
2021-02-05 11:14:56 +02:00
Tom Lane
0ff865fbe5 Fix bug in HashAgg's selective-column-spilling logic.
Commit 230230223 taught nodeAgg.c that, when spilling tuples from
memory in an oversized hash aggregation, it only needed to spill
input columns referenced in the node's tlist and quals.  Unfortunately,
that's wrong: we also have to save the grouping columns.  The error
is masked in common cases because the grouping columns also appear
in the tlist, but that's not necessarily true.  The main category
of plans where it's not true seems to come from semijoins ("WHERE
outercol IN (SELECT innercol FROM innertable)") where the innercol
needs an implicit promotion to make it comparable to the outercol.
The grouping column will be "innercol::promotedtype", but that
expression appears nowhere in the Agg node's own tlist and quals;
only the bare "innercol" is found in the tlist.
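
A minimal sketch of the kind of query described above (table and column
names are hypothetical; the implicit promotion comes from the differing
column types):

    CREATE TABLE outertable (outercol numeric);
    CREATE TABLE innertable (innercol int);
    SELECT * FROM outertable
    WHERE outercol IN (SELECT innercol FROM innertable);
    -- the semijoin's grouping column is effectively innercol::numeric,
    -- which appears nowhere in the Agg node's tlist or quals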

I spent quite a bit of time looking for a suitable regression test
case for this, without much success.  If the number of distinct
values of the innercol is large enough to make spilling happen,
the planner tends to prefer a non-HashAgg plan, at least for
problem sizes that are reasonable to use in the regression tests.
So, no new regression test.  However, this patch does demonstrably
fix the originally-reported test case.

Per report from s.p.e (at) gmx-topmail.de.  Backpatch to v13
where the troublesome code came in.

Discussion: https://postgr.es/m/trinity-1c565d44-159f-488b-a518-caf13883134f-1611835701633@3c-app-gmx-bap78
2021-02-04 23:01:37 -05:00
Tom Lane
82e0e29308 Fix YA incremental sort bug.
switchToPresortedPrefixMode() did the wrong thing if it detected
a batch boundary just at the last tuple of a fullsort group.

The initially-reported symptom was a "retrieved too many tuples in a
bounded sort" error, but the test case added here just silently gives
the wrong answer without this patch.

I (tgl) am not really happy about committing this patch without review
from the incremental-sort authors, but they seem AWOL and we are hard
against a release deadline.  This does demonstrably make some cases
better, anyway.

Per bug #16846 from Yoran Heling.  Back-patch to v13 where incremental
sort was introduced.

Neil Chen

Discussion: https://postgr.es/m/16846-ae49f51ac379a4cb@postgresql.org
2021-02-04 19:12:14 -05:00
Peter Geoghegan
c34787f910 Harden nbtree page deletion.
Add some additional defensive checks in the second phase of index
deletion to detect and report index corruption during VACUUM, and to
avoid having VACUUM become stuck in more cases.  The code is still not
robust in the presence of a circular chain of sibling links, though it's
not clear whether that really matters.  This is follow-up work to commit
3a01f68e.

The new defensive checks rely on the assumption that there can be no
more than one VACUUM operation running for an index at any given time.
Remove an old comment suggesting that multiple concurrent VACUUMs need
to be considered here.  This concern now seems highly unlikely to have
any real validity, since we clearly rely on the same assumption in
several other places.  For example, there are much more recent comments
that appear in the same function (added by commit efada2b8e9) that make
the same assumption.

Also add a CHECK_FOR_INTERRUPTS() to the relevant code path.  Contrary
to comments added by commit 3a01f68e, it is actually possible to handle
interrupts here, at least in the common case where processing takes
place at the leaf level.  We only hold a pin on leafbuf/target page when
stepping right at the leaf level.

No backpatch due to the lack of complaints following hardening added to
the same area by commit 3a01f68e.
2021-02-04 15:42:36 -08:00
Heikki Linnakangas
2f86ab305e Fix small error in COPY FROM progress reporting.
The # of bytes processed was accumulated slightly incorrectly. After
loading more data to the input buffer, we added the number of bytes in
the buffer to the sum. But in case of multi-byte characters or escapes,
there can be a few unprocessed bytes left over from the previous load in the
buffer. Those bytes got counted twice.
2021-02-04 17:40:33 +02:00
Peter Eisentraut
3c78e0569c Refactor Windows error message for easier translation
The user right "Lock pages in memory" referred to in these error
messages is a term from the Windows OS, so it should be translated in
accordance with the OS localization.  Refactor the error
messages so this is easier and clearer.  Also fix the capitalization
to match the existing capitalization in the OS.
2021-02-04 13:31:13 +01:00
Michael Paquier
5128483d06 Ensure unlinking of old index file with REINDEX (TABLESPACE)
The original versions of the patch included this part, but a mismerge
from my side has made this piece go missing.  Oversight in c5b28604.
2021-02-04 17:16:47 +09:00
Michael Paquier
fc749bc704 Clarify comment in tablesync.c
Author: Peter Smith
Reviewed-by: Amit Kapila, Michael Paquier, Euler Taveira
Discussion: https://postgr.es/m/CAHut+Pt9_T6pWar0FLtPsygNmme8HPWPdGUyZ_8mE1Yvjdf0ZA@mail.gmail.com
2021-02-04 16:02:31 +09:00
Michael Paquier
c5b286047c Add TABLESPACE option to REINDEX
This patch adds the possibility to move indexes to a new tablespace
while rebuilding them.  Both the concurrent and the non-concurrent cases
are supported, and the following set of restrictions apply:
- When using TABLESPACE with a REINDEX command that targets a
partitioned table or index, all the indexes of the leaf partitions are
moved to the new tablespace.  The tablespace references of the non-leaf,
partitioned tables in pg_class.reltablespace are not changed; changing
them requires an extra ALTER TABLE SET TABLESPACE.
- Any index on a toast table rebuilt as part of a parent table is kept
in its original tablespace.
- The operation is forbidden on system catalogs, including trying to
directly move a toast relation with REINDEX.  This results in an error
if doing REINDEX on a single object.  REINDEX SCHEMA, DATABASE and
SYSTEM skip system relations when TABLESPACE is used.
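
As a usage sketch (the tablespace, table, and index names are hypothetical),
the new option looks like this:

    REINDEX (TABLESPACE fast_ssd) TABLE my_table;
    REINDEX (TABLESPACE fast_ssd) INDEX CONCURRENTLY my_index;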

Author: Alexey Kondratov, Michael Paquier, Justin Pryzby
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/8a8f5f73-00d3-55f8-7583-1375ca8f6a91@postgrespro.ru
2021-02-04 14:34:20 +09:00
Tom Lane
9624321ec5 Avoid crash when rolling back within a prepared statement.
If a portal is used to run a prepared CALL or DO statement that
contains a ROLLBACK, PortalRunMulti fails because the portal's
statement list gets cleared by the rollback.  (Since the grammar
doesn't allow CALL/DO in PREPARE, the only easy way to get to this is
via extended query protocol, which treats all inputs as prepared
statements.)  It's difficult to avoid resetting the portal early
because of resource-management issues, so work around this by teaching
PortalRunMulti to be wary of portal->stmts having suddenly become NIL.
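
A minimal sketch of the failing scenario (procedure name is hypothetical;
the CALL must arrive via the extended query protocol, e.g. as a prepared
statement issued by a driver):

    CREATE PROCEDURE rollback_proc() LANGUAGE plpgsql AS $$
    BEGIN
      ROLLBACK;
    END $$;
    CALL rollback_proc();  -- crashed when run as a prepared statement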

The crash has only been seen to occur in v13 and HEAD (as a
consequence of commit 1cff1b95a having added an extra touch of
portal->stmts).  But even before that, the code involved touching a
List that the portal no longer has any claim on.  In the test case at
hand, the List will still exist because of another refcount on the
cached plan; but I'm far from convinced that it's impossible for the
cached plan to have been dropped by the time control gets back to
PortalRunMulti.  Hence, backpatch to v11 where nested transactions
were added.

Thomas Munro and Tom Lane, per bug #16811 from James Inform

Discussion: https://postgr.es/m/16811-c1b599b2c6c2d622@postgresql.org
2021-02-03 19:38:43 -05:00
Tom Lane
ba0faf81c6 Remove special BKI_LOOKUP magic for namespace and role OIDs.
Now that commit 62f34097c attached BKI_LOOKUP annotation to all the
namespace and role OID columns in the catalogs, there's no real reason
to have the magic PGNSP and PGUID symbols.  Get rid of them in favor
of implementing those lookups according to genbki.pl's normal pattern.

This means that in the catalog headers, BKI_DEFAULT(PGNSP) becomes
BKI_DEFAULT(pg_catalog), which seems a lot more transparent.
BKI_DEFAULT(PGUID) becomes BKI_DEFAULT(POSTGRES), which is perhaps
less so; but you can look into pg_authid.dat to discover that
POSTGRES is the nonce name for the bootstrap superuser.

This change also means that if we ever need cross-references in the
initial catalog data to any of the other built-in roles besides
POSTGRES, or to some other built-in schema besides pg_catalog,
we can just do it.

No catversion bump here, as there's no actual change in the contents
of postgres.bki.

Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-03 12:01:48 -05:00
Tom Lane
62f34097c8 Build in some knowledge about foreign-key relationships in the catalogs.
This follows in the spirit of commit dfb75e478, which created primary
key and uniqueness constraints to improve the visibility of constraints
imposed on the system catalogs.  While our catalogs contain many
foreign-key-like relationships, they don't quite follow SQL semantics,
in that the convention for an omitted reference is to write zero not
NULL.  Plus, we have some cases in which there are arrays each of whose
elements is supposed to be an FK reference; SQL has no way to model that.
So we can't create actual foreign key constraints to describe the
situation.  Nonetheless, we can collect and use knowledge about these
relationships.

This patch therefore adds annotations to the catalog header files to
declare foreign-key relationships.  (The BKI_LOOKUP annotations cover
simple cases, but we weren't previously distinguishing which such
columns are allowed to contain zeroes; we also need new markings for
multi-column FK references.)  Then, Catalog.pm and genbki.pl are
taught to collect this information into a table in a new generated
header "system_fk_info.h".  The only user of that at the moment is
a new SQL function pg_get_catalog_foreign_keys(), which exposes the
table to SQL.  The oidjoins regression test is rewritten to use
pg_get_catalog_foreign_keys() to find out which columns to check.
Aside from removing the need for manual maintenance of that test
script, this allows it to cover numerous relationships that were not
checked by the old implementation based on findoidjoins.  (As of this
commit, 217 relationships are checked by the test, versus 181 before.)
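
For example, the new function can be queried from SQL to list the catalog
columns that reference a given catalog (output column names as documented
for the new function):

    SELECT fktable, fkcols, pktable, pkcols, is_array, is_opt
    FROM pg_get_catalog_foreign_keys()
    WHERE pktable = 'pg_namespace'::regclass;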

Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-02 17:11:55 -05:00
Peter Eisentraut
1d71f3c83c Improve confusing variable names
The prototype calls the second argument of
pgstat_progress_update_multi_param() "index", and some callers name
their local variable that way.  But when the surrounding code deals
with index relations, this is confusing, and in at least one case
shadowed another variable that is referring to an index relation.
Adjust those call sites to have clearer local variable naming, similar
to existing callers in indexcmds.c.
2021-02-02 09:20:22 +01:00
Michael Paquier
4ad31bb2ef Remove unused column atttypmod from initial tablesync query
The initial tablesync done by logical replication used a query to fetch
the information of a relation's columns that included atttypmod, but it
was left unused.  This was added by 7c4f524.

Author: Euler Taveira
Reviewed-by: Önder Kalacı, Amit Langote, Japin Li
Discussion: https://postgr.es/m/CAHE3wggb715X+mK_DitLXF25B=jE6xyNCH4YOwM860JR7HarGQ@mail.gmail.com
2021-02-02 13:59:23 +09:00
Tom Lane
f003a7522b Remove [Merge]AppendPath.partitioned_rels.
It turns out that the calculation of [Merge]AppendPath.partitioned_rels
in allpaths.c is faulty and sometimes omits relevant non-leaf partitions,
allowing an assertion added by commit a929e17e5a to trigger.  Rather
than fix that, it seems better to get rid of those fields altogether.
We don't really need the info until create_plan time, and calculating
it once for the selected plan should be cheaper than calculating it
for each append path we consider.

The preceding two commits did away with all use of the partitioned_rels
values; this commit just mechanically removes the fields and the code
that calculated them.

Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu
Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01 14:43:54 -05:00
Tom Lane
5076f88bc9 Remove incidental dependencies on partitioned_rels lists.
It turns out that the calculation of [Merge]AppendPath.partitioned_rels
in allpaths.c is faulty and sometimes omits relevant non-leaf partitions,
allowing an assertion added by commit a929e17e5a to trigger.  Rather
than fix that, it seems better to get rid of those fields altogether.
We don't really need the info until create_plan time, and calculating
it once for the selected plan should be cheaper than calculating it
for each append path we consider.

This patch undoes a couple of very minor uses of the partitioned_rels
values.

createplan.c was testing for nil-ness to optimize away the preparatory
work for make_partition_pruneinfo().  That is worth doing if the check
is nigh free, but it's not worth going to any great lengths to avoid.

create_append_path() was testing for nil-ness as part of deciding how
to set up ParamPathInfo for an AppendPath.  I replaced that with a
check for the appendrel's parent rel being partitioned.  That's not
quite the same thing but should cover most cases.  If we note any
interesting loss of optimizations, we can dumb this down to just
always use the more expensive method when the parent is a baserel.

Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu
Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01 14:34:59 -05:00
Tom Lane
fb2d645dd5 Revise make_partition_pruneinfo to not use its partitioned_rels input.
It turns out that the calculation of [Merge]AppendPath.partitioned_rels
in allpaths.c is faulty and sometimes omits relevant non-leaf partitions,
allowing an assertion added by commit a929e17e5a to trigger.  Rather
than fix that, it seems better to get rid of those fields altogether.
We don't really need the info until create_plan time, and calculating
it once for the selected plan should be cheaper than calculating it
for each append path we consider.

As a first step, teach make_partition_pruneinfo to collect the relevant
partitioned tables for itself.  It's not hard to do so by traversing
from child tables up to parents using the AppendRelInfo links.

While here, make some minor stylistic improvements; mainly, don't use
the "Relids" alias for bitmapsets that are not identities of any
relation considered by the planner.  Try to document the logic better,
too.

No backpatch, as there does not seem to be a live problem before
a929e17e5a.  Also no new regression test; the code where the bug
was will be gone at the end of this patch series, so it seems a
bit pointless to memorialize the issue.

Tom Lane and David Rowley, per reports from Andreas Seltenreich
and Jaime Casanova.

Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu
Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01 14:05:51 -05:00
Peter Eisentraut
3696a600e2 SEARCH and CYCLE clauses
This adds the SQL standard SEARCH and CYCLE clauses for recursive
queries, making it possible to produce breadth- or depth-first search
orders and to detect cycles.  These clauses can be
rewritten into queries using existing syntax, and that is what this
patch does in the rewriter.
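
A brief sketch of the new syntax (the "graph" table and its columns are
hypothetical):

    WITH RECURSIVE search_graph(id, link) AS (
        SELECT g.id, g.link FROM graph g
      UNION ALL
        SELECT g.id, g.link
        FROM graph g JOIN search_graph sg ON g.id = sg.link
    ) SEARCH DEPTH FIRST BY id SET ordercol
      CYCLE id SET is_cycle TO true DEFAULT false USING path
    SELECT * FROM search_graph ORDER BY ordercol;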

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
2021-02-01 14:32:51 +01:00
Alexander Korotkov
bb513b364b Get rid of unnecessary memory allocation in jsonb_subscript_assign()
The current code allocates memory for a JsonbValue that could instead be placed in a local variable.
2021-02-01 14:06:02 +03:00
Michael Paquier
fe61df7f82 Introduce --with-ssl={openssl} as a configure option
This is a replacement for the existing --with-openssl, extending the
logic to make it easier to add new SSL libraries.  The grammar is
chosen to be similar to --with-uuid, where multiple values can be
chosen, with "openssl" as the only supported value for now.

The original switch, --with-openssl, is kept for compatibility.

Author: Daniel Gustafsson, Michael Paquier
Reviewed-by: Jacob Champion
Discussion: https://postgr.es/m/FAB21FC8-0F62-434F-AA78-6BD9336D630A@yesql.se
2021-02-01 19:19:44 +09:00
Tom Lane
7c5d57caed Fix portability issue in new jsonbsubs code.
On machines where sizeof(Datum) > sizeof(Oid) (that is, any 64-bit
platform), the previous coding would compute a misaligned
workspace->index pointer if nupper is odd.  Architectures where
misaligned access is a hard no-no would then fail.  This appears
to explain why thorntail is unhappy but other buildfarm members
are not.
2021-02-01 02:03:59 -05:00
Alexander Korotkov
aa6e46daf5 Throw error when assigning jsonb scalar instead of a composite object
During the jsonb subscripting assignment, the provided path might assume an
object or an array where the source jsonb has a scalar value.  The
initial subscripting assignment logic would silently skip such an update
operation.  This commit makes it throw an error to indicate this type
of situation.
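
A short sketch of the changed behavior (table and column names are
hypothetical):

    CREATE TABLE jb (j jsonb);
    INSERT INTO jb VALUES ('1');   -- j is a jsonb scalar
    UPDATE jb SET j['a'] = '2';    -- previously a silent no-op, now raises an error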

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:51:06 +03:00
Alexander Korotkov
81fcc72e66 Filling array gaps during jsonb subscripting
This commit introduces two new flags for jsonb assignment:

* JB_PATH_FILL_GAPS: When appending array elements at the specified position,
  gaps are filled with nulls (similar to the JavaScript behavior).  This mode
  also creates the whole path in a jsonb object if some part of the path
  (more than just the last element) is not present.

* JB_PATH_CONSISTENT_POSITION: Assigning keeps array positions consistent by
  preventing prepending of elements.

Both flags are used only in jsonb subscripting assignment.
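
An illustrative sketch of gap filling, assuming a table jb(j jsonb) as in
the earlier sketch (names are hypothetical):

    INSERT INTO jb VALUES ('{}');
    UPDATE jb SET j['vals'][5] = '1';
    -- j becomes {"vals": [null, null, null, null, null, 1]}:
    -- the whole path is created and positions 0..4 are filled with nulls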

Initially proposed by Nikita Glukhov based on polymorphic subscripting
patch, but transformed into an independent change.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:51:01 +03:00
Alexander Korotkov
676887a3b0 Implementation of subscripting for jsonb
Subscripting for jsonb does not support slices, does not have a limit for the
number of subscripts, and an assignment expects the replacement value to
have jsonb type.  There is also one functional difference between assignment via
subscripting and assignment via jsonb_set().  When an original jsonb container
is NULL, the subscripting replaces it with an empty jsonb and proceeds with
an assignment.
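
A minimal sketch of the new syntax, again assuming a table jb(j jsonb)
(names are hypothetical):

    SELECT ('{"a": {"b": [1, 2, 3]}}'::jsonb)['a']['b'][2];   -- returns 3
    UPDATE jb SET j['a']['b'] = '"new value"';
    -- if j IS NULL, the assignment first replaces it with an empty jsonb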

For the sake of code reuse, we rearrange some parts of jsonb functionality
to allow the usage of the same functions for jsonb_set and assign subscripting
operation.

The original idea belongs to Oleg Bartunov.

Catversion is bumped.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:50:40 +03:00
Peter Geoghegan
dc43492e46 Remove unused _bt_delitems_delete() argument.
The latestRemovedXid values used by nbtree deletion operations are
determined by _bt_delitems_delete()'s caller, so there is no reason to
pass a separate heapRel argument.

Oversight in commit d168b66682.
2021-01-31 10:10:55 -08:00
Alexander Korotkov
0c4f355c6a Fix parsing of complex morphs to tsquery
When to_tsquery() or websearch_to_tsquery() meets a complex morph containing
multiple words residing at adjacent positions, these words are connected
with the OP_AND operator.  That leads to surprising results.  For instance,
both websearch_to_tsquery('"pg_class pg"') and to_tsquery('pg_class <-> pg')
produce the tsquery '( pg & class ) <-> pg'.  This tsquery requires the
words 'pg' and 'class' to reside at the same position and doesn't match
to_tsvector('pg_class pg').  It appears to be ridiculous behavior, which
needs to be fixed.

This commit makes to_tsquery() and websearch_to_tsquery() connect words
residing at adjacent positions with OP_PHRASE.  Therefore, those words are
now chained normally with the other OP_PHRASE operators.  The examples
above now produce the tsquery 'pg <-> class <-> pg', which matches
to_tsvector('pg_class pg').
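
For instance (the output comment reflects the new behavior described
above):

    SELECT websearch_to_tsquery('"pg_class pg"');
    -- 'pg' <-> 'class' <-> 'pg'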

Another effect of this commit is that complex morph word positions now need to
match the tsvector even if there is no surrounding OP_PHRASE.  This behavior
change generally looks like an improvement, but it makes this commit not
backpatchable.

Reported-by: Barry Pederson
Bug: #16592
Discussion: https://postgr.es/m/16592-70b110ff9731c07d@postgresql.org
Discussion: https://postgr.es/m/CAPpHfdv0EzVhf6CWfB1_TTZqXV_2Sn-jSY3zSd7ePH%3D-%2B1V2DQ%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Tom Lane, Neil Chen
2021-01-31 20:14:29 +03:00
Peter Eisentraut
dfb75e478c Add primary keys and unique constraints to system catalogs
For those system catalogs that have unique indexes, create primary key
and unique constraints, using ALTER TABLE ... PRIMARY KEY/UNIQUE
USING INDEX.

This can be helpful for GUI tools that look for a primary key, and it
might in the future allow declaring foreign keys, for making schema
diagrams.

The constraint creation statements are automatically created by
genbki.pl from DECLARE_UNIQUE_INDEX directives.  To specify which one
of the available unique indexes is the primary key, use the new
directive DECLARE_UNIQUE_INDEX_PKEY instead.  By convention, we
usually make a catalog's OID column its primary key, if it has one.
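
The new constraints are visible through the normal catalogs; a quick way
to check (output omitted) is a query like this:

    SELECT conrelid::regclass, conname, contype
    FROM pg_constraint
    WHERE conrelid = 'pg_class'::regclass;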

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/dc5f44d9-5ec1-a596-0251-dadadcdede98@2ndquadrant.com
2021-01-30 19:44:29 +01:00
Peter Eisentraut
6aaaa76bb4 Allow GRANTED BY clause in normal GRANT and REVOKE statements
The SQL standard allows a GRANTED BY clause on GRANT and
REVOKE (privilege) statements that can specify CURRENT_USER or
CURRENT_ROLE.  In PostgreSQL, both of these are the default behavior.
Since we already have all the parsing support for this for the
GRANT (role) statement, we might as well add basic support for this
for the privilege variant as well.  This allows us to check off SQL
feature T332.  In the future, perhaps more interesting things could be
done with this, too.
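
A short sketch (role and table names are hypothetical; both forms are
equivalent to omitting the clause):

    GRANT SELECT ON my_table TO some_role GRANTED BY CURRENT_USER;
    REVOKE SELECT ON my_table FROM some_role GRANTED BY CURRENT_ROLE;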

Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/f2feac44-b4c5-f38f-3699-2851d6a76dc9@2ndquadrant.com
2021-01-30 09:45:11 +01:00
Noah Misch
7da83415e5 Revive "snapshot too old" with wal_level=minimal and SET TABLESPACE.
Given a permanent relation rewritten in the current transaction, the
old_snapshot_threshold mechanism assumed the relation had never been
subject to early pruning.  Hence, a query could fail to report "snapshot
too old" when the rewrite followed an early truncation.  ALTER TABLE SET
TABLESPACE is probably the only rewrite mechanism capable of exposing
this bug.  REINDEX sets indcheckxmin, avoiding the problem.  CLUSTER has
zeroed page LSNs since before old_snapshot_threshold existed, so
old_snapshot_threshold has never cooperated with it.  ALTER TABLE
... SET DATA TYPE makes the table look empty to every past snapshot,
which is strictly worse.  Back-patch to v13, where commit
c6b92041d3 broke this.

Kyotaro Horiguchi and Noah Misch

Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota.ntt@gmail.com
2021-01-30 00:12:18 -08:00
Noah Misch
360bd2321b Fix error with CREATE PUBLICATION, wal_level=minimal, and new tables.
CREATE PUBLICATION has failed spuriously when applied to a permanent
relation created or rewritten in the current transaction.  Make the same
change to another site having the same semantic intent; the second
instance has no user-visible consequences.  Back-patch to v13, where
commit c6b92041d3 broke this.

Kyotaro Horiguchi

Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota.ntt@gmail.com
2021-01-30 00:11:38 -08:00
Noah Misch
8a54e12a38 Fix CREATE INDEX CONCURRENTLY for simultaneous prepared transactions.
In a cluster having used CREATE INDEX CONCURRENTLY while having enabled
prepared transactions, queries that use the resulting index can silently
fail to find rows.  Fix this for future CREATE INDEX CONCURRENTLY by
making it wait for prepared transactions like it waits for ordinary
transactions.  This expands the VirtualTransactionId structure domain to
admit prepared transactions.  It may be necessary to reindex to recover
from past occurrences.  Back-patch to 9.5 (all supported versions).

Andrey Borodin, reviewed (in earlier versions) by Tom Lane and Michael
Paquier.

Discussion: https://postgr.es/m/2E712143-97F7-4890-B470-4A35142ABC82@yandex-team.ru
2021-01-30 00:00:27 -08:00
Michael Paquier
24843297a9 Adjust comments of CheckRelationTableSpaceMove() and SetRelationTableSpace()
Commit 4c9c359, which introduced those two functions, was overoptimistic
in assuming that only ShareUpdateExclusiveLock would be required when
moving a relation to a new tablespace.  AccessExclusiveLock is a
requirement, but ShareUpdateExclusiveLock may be used under specific
conditions like REINDEX CONCURRENTLY where waits on past transactions
make the operation safe even with a lower-level lock.  The current code
does only the former, so update the existing comments to reflect that.

Once a REINDEX (TABLESPACE) is introduced, those comments would require
an extra refresh to mention their new use case.

While at it, fix an incorrect variable name.

Per discussion with Álvaro Herrera.

Discussion: https://postgr.es/m/20210127140741.GA14174@alvherre.pgsql
2021-01-29 13:59:18 +09:00
Thomas Munro
514b411a2b Retire pg_standby.
pg_standby was useful more than a decade ago, but now it is obsolete.
It has been proposed that we retire it many times.  Now seems like a
good time to finally do it, because "waiting restore commands"
are incompatible with a proposed recovery prefetching feature.

Discussion: https://postgr.es/m/20201029024412.GP5380%40telsasoft.com
Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
2021-01-29 14:09:41 +13:00
Tom Lane
1046dbedde Silence another gcc 11 warning.
Per buildfarm and local experimentation, bleeding-edge gcc isn't
convinced that the MemSet in reorder_function_arguments() is safe.
Shut it up by adding an explicit check that pronargs isn't negative,
and by changing MemSet to memset.  (It appears that either change is
enough to quiet the warning at -O2, but let's do both to be sure.)
2021-01-28 17:19:16 -05:00
Alvaro Herrera
6f5c8a8ec2
Remove bogus restriction from BEFORE UPDATE triggers
In trying to protect the user from inconsistent behavior, commit
487e9861d0 "Enable BEFORE row-level triggers for partitioned tables"
tried to prevent BEFORE UPDATE FOR EACH ROW triggers from moving the row
from one partition to another.  However, it turns out that the
restriction is wrong in two ways: first, it fails spuriously, preventing
valid situations from working, as in bug #16794; and second, it doesn't
protect from any misbehavior, because tuple routing would cope anyway.

Fix by removing that restriction.

We keep the same restriction on BEFORE INSERT FOR EACH ROW triggers,
though.  It is valid and useful there.  In the future we could remove it
by having tuple reroute work for inserts as it does for updates.

Backpatch to 13.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Phillip Menke <pg@pmenke.de>
Discussion: https://postgr.es/m/16794-350a655580fbb9ae@postgresql.org
2021-01-28 16:56:07 -03:00
Tom Lane
1d9351a87c Fix hash partition pruning with asymmetric partition sets.
perform_pruning_combine_step() was not taught about the number of
partition indexes used in hash partitioning; more embarrassingly,
get_matching_hash_bounds() also had it wrong.  These errors are masked
in the common case where all the partitions have the same modulus
and no partition is missing.  However, with missing or unequal-size
partitions, we could erroneously prune some partitions that need
to be scanned, leading to silently wrong query answers.
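
A sketch of an asymmetric hash-partitioned setup of the kind described
above (names are hypothetical):

    CREATE TABLE hp (a int) PARTITION BY HASH (a);
    CREATE TABLE hp0 PARTITION OF hp FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE hp1 PARTITION OF hp FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    -- the partition for (MODULUS 4, REMAINDER 3) is missing; with unequal
    -- moduli and/or missing partitions, pruning for "WHERE a = 42" could
    -- previously discard a partition that actually needed to be scanned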

While a minimal-footprint fix for this could be to export
get_partition_bound_num_indexes and make the incorrect functions use it,
I'm of the opinion that that function should never have existed in the
first place.  It's not reasonable data structure design that
PartitionBoundInfoData lacks any explicit record of the length of
its indexes[] array.  Perhaps that was all right when it could always
be assumed equal to ndatums, but something should have been done about
it as soon as that stopped being true.  Putting in an explicit
"nindexes" field makes both partition_bounds_equal() and
partition_bounds_copy() simpler, safer, and faster than before,
and removes explicit knowledge of the number-of-partition-indexes
rules from some other places too.

This change also makes get_hash_partition_greatest_modulus obsolete.
I left that in place in case any external code uses it, but no core
code does anymore.

Per bug #16840 from Michał Albrycht.  Back-patch to v11 where the
hash partitioning code came in.  (In the back branches, add the new
field at the end of PartitionBoundInfoData to minimize ABI risks.)

Discussion: https://postgr.es/m/16840-571a22976f829ad4@postgresql.org
2021-01-28 13:41:55 -05:00
Heikki Linnakangas
6c5576075b Add direct conversion routines between EUC_TW and Big5.
Conversions between EUC_TW and Big5 were previously implemented by
converting the whole input to MIC first, and then from MIC to the target
encoding. Implement functions to convert directly between the two.

The reason to do this now is that I'm working on a patch that will change
the conversion function signature so that if the input is invalid, we
convert as much as we can and return the number of bytes successfully
converted. That's not possible if we use an intermediary format, because
if an error happens in the intermediary -> final conversion, we lose track
of the location of the invalid character in the original input. Avoiding
the intermediate step makes the conversions faster, too.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/b9e3167f-f84b-7aa4-5738-be578a4db924%40iki.fi
2021-01-28 14:53:03 +02:00
Heikki Linnakangas
b80e10638e Add mbverifystr() functions specific to each encoding.
This makes pg_verify_mbstr() function faster, by allowing more efficient
encoding-specific implementations. All the implementations included in
this commit are pretty naive, they just call the same encoding-specific
verifychar functions that were used previously, but that already gives a
performance boost because the tight character-at-a-time loop is simpler.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01@iki.fi
2021-01-28 14:40:07 +02:00
Andrew Gierth
a3367aa3c4 Don't add bailout adjustment for non-strict deserialize calls.
When building aggregate expression steps, strict checks need a bailout
jump for when a null value is encountered, so there is a list of steps
that require later adjustment. Adding entries to that list for steps
that aren't actually strict would be harmless, except that there is an
Assert which catches them. This leads to spurious errors on asserts
builds, for data sets that trigger parallel aggregation of an
aggregate with a non-strict deserialization function (no such
aggregates exist in the core system).

Repair by not adding the adjustment entry when it's not needed.

Backpatch back to 11 where the code was introduced.

Per a report from Darafei (Komzpa) of the PostGIS project; analysis
and patch by me.

Discussion: https://postgr.es/m/87mty7peb3.fsf@news-spur.riddles.org.uk
2021-01-28 10:53:10 +00:00
Michael Paquier
f854c69a5b Refactor SQL functions of SHA-2 in cryptohashfuncs.c
The same code pattern was repeated four times when compiling a SHA-2
hash.  This refactoring has the advantage of issuing a compilation warning
if a new value is added to pg_cryptohash_type, so that anybody doing an
addition in this area would need to consider whether support for a new SQL
function is needed or not.

Author: Sehrope Sarkuni, Michael Paquier
Discussion: https://postgr.es/m/YA7DvLRn2xnTgsMc@paquier.xyz
2021-01-28 16:13:26 +09:00
Peter Geoghegan
e19594c5c0 Reduce the default value of vacuum_cost_page_miss.
When commit f425b605 introduced cost based vacuum delays back in 2004,
the defaults reflected then-current trends in hardware, as well as
certain historical limitations in PostgreSQL.  There have been enormous
improvements in both areas since that time.  The cost limit GUC defaults
finally became much more representative of current trends following
commit cbccac37, which decreased autovacuum_vacuum_cost_delay's default
by 10x for PostgreSQL 12 (it went from 20ms to only 2ms).

The relative costs have shifted too.  This should also be accounted for
by the defaults.  More specifically, the relative importance of avoiding
dirtying pages within VACUUM has greatly increased, primarily due to
main memory capacity scaling and trends in flash storage.  Within
Postgres itself, improvements like sequential access during index
vacuuming (at least in nbtree and GiST indexes) have also been
contributing factors.

To reflect all this, decrease the default of vacuum_cost_page_miss to 2.
Since the default of vacuum_cost_page_dirty remains 20, dirtying a page
is now considered 10x "costlier" than a page miss by default.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmLPFnkWT8xMjmcsm7YS3+_Qi3iRWAb2+_Bc8UhVyHfuA@mail.gmail.com
2021-01-27 15:11:13 -08:00
Robert Haas
69059d3b2f In TrimCLOG(), don't reset XactCtl->shared->latest_page_number.
Since the CLOG page number is not recorded directly in the checkpoint
record, we have to use ShmemVariableCache->nextXid to figure out the
latest CLOG page number at the start of recovery. However, as recovery
progresses, replay of CLOG/EXTEND records will update our notion of
the latest page number, and we should rely on that being accurate
rather than recomputing the value based on an updated notion of
nextXid. ShmemVariableCache->nextXid is only an approximation
during recovery anyway, whereas CLOG/EXTEND records are an
authoritative representation of how the SLRU has been updated.

Commit 0fcc2decd4 makes this
simplification possible, as before that change clog_redo() might
have injected a bogus value here, and we'd want to get rid of
that before entering normal running.

Patch by me, reviewed by Heikki Linnakangas.

Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com
2021-01-27 15:52:34 -05:00
Robert Haas
0fcc2decd4 In clog_redo(), don't set XactCtl->shared->latest_page_number.
The comment is no longer accurate, and hasn't been entirely accurate
since Hot Standby was introduced. The original idea here was that
StartupCLOG() wouldn't be called until the end of recovery and
therefore this value would be uninitialized when this code is reached,
but Hot Standby made that true only when hot_standby=off, and commit
1f113abdf8 means that this value is now
always initialized before replay even starts.

The original purpose of this code was to bypass the sanity check
in SimpleLruTruncate(), which will no longer occur: now, if something
is wrong, that sanity check might trip during recovery. That's
probably a good thing, because in the current code base
latest_page_number should always be initialized and therefore we
expect that the sanity check should pass. If it doesn't, something
has gone wrong, and complaining about it is appropriate.

Patch by me, reviewed by Heikki Linnakangas.

Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com
2021-01-27 13:11:30 -05:00
Robert Haas
1f113abdf8 Move StartupCLOG() calls to just after we initialize ShmemVariableCache.
Previously, the hot_standby=off code path did this at end of recovery,
while the hot_standby=on code path did it at the beginning of recovery.
It's better to do this in only one place because (a) it's simpler,
(b) StartupCLOG() is trivial so trying to postpone the work isn't
useful, and (c) this will make it possible to simplify some other
logic.

Patch by me, reviewed by Heikki Linnakangas.

Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com
2021-01-27 12:20:46 -05:00
Peter Geoghegan
e42b3c3bd6 Fix GiST index deletion assert issue.
Avoid calling heap_index_delete_tuples() with an empty deltids array to
avoid an assertion failure.

This issue was arguably an oversight in commit b5f58cf2, though the
failing assert itself was added by my recent commit d168b666.  No
backpatch, though, since the oversight is harmless in the back branches.

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Jaime Casanova <jcasanov@systemguards.com.ec>
Discussion: https://postgr.es/m/CAJKUy5jscES84n3puE=sYngyF+zpb4wv8UMtuLnLPv5z=6yyNw@mail.gmail.com
2021-01-26 23:24:37 -08:00
Michael Paquier
4c9c359d38 Refactor code in tablecmds.c to check and process tablespace moves
Two code paths of tablecmds.c (for relations with storage and without
storage) use the same logic to check if the move of a relation to a
new tablespace is allowed or not and to update pg_class.reltablespace
and pg_class.relfilenode.

A potential TABLESPACE clause for REINDEX, CLUSTER and VACUUM FULL needs
similar checks to make sure that nothing is moved around in illegal ways
(no mapped relations, shared relations only in pg_global, no move of
temp tables owned by other backends).

This reorganizes the existing code of ALTER TABLE so as all this logic
is controlled by two new routines that can be reused for the other
commands able to move relations across tablespaces, limiting the number
of code paths in need of the same protections.  This also removes some
code that was duplicated for tables with and without storage for ALTER
TABLE.

Author: Alexey Kondratov, Michael Paquier
Discussion: https://postgr.es/m/YA+9mAMWYLXJMVPL@paquier.xyz
2021-01-27 11:54:16 +09:00
Tom Lane
d5a83d79c9 Rethink recently-added SPI interfaces.
SPI_execute_with_receiver and SPI_cursor_parse_open_with_paramlist are
new in v14 (cf. commit 2f48ede08).  Before they can get out the door,
let's change their APIs to follow the practice recently established by
SPI_prepare_extended etc: shove all optional arguments into a struct
that callers are supposed to pre-zero.  The hope is to allow future
addition of more options without either API breakage or a continuing
proliferation of new SPI entry points.  With that in mind, choose
slightly more generic names for them: SPI_execute_extended and
SPI_cursor_parse_open respectively.

Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
2021-01-26 16:37:12 -05:00
Tom Lane
ee895a655c Improve performance of repeated CALLs within plpgsql procedures.
This patch essentially is cleaning up technical debt left behind
by the original implementation of plpgsql procedures, particularly
commit d92bc83c4.  That patch (or more precisely, follow-on patches
fixing its worst bugs) forced us to re-plan CALL and DO statements
each time through, if we're in a non-atomic context.  That wasn't
for any fundamental reason, but just because use of a saved plan
requires having a ResourceOwner to hold a reference count for the
plan, and we had no suitable resowner at hand, nor would the
available APIs support using one if we did.  While it's not that
expensive to create a "plan" for CALL/DO, the cycles do add up
in repeated executions.

This patch therefore makes the following API changes:

* GetCachedPlan/ReleaseCachedPlan are modified to let the caller
specify which resowner to use to pin the plan, rather than forcing
use of CurrentResourceOwner.

* spi.c gains a "SPI_execute_plan_extended" entry point that lets
callers say which resowner to use to pin the plan.  This borrows the
idea of an options struct from the recently added SPI_prepare_extended,
hopefully allowing future options to be added without more API breaks.
This supersedes SPI_execute_plan_with_paramlist (which I've marked
deprecated) as well as SPI_execute_plan_with_receiver (which is new
in v14, so I just took it out altogether).

* I also took the opportunity to remove the crude hack of letting
plpgsql reach into SPI private data structures to mark SPI plans as
"no_snapshot".  It's better to treat that as an option of
SPI_prepare_extended.

Now, when running a non-atomic procedure or DO block that contains
any CALL or DO commands, plpgsql creates a ResourceOwner that
will be used to pin the plans of the CALL/DO commands.  (In an
atomic context, we just use CurrentResourceOwner, as before.)
Having done this, we can just save CALL/DO plans normally,
whether or not they are used across transaction boundaries.
This seems to be good for something like 2X speedup of a CALL
of a trivial procedure with a few simple argument expressions.
By restricting the creation of an extra ResourceOwner like this,
there's essentially zero penalty in cases that can't benefit.

Pavel Stehule, with some further hacking by me

Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
2021-01-25 22:28:29 -05:00
Andres Freund
55ef8555f0 Fix two typos in snapbuild.c.
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/c94be044-818f-15e3-1ad3-7a7ae2dfed0a@iki.fi
2021-01-25 12:15:10 -08:00
Tom Lane
07d46fceb4 Fix broken ruleutils support for function TRANSFORM clauses.
I chanced to notice that this dumped core due to a faulty Assert.
To add insult to injury, the output has been misformatted since v11.
Obviously we need some regression testing here.

Discussion: https://postgr.es/m/d1cc628c-3953-4209-957b-29427acc38c8@www.fastmail.com
2021-01-25 13:03:43 -05:00
Robert Haas
d18e75664a Remove CheckpointLock.
Up until now, we've held this lock when performing a checkpoint or
restartpoint, but commit 076a055acf back
in 2004 and commit 7e48b77b1c from 2009,
taken together, have removed all need for this. In the present code,
there's only ever one process entitled to attempt a checkpoint: either
the checkpointer, during normal operation, or the postmaster, during
single-user operation. So, we don't need the lock.

One possible concern in making this change is that it means that
a substantial amount of code where HOLD_INTERRUPTS() was previously
in effect due to the preceding LWLockAcquire() will now be
running without that. This could mean that ProcessInterrupts()
gets called in places from which it didn't before. However, this
seems unlikely to do very much, because the checkpointer doesn't
have any signal mapped to die(), so it's not clear how,
for example, ProcDiePending = true could happen in the first
place. Similarly with ClientConnectionLost and recovery conflicts.

Also, if there are any such problems, we might want to fix them
rather than reverting this, since running lots of code with
interrupt handling suspended is generally bad.

Patch by me, per an inquiry by Amul Sul. Review by Tom Lane
and Michael Paquier.

Discussion: http://postgr.es/m/CAAJ_b97XnBBfYeSREDJorFsyoD1sHgqnNuCi=02mNQBUMnA=FA@mail.gmail.com
2021-01-25 12:34:38 -05:00
David Rowley
16dfe253e3 Fix hypothetical bug in heap backward scans
Both heapgettup() and heapgettup_pagemode() incorrectly set the first page
to scan in a backward scan in which the number of pages to scan was
specified by heap_setscanlimits().  The code incorrectly started the scan
at the end of the relation when startBlk was 0, or otherwise at
startBlk - 1, neither of which is correct when only scanning a subset of
pages.

The fix here checks if heap_setscanlimits() has changed the number of
pages to scan and if so we set the first page to scan as the final page in
the specified range during backward scans.

Proper adjustment of this code was forgotten when heap_setscanlimits() was
added in 7516f5259 back in 9.5.  However, in practice, nothing in core code
performs backward scans after having used heap_setscanlimits(); yet it is
possible an extension uses the heap functions in this way, hence
backpatch.

An upcoming patch does use heap_setscanlimits() with backward scans, so
this must be fixed before that can go in.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvpGc9h0_oVD2CtgBcxCS1N-qDYZSeBRnUh+0CWJA9cMaA@mail.gmail.com
Backpatch-through: 9.5, all supported versions
2021-01-25 19:52:18 +13:00
Amit Kapila
40ab64c1ec Fix ALTER PUBLICATION...DROP TABLE behavior.
Commit 69bd60672 fixed the initialization of streamed transactions for
RelationSyncEntry. It forgot to initialize the publication actions while
invalidating the RelationSyncEntry, due to which we would still publish a
relation's changes even though it had been dropped from a particular
publication. Fix it by initializing pubactions when the entry gets
invalidated.

Author: Japin Li and Bharath Rupireddy
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com
2021-01-25 07:39:29 +05:30
Tomas Vondra
39b66a91bd Fix COPY FREEZE with CLOBBER_CACHE_ALWAYS
This adds code omitted from commit 7db0cd2145 by accident, which had
two consequences. Firstly, only rows inserted by heap_multi_insert were
frozen as expected when running COPY FREEZE, while heap_insert left
rows unfrozen. That, however, includes rows in TOAST tables, so a lot of
data might have been left unfrozen. Secondly, a page might have been left
partially empty after relcache invalidation.

This addresses both of those issues.

Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com
2021-01-24 01:08:11 +01:00
Tom Lane
7cd9765f9b Re-allow DISTINCT in pl/pgsql expressions.
I'd omitted this from the grammar in commit c9d529848, figuring that
it wasn't worth supporting.  However we already have one complaint,
so it seems that judgment was wrong.  It doesn't require a huge
amount of code, so add it back.  (I'm still drawing the line at
UNION/INTERSECT/EXCEPT though: those'd require an unreasonable
amount of grammar refactoring, and the single-result-row restriction
makes them near useless anyway.)

Also rethink the documentation: this behavior is a property of
all pl/pgsql expressions, not just assignments.

Discussion: https://postgr.es/m/20210122134106.e94c5cd7@mail.verfriemelt.org
2021-01-22 16:26:22 -05:00
Peter Eisentraut
09418bed67 Remove bogus tracepoint
Calls to LWLockWaitForVar() fired the TRACE_POSTGRESQL_LWLOCK_ACQUIRE
tracepoint, but LWLockWaitForVar() never actually acquires the LWLock.
(Probably a copy/paste bug in 68a2e52bbaf.)  Remove it.

Author: Craig Ringer <craig.ringer@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAGRY4nxJo+-HCC2i5H93ttSZ4gZO-FSddCwvkb-qAfQ1zdXd1w@mail.gmail.com
2021-01-22 11:58:21 +01:00
Michael Paquier
af0e79c8f4 Move SSL information callback earlier to capture more information
The callback for retrieving state change information during connection
setup was only installed when the connection was mostly set up, and
thus didn't provide much information and missed all the details related
to the handshake.

This also extends the callback with SSL_state_string_long() to print
more information about the state change within the SSL object handled.

While there, fix some comments which were incorrectly referring to the
callback and its previous location in fe-secure.c.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/232CF476-94E1-42F1-9408-719E2AEC5491@yesql.se
2021-01-22 09:26:27 +09:00
Tom Lane
55dc86eca7 Fix pull_varnos' miscomputation of relids set for a PlaceHolderVar.
Previously, pull_varnos() took the relids of a PlaceHolderVar as being
equal to the relids in its contents, but that fails to account for the
possibility that we have to postpone evaluation of the PHV due to outer
joins.  This could result in a malformed plan.  The known cases end up
triggering the "failed to assign all NestLoopParams to plan nodes"
sanity check in createplan.c, but other symptoms may be possible.

The right value to use is the join level we actually intend to evaluate
the PHV at.  We can get that from the ph_eval_at field of the associated
PlaceHolderInfo.  However, there are some places that call pull_varnos()
before the PlaceHolderInfos have been created; in that case, fall back
to the conservative assumption that the PHV will be evaluated at its
syntactic level.  (In principle this might result in missing some legal
optimization, but I'm not aware of any cases where it's an issue in
practice.)  Things are also a bit ticklish for calls occurring during
deconstruct_jointree(), but AFAICS the ph_eval_at fields should have
reached their final values by the time we need them.

The main problem in making this work is that pull_varnos() has no
way to get at the PlaceHolderInfos.  We can fix that easily, if a
bit tediously, in HEAD by passing it the planner "root" pointer.
In the back branches that'd cause an unacceptable API/ABI break for
extensions, so leave the existing entry points alone and add new ones
with the additional parameter.  (If an old entry point is called and
encounters a PHV, it'll fall back to using the syntactic level,
again possibly missing some valid optimization.)

Back-patch to v12.  The computation is surely also wrong before that,
but it appears that we cannot reach a bad plan thanks to join order
restrictions imposed on the subquery that the PlaceHolderVar came from.
The error only became reachable when commit 4be058fe9 allowed trivial
subqueries to be collapsed out completely, eliminating their join order
restrictions.

Per report from Stephan Springl.

Discussion: https://postgr.es/m/171041.1610849523@sss.pgh.pa.us
2021-01-21 15:37:23 -05:00
Tomas Vondra
920f853dc9 Fix initialization of FDW batching in ExecInitModifyTable
ExecInitModifyTable has to initialize batching for all result relations,
not just the first one. Furthermore, when junk filters were necessary,
the pointer pointed past the mtstate->resultRelInfo array.

Per reports from multiple non-x86 animals (florican, locust, ...).

Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-01-21 03:34:32 +01:00
Tomas Vondra
b663a41363 Implement support for bulk inserts in postgres_fdw
Extends the FDW API to allow batching inserts into foreign tables. That
is usually much more efficient than inserting individual rows, due to
high latency for each round-trip to the foreign server.

It was possible to implement something similar in the regular FDW API,
but it was inconvenient and there were issues with reporting the number
of actually inserted rows etc. This extends the FDW API with two new
functions:

* GetForeignModifyBatchSize - allows the FDW picking optimal batch size

* ExecForeignBatchInsert - inserts a batch of rows at once

Currently, only INSERT queries support batching. Support for DELETE and
UPDATE may be added in the future.

This also implements batching for postgres_fdw. The batch size may be
specified using "batch_size" option both at the server and table level.

The initial patch version was written by me, but it was rewritten and
improved in many ways by Takayuki Tsunakawa.

Author: Takayuki Tsunakawa
Reviewed-by: Tomas Vondra, Amit Langote
Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-01-20 23:57:27 +01:00
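As an illustration, a minimal sketch of using the new batch_size option; the server and foreign table names are hypothetical:
  -- batch_size can be set at the server level or per foreign table
  ALTER SERVER remote_srv OPTIONS (ADD batch_size '100');
  ALTER FOREIGN TABLE remote_tab OPTIONS (ADD batch_size '10');
  -- subsequent inserts into remote_tab are shipped to the remote side in batches
  INSERT INTO remote_tab SELECT g FROM generate_series(1, 1000) AS g;
A table-level setting takes precedence over the server-level one for that table.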
Heikki Linnakangas
6b4d3046f4 Fix bug in detecting concurrent page splits in GiST insert
In commit 9eb5607e69, I got the condition on checking for split or
deleted page wrong: I used && instead of ||. The comment correctly said
"concurrent split _or_ deletion".

As a result, GiST insertion could miss a concurrent split, and insert to
wrong page. Duncan Sands demonstrated this with a test script that did a
lot of concurrent inserts.

Backpatch to v12, where this was introduced. REINDEX is required to fix
indexes that were affected by this bug.

Backpatch-through: 12
Reported-by: Duncan Sands
Discussion: https://www.postgresql.org/message-id/a9690483-6c6c-3c82-c8ba-dc1a40848f11%40deepbluecap.com
2021-01-20 11:58:03 +02:00
Michael Paquier
21378e1fef Fix ALTER DEFAULT PRIVILEGES with duplicated objects
Specifying duplicated objects in this command would lead to unique
constraint violations in pg_default_acl or "tuple already updated by
self" errors.  Similarly to GRANT/REVOKE, increment the command ID after
each subcommand processing to allow this case to work transparently.

A regression test is added by tweaking one of the existing queries of
privileges.sql to stress this case.

Reported-by: Andrus
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/ae2a7dc1-9d71-8cba-3bb9-e4cb7eb1f44e@hot.ee
Backpatch-through: 9.5
2021-01-20 11:38:17 +09:00
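For illustration, one plausible form of the problem, assuming the duplication occurs in the role list; the role names are hypothetical:
  -- listing the same object twice used to fail with "tuple already updated by self"
  ALTER DEFAULT PRIVILEGES FOR ROLE alice, alice
    GRANT SELECT ON TABLES TO bob;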
Tom Lane
a0efda88a6 Remove faulty support for MergeAppend plan with WHERE CURRENT OF.
Somebody extended search_plan_tree() to treat MergeAppend exactly
like Append, which is 100% wrong, because unlike Append we can't
assume that only one input node is actively returning tuples.
Hence a cursor using a MergeAppend across a UNION ALL or inheritance
tree could falsely match a WHERE CURRENT OF query at a row that
isn't actually the cursor's current output row, but coincidentally
has the same TID (in a different table) as the current output row.

Delete the faulty code; this means that such a case will now return
an error like 'cursor "foo" is not a simply updatable scan of table
"bar"', instead of silently misbehaving.  Users should not find that
surprising though, as the same cursor query could have failed that way
already depending on the chosen plan.  (It would fail like that if the
sort were done with an explicit Sort node instead of MergeAppend.)

Expand the clearly-inadequate commentary to be more explicit about
what this code is doing, in hopes of forestalling future mistakes.

It's been like this for awhile, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/482865.1611075182@sss.pgh.pa.us
2021-01-19 13:25:33 -05:00
Amit Kapila
ed43677e20 pgindent worker.c.
This is a leftover from commit 0926e96c49. Changing this separately
because this file is being modified for upcoming patch logical replication
of 2PC.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Ps+EgG8KzcmAyAgBUi_vuTps6o9ZA8DG6SdnO0-YuOhPQ@mail.gmail.com
2021-01-19 08:10:13 +05:30
Tom Lane
60661bbf2d Avoid crash with WHERE CURRENT OF and a custom scan plan.
execCurrent.c's search_plan_tree() assumed that ForeignScanStates
and CustomScanStates necessarily have a valid ss_currentRelation.
This is demonstrably untrue for postgres_fdw's remote join and
remote aggregation plans, and non-leaf custom scans might not have
an identifiable scan relation either.  Avoid crashing by ignoring
such nodes when the field is null.

This solution will lead to errors like 'cursor "foo" is not a
simply updatable scan of table "bar"' in cases where maybe we
could have allowed WHERE CURRENT OF to work.  That's not an issue
for postgres_fdw's usages, since joins or aggregations would render
WHERE CURRENT OF invalid anyway.  But an otherwise-transparent
upper level custom scan node might find this annoying.  When and if
someone cares to expend work on such a scenario, we could invent a
custom-scan-provider callback to determine what's safe.

Report and patch by David Geier, commentary by me.  It's been like
this for awhile, so back-patch to all supported branches.

Discussion: https://postgr.es/m/0253344d-9bdd-11c4-7f0d-d88c02cd7991@swarm64.com
2021-01-18 18:32:30 -05:00
Tom Lane
3fd80c728d Narrow the scope of a local variable.
This is better style and more symmetrical with the other if-branch.
This likely should have been included in 9de77b545 (which created
the opportunity), but it was overlooked.

Japin Li

Discussion: https://postgr.es/m/MEYP282MB16699FA4A7CD57EB250E871FB6A40@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2021-01-18 15:55:01 -05:00
Tom Lane
a6cf3df4eb Add bytea equivalents of ltrim() and rtrim().
We had bytea btrim() already, but for some reason not the other two.

Joel Jacobson

Discussion: https://postgr.es/m/d10cd5cd-a901-42f1-b832-763ac6f7ff3a@www.fastmail.com
2021-01-18 15:11:32 -05:00
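A quick sketch of the new functions next to the existing bytea btrim():
  SELECT btrim('\x0000123400'::bytea, '\x00'::bytea);  -- \x1234
  SELECT ltrim('\x0000123400'::bytea, '\x00'::bytea);  -- \x123400
  SELECT rtrim('\x0000123400'::bytea, '\x00'::bytea);  -- \x00001234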
Robert Haas
a3ed4d1efe Allow for error or refusal while absorbing a ProcSignalBarrier.
Previously, the per-barrier-type functions tasked with absorbing
them were expected to always succeed and never throw an error.
However, that's a bit inconvenient. Further study has revealed that
there are realistic cases where it might not be possible to absorb
a ProcSignalBarrier without terminating the transaction, or even
the whole backend. Similarly, for some barrier types, there might
be other reasons where it's not reasonably possible to absorb the
barrier at certain points in the code, so provide a way for a
per-barrier-type function to reject absorbing the barrier.

Unfortunately, there's still no committed code making use of this
infrastructure; hopefully, we'll get there. :-(

Patch by me, reviewed by Andres Freund and Amul Sul.

Discussion: http://postgr.es/m/20200908182005.xya7wetdh3pndzim@alap3.anarazel.de
Discussion: http://postgr.es/m/CA+Tgmob56Pk1-5aTJdVPCWFHon7me4M96ENpGe9n_R4JUjjhZA@mail.gmail.com
2021-01-18 12:09:52 -05:00
Peter Eisentraut
15251c0a60 Pause recovery for insufficient parameter settings
When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down
with a fatal error.

This patch changes this behavior for hot standbys to pause recovery at
that point instead.  That allows read traffic on the standby to
continue while database administrators figure out next steps.  When
recovery is unpaused, the server shuts down (as before).  The idea is
to fix the parameters while recovery is paused and then restart when
there is a maintenance window.

Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
2021-01-18 09:04:04 +01:00
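A rough sketch of what an administrator might check on the standby in this situation, using functions that already exist in released versions:
  -- on the hot standby, recovery now pauses instead of the server exiting
  SELECT pg_is_wal_replay_paused();   -- true while recovery is paused
  -- after raising the offending setting (e.g. max_connections) on the standby
  -- and restarting it, recovery continues normally; unpausing without fixing
  -- the setting still shuts the server down, as described above:
  SELECT pg_wal_replay_resume();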
Michael Paquier
a3dc926009 Refactor option handling of CLUSTER, REINDEX and VACUUM
This continues the work done in b5913f6.  All the options of those
commands are changed to use hex values rather than enums to reduce the
risk of compatibility bugs when introducing new options.  Each option
set is moved into a new structure that can be extended with more
non-boolean options (this was already the case of VACUUM).  The code of
REINDEX is restructured so that manual REINDEX commands go through a
single routine from utility.c, like VACUUM, to ease the allocation
handling of option parameters when a command needs to go through
multiple transactions.

This can be used as a base infrastructure for future patches related to
those commands, including reindex filtering and tablespace support.

Per discussion with people mentioned below, as well as Alvaro Herrera
and Peter Eisentraut.

Author: Michael Paquier, Justin Pryzby
Reviewed-by: Alexey Kondratov, Justin Pryzby
Discussion: https://postgr.es/m/X8riynBLwxAD9uKk@paquier.xyz
2021-01-18 14:03:10 +09:00
Tomas Vondra
7db0cd2145 Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE
Make sure COPY FREEZE marks the pages as PD_ALL_VISIBLE and updates the
visibility map. Until now we only marked individual tuples as frozen,
but page-level flags were not updated, so the first VACUUM after the
COPY FREEZE had to rewrite the whole table.

This is a fairly old patch, and multiple people worked on it. The first
version was written by Jeff Janes, and then reworked by Pavan Deolasee
and Anastasia Lubennikova.

Author: Anastasia Lubennikova, Pavan Deolasee, Jeff Janes
Reviewed-by: Kuntal Ghosh, Jeff Janes, Tomas Vondra, Masahiko Sawada,
             Andres Freund, Ibrar Ahmed, Robert Haas, Tatsuro Ishii,
             Darafei Praliaskouski
Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com
Discussion: https://postgr.es/m/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com
2021-01-17 22:28:26 +01:00
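For reference, a minimal sketch of the COPY FREEZE pattern this change affects; the file path is hypothetical:
  BEGIN;
  CREATE TABLE t_loaded (a int, b text);
  -- FREEZE requires the table to be created (or truncated) in this transaction
  COPY t_loaded FROM '/tmp/data.csv' WITH (FORMAT csv, FREEZE);
  COMMIT;
  -- pages are now marked all-visible, so the first VACUUM no longer rewrites them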
Magnus Hagander
960869da08 Add pg_stat_database counters for sessions and session time
This adds counters for the number of sessions, the different kinds of
session termination, and timers for how much time is spent active vs.
idle in a database to pg_stat_database.

Internally this also renames the parameter "force" to "disconnect". This
was the only use case for the parameter before, so repurposing it for
this narrower use case makes things cleaner than inventing something
new.

Author: Laurenz Albe
Reviewed-By: Magnus Hagander, Soumyadeep Chakraborty, Masahiro Ikeda
Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.camel@cybertec.at
2021-01-17 13:52:31 +01:00
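A sketch of querying the new columns, assuming the column names as committed:
  SELECT datname, sessions, sessions_abandoned, sessions_fatal, sessions_killed,
         session_time, active_time, idle_in_transaction_time
    FROM pg_stat_database
   WHERE datname = current_database();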
Noah Misch
6db992833c Prevent excess SimpleLruTruncate() deletion.
Every core SLRU wraps around.  With the exception of pg_notify, the wrap
point can fall in the middle of a page.  Account for this in the
PagePrecedes callback specification and in SimpleLruTruncate()'s use of
said callback.  Update each callback implementation to fit the new
specification.  This changes SerialPagePrecedesLogically() from the
style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes().
(Whereas pg_clog and pg_serial share a key space, pg_serial is nothing
like pg_notify.)  The bug fixed here has the same symptoms and user
followup steps as 592a589a04.  Back-patch
to 9.5 (all supported versions).

Reviewed by Andrey Borodin and (in earlier versions) by Tom Lane.

Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com
2021-01-16 12:21:35 -08:00
Amit Kapila
c95765f476 Remove unnecessary pstrdup in fetch_table_list.
The result of TextDatumGetCString is already palloc'ed so we don't need to
allocate memory for it again. We decide not to backpatch it as there
doesn't seem to be any case where it can create a meaningful leak.

Author: Zhijie Hou
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/229fed2eb8c54c71a96ccb99e516eb12@G08CNEXMBPEKD05.g08.fujitsu.local
2021-01-16 10:15:32 +05:30
Tomas Vondra
c9a0dc3486 Disallow CREATE STATISTICS on system catalogs
Add a check that CREATE STATISTICS does not add extended statistics on
system catalogs, similarly to indexes etc.  It can be overridden using
the allow_system_table_mods GUC.

This bug exists since 7b504eb282, adding the extended statistics, so
backpatch all the way back to PostgreSQL 10.

Author: Tomas Vondra
Reported-by: Dean Rasheed
Backpatch-through: 10
Discussion: https://postgr.es/m/CAEZATCXAPrrOKwEsyZKQ4uzzJQWBCt6QAvOcgqRGdWwT1zb%2BrQ%40mail.gmail.com
2021-01-15 23:31:22 +01:00
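A sketch of the behavior change; the statistics object name is arbitrary:
  -- now rejected by default:
  CREATE STATISTICS pg_class_ext_stats ON relnamespace, relkind FROM pg_class;
  -- a superuser can still override the check if really needed:
  SET allow_system_table_mods = on;
  CREATE STATISTICS pg_class_ext_stats ON relnamespace, relkind FROM pg_class;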
Alvaro Herrera
f9900df5f9
Avoid spurious wait in concurrent reindex
This is like commit c98763bf51, but for REINDEX CONCURRENTLY.  To wit:
this flags indicates that the current process is safe to ignore for the
purposes of waiting for other snapshots, when doing CREATE INDEX
CONCURRENTLY and REINDEX CONCURRENTLY.  This helps two processes doing
either of those things not deadlock, and also avoids spurious waits.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql
2021-01-15 10:31:42 -03:00
Fujii Masao
2ad78a87f0 Fix calculation of how much shared memory is required to store a TOC.
Commit ac883ac453 refactored shm_toc_estimate() but changed its calculation
of the shared memory size for a TOC incorrectly. Previously this could cause
too much memory to be allocated.

Back-patch to v11 where the bug was introduced.

Author: Takayuki Tsunakawa
Discussion: https://postgr.es/m/TYAPR01MB2990BFB73170E2C4921E2C4DFEA80@TYAPR01MB2990.jpnprd01.prod.outlook.com
2021-01-15 12:44:17 +09:00
Michael Paquier
5ae1572993 Fix O(N^2) stat() calls when recycling WAL segments
The counter tracking the last segment number recycled was getting
initialized when recycling one single segment, while it should be used
across a full cycle of segments recycled to prevent useless checks
related to entries already recycled.

This performance issue has been introduced by b2a5545, and it was first
implemented in 61b86142.

No backpatch is done per the lack of field complaints.

Reported-by: Andres Freund, Thomas Munro
Author: Michael Paquier
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20170621211016.eln6cxxp3jrv7m4m@alap3.anarazel.de
Discussion: https://postgr.es/m/CA+hUKG+DRiF9z1_MU4fWq+RfJMxP7zjoptfcmuCFPeO4JM2iVg@mail.gmail.com
2021-01-15 10:33:13 +09:00
Alvaro Herrera
ebfe2dbd6b
Prevent drop of tablespaces used by partitioned relations
When a tablespace is used in a partitioned relation (per commits
ca4103025d in pg12 for tables and 33e6c34c32 in pg11 for indexes),
it is possible to drop the tablespace, potentially causing various
problems.  One such was reported in bug #16577, where a rewriting ALTER
TABLE causes a server crash.

Protect against this by using pg_shdepend to keep track of tablespaces
when used for relations that don't keep physical files; we now abort a
tablespace drop if we see that the tablespace is referenced from any
partitioned relations.

Backpatch this to 11, where this problem has been latent all along.  We
don't try to create pg_shdepend entries for existing partitioned
indexes/tables, but any ones that are modified going forward will be
protected.

Note slight behavior change: when trying to drop a tablespace that
contains both regular tables as well as partitioned ones, you'd
previously get ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll
get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST.  Arguably, the latter is more
correct.

It is possible to add protecting pg_shdepend entries for existing
tables/indexes, by doing
  ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
  ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
for each partitioned table/index that is not in the database default
tablespace.  Because these partitioned objects do not have storage, no
file needs to be actually moved, so it shouldn't take more time than
what's required to acquire locks.

This query can be used to search for such relations:
SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2021-01-14 15:32:14 -03:00
Fujii Masao
fef5b47f6b Ensure that a standby is able to follow a primary on a newer timeline.
Commit 709d003fbd refactored WAL-reading code, but accidentally caused
WalSndSegmentOpen() to fail to follow a timeline switch while reading from
a historic timeline. This issue caused a standby to fail to follow a primary
on a newer timeline when WAL archiving is enabled.

If there is a timeline switch within the segment, WalSndSegmentOpen() should
read from the WAL segment belonging to the new timeline. But previously
since it failed to follow a timeline switch, it tried to read the WAL segment
from the old timeline. When WAL archiving is enabled, that WAL segment with
the old timeline doesn't exist because it's renamed to .partial. This led
the primary to try to read a non-existent WAL segment, which caused
replication to fail with the error "ERROR:  requested WAL segment ... has
 already been removed".

This commit fixes WalSndSegmentOpen() so that it's able to follow a timeline
switch, to ensure that a standby is able to follow a primary on a newer
timeline even when WAL archiving is enabled.

This commit also adds the regression test to check whether a standby is
able to follow a primary on a newer timeline when WAL archiving is enabled.

Back-patch to v13 where the bug was introduced.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi, tweaked by Fujii Masao
Reviewed-by:  Alvaro Herrera, Fujii Masao
Discussion: https://postgr.es/m/20201209.174314.282492377848029776.horikyota.ntt@gmail.com
2021-01-14 12:27:11 +09:00
Michael Paquier
aef8948f38 Rework refactoring of hex and encoding routines
This commit addresses some issues with c3826f83 that moved the hex
decoding routine to src/common/:
- The decoding function lacked overflow checks, so when used for
security-related features it was an open door to out-of-bound writes if
not carefully used that could remain undetected.  Like the base64
routines already in src/common/ used by SCRAM, this routine is reworked
to check for overflows by having the size of the destination buffer
passed as argument, with overflows checked before doing any writes.
- The encoding routine was missing.  This is moved to src/common/ and
it gains the same overflow checks as the decoding part.

On failure, the hex routines of src/common/ issue an error as per the
discussion done to make them usable by frontend tools, but not by shared
libraries.  Note that this is why ECPG is left out of this commit, and
it still includes a duplicated logic doing hex encoding and decoding.

While on it, this commit uses better variable names for the source and
destination buffers in the existing escape and base64 routines in
encode.c and it makes them more robust to overflow detection.  The
previous core code issued a FATAL after doing out-of-bound writes if
going through the SQL functions, which would be enough to detect
problems when working on changes that impacted this area of the
code.  Instead, an error is issued before doing an out-of-bound write.
The hex routines were being directly called for bytea conversions and
backup manifests without such sanity checks.  The current calls happen
to not have any problems, but careless uses of such APIs could easily
lead to CVE-class bugs.

Author: Bruce Momjian, Michael Paquier
Reviewed-by: Sehrope Sarkuni
Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
2021-01-14 11:13:24 +09:00
Thomas Munro
0d56acfbaa Move our p{read,write}v replacements into their own files.
macOS's ranlib issued a warning about an empty pread.o file with the
previous arrangement, on systems new enough to require no replacement
functions.  Let's go back to using configure's AC_REPLACE_FUNCS system
to build and include each .o in the library only if it's needed, which
requires moving the *v() functions to their own files.

Also move the _with_retry() wrapper to a more permanent home.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1283127.1610554395%40sss.pgh.pa.us
2021-01-14 11:16:59 +13:00
Peter Geoghegan
d168b66682 Enhance nbtree index tuple deletion.
Teach nbtree and heapam to cooperate in order to eagerly remove
duplicate tuples representing dead MVCC versions.  This is "bottom-up
deletion".  Each bottom-up deletion pass is triggered lazily in response
to a flood of versions on an nbtree leaf page.  This usually involves a
"logically unchanged index" hint (these are produced by the executor
mechanism added by commit 9dc718bd).

The immediate goal of bottom-up index deletion is to avoid "unnecessary"
page splits caused entirely by version duplicates.  It naturally has an
even more useful effect, though: it acts as a backstop against
accumulating an excessive number of index tuple versions for any given
_logical row_.  Bottom-up index deletion complements what we might now
call "top-down index deletion": index vacuuming performed by VACUUM.
Bottom-up index deletion responds to the immediate local needs of
queries, while leaving it up to autovacuum to perform infrequent clean
sweeps of the index.  The overall effect is to avoid certain
pathological performance issues related to "version churn" from UPDATEs.

The previous tableam interface used by index AMs to perform tuple
deletion (the table_compute_xid_horizon_for_tuples() function) has been
replaced with a new interface that supports certain new requirements.
Many (perhaps all) of the capabilities added to nbtree by this commit
could also be extended to other index AMs.  That is left as work for a
later commit.

Extend deletion of LP_DEAD-marked index tuples in nbtree by adding logic
to consider extra index tuples (that are not LP_DEAD-marked) for
deletion in passing.  This increases the number of index tuples deleted
significantly in many cases.  The LP_DEAD deletion process (which is now
called "simple deletion" to clearly distinguish it from bottom-up
deletion) won't usually need to visit any extra table blocks to check
these extra tuples.  We have to visit the same table blocks anyway to
generate a latestRemovedXid value (at least in the common case where the
index deletion operation's WAL record needs such a value).

Testing has shown that the "extra tuples" simple deletion enhancement
increases the number of index tuples deleted with almost any workload
that has LP_DEAD bits set in leaf pages.  That is, it almost never fails
to delete at least a few extra index tuples.  It helps most of all in
cases that happen to naturally have a lot of delete-safe tuples.  It's
not uncommon for an individual deletion operation to end up deleting an
order of magnitude more index tuples compared to the old naive approach
(e.g., custom instrumentation of the patch shows that this happens
fairly often when the regression tests are run).

Add a further enhancement that augments simple deletion and bottom-up
deletion in indexes that make use of deduplication: Teach nbtree's
_bt_delitems_delete() function to support granular TID deletion in
posting list tuples.  It is now possible to delete individual TIDs from
posting list tuples provided the TIDs have a tableam block number of a
table block that gets visited as part of the deletion process (visiting
the table block can be triggered directly or indirectly).  Setting the
LP_DEAD bit of a posting list tuple is still an all-or-nothing thing,
but that matters much less now that deletion only needs to start out
with the right _general_ idea about which index tuples are deletable.

Bump XLOG_PAGE_MAGIC because xl_btree_delete changed.

No bump in BTREE_VERSION, since there are no changes to the on-disk
representation of nbtree indexes.  Indexes built on PostgreSQL 12 or
PostgreSQL 13 will automatically benefit from bottom-up index deletion
(i.e. no reindexing required) following a pg_upgrade.  The enhancement
to simple deletion is available with all B-Tree indexes following a
pg_upgrade, no matter what PostgreSQL version the user upgrades from.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzm+maE3apHB8NOtmM=p-DO65j2V5GzAWCOEEuy3JZgb2g@mail.gmail.com
2021-01-13 09:21:32 -08:00
Peter Geoghegan
9dc718bdf2 Pass down "logically unchanged index" hint.
Add an executor aminsert() hint mechanism that informs index AMs that
the incoming index tuple (the tuple that accompanies the hint) is not
being inserted by execution of an SQL statement that logically modifies
any of the index's key columns.

The hint is received by indexes when an UPDATE takes place that does not
apply an optimization like heapam's HOT (though only for indexes where
all key columns are logically unchanged).  Any index tuple that receives
the hint on insert is expected to be a duplicate of at least one
existing older version that is needed for the same logical row.  Related
versions will typically be stored on the same index page, at least
within index AMs that apply the hint.

Recognizing the difference between MVCC version churn duplicates and
true logical row duplicates at the index AM level can help with cleanup
of garbage index tuples.  Cleanup can intelligently target tuples that
are likely to be garbage, without wasting too many cycles on less
promising tuples/pages (index pages with little or no version churn).

This is infrastructure for an upcoming commit that will teach nbtree to
perform bottom-up index deletion.  No index AM actually applies the hint
just yet.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=CEKFa74EScx_hFVshCOn6AA5T-ajFASTdzipdkLTNQQ@mail.gmail.com
2021-01-13 08:11:00 -08:00
Fujii Masao
39b03690b5 Log long wait time on recovery conflict when it's resolved.
This is a follow-up of the work done in commit 0650ff2303. This commit
extends log_recovery_conflict_waits so that a log message is produced
also when recovery conflict has already been resolved after deadlock_timeout
passes, i.e., when the startup process finishes waiting for recovery
conflict after deadlock_timeout. This is useful in investigating how long
recovery conflicts prevented the recovery from applying WAL.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Bertrand Drouvot
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
2021-01-13 22:59:17 +09:00
Amit Kapila
ee1b38f659 Fix memory leak in SnapBuildSerialize.
The memory for the snapshot was leaked while serializing it to disk during
logical decoding. This memory will be freed only once walsender stops
streaming the changes. This can lead to a huge memory increase when the
master logs standby snapshots too frequently, say when the user is trying
to create many replication slots.

Reported-by: funnyxj.fxj@alibaba-inc.com
Diagnosed-by: funnyxj.fxj@alibaba-inc.com
Author: Amit Kapila
Backpatch-through: 9.5
Discussion: https://postgr.es/m/033ab54c-6393-42ee-8ec9-2b399b5d8cde.funnyxj.fxj@alibaba-inc.com
2021-01-13 08:19:50 +05:30
Amit Kapila
bea449c635 Optimize DropRelFileNodesAllBuffers() for recovery.
Similar to commit d6ad34f341, this patch optimizes
DropRelFileNodesAllBuffers() by avoiding the complete buffer pool scan and
instead find the buffers to be invalidated by doing lookups in the
BufMapping table.

This optimization helps operations where the relation files need to be
removed like Truncate, Drop, Abort of Create Table, etc.

Author: Kirk Jamison
Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila
Tested-By: Haiying Tang
Discussion: https://postgr.es/m/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com
2021-01-13 07:46:11 +05:30
Michael Paquier
fce7d0e6ef Fix routine name in comment of catcache.c
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACUDXLAkf_XxQO9tAUtnTNGi3Lmd8fANd+vBJbcHn1HoWA@mail.gmail.com
2021-01-13 10:32:21 +09:00
Alvaro Herrera
c6c4b37395
Invent struct ReindexIndexInfo
This struct is used by ReindexRelationConcurrently to keep track of the
relations to process.  This saves having to obtain some data repeatedly,
and has future uses as well.

Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql
2021-01-12 17:05:06 -03:00
Alvaro Herrera
a3e51a36b7
Fix thinko in comment
This comment has been wrong since its introduction in commit
2c03216d83.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoAzz6qipFJBbGEaHmyWxvvNDp8httbwLR9tUQWaTjUs2Q@mail.gmail.com
2021-01-12 11:48:45 -03:00
Amit Kapila
044aa9e70e Fix relation descriptor leak.
We missed closing the relation descriptor while sending changes via the
root of partitioned relations during logical replication.

Author: Amit Langote and Mark Zhao
Reviewed-by: Amit Kapila and Ashutosh Bapat
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/tencent_41FEA657C206F19AB4F406BE9252A0F69C06@qq.com
Discussion: https://postgr.es/m/tencent_6E296D2F7D70AFC90D83353B69187C3AA507@qq.com
2021-01-12 08:19:39 +05:30
Amit Kapila
d6ad34f341 Optimize DropRelFileNodeBuffers() for recovery.
The recovery path of DropRelFileNodeBuffers() is optimized so that
scanning of the whole buffer pool can be avoided when the number of
blocks to be truncated in a relation is below a certain threshold. For
such cases, we find the buffers by doing lookups in BufMapping table.
This improves the performance by more than 100 times in many cases
when several small tables (tested with 1000 relations) are truncated
and where the server is configured with a large value of shared
buffers (greater than or equal to 100GB).

This optimization helps cases (a) when vacuum or autovacuum truncated off
any of the empty pages at the end of a relation, or (b) when the relation is
truncated in the same transaction in which it was created.

This commit introduces a new API smgrnblocks_cached which returns a cached
value for the number of blocks in a relation fork. This helps us to determine
the exact size of relation which is required to apply this optimization. The
exact size is required to ensure that we don't leave any buffer for the
relation being dropped as otherwise the background writer or checkpointer
can lead to a PANIC error while flushing buffers corresponding to files that
don't exist.

Author: Kirk Jamison based on ideas by Amit Kapila
Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila
Tested-By: Haiying Tang
Discussion: https://postgr.es/m/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com
2021-01-12 07:45:40 +05:30
Tom Lane
4edf96846a Rethink SQLSTATE code for ERRCODE_IDLE_SESSION_TIMEOUT.
Move it to class 57 (Operator Intervention), which seems like a
better choice given that from the client's standpoint it behaves
a heck of a lot like, e.g., ERRCODE_ADMIN_SHUTDOWN.

In a green field I'd put ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT
here as well.  But that's been around for a few years, so it's
probably too late to change its SQLSTATE code.

Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
2021-01-11 14:53:42 -05:00
Thomas Munro
ce6a71fa53 Use vectored I/O to fill new WAL segments.
Instead of making many block-sized write() calls to fill a new WAL file
with zeroes, make a smaller number of pwritev() calls (or various
emulations).  The actual number depends on the OS's IOV_MAX, which
PG_IOV_MAX currently caps at 32.  That means we'll write 256kB per call
on typical systems.  We may want to tune the number later with more
experience.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com
2021-01-11 15:28:31 +13:00
Tom Lane
afcc8772ed Fix ancient bug in parsing of BRE-mode regular expressions.
brenext(), when parsing a '*' quantifier, forgot to return any "value"
for the token; per the equivalent case in next(), it should return
value 1 to indicate that greedy rather than non-greedy behavior is
wanted.  The result is that the compiled regexp could behave like 'x*?'
rather than the intended 'x*', if we were unlucky enough to have
a zero in v->nextvalue at this point.  That seems to happen with some
reliability if we have '.*' at the beginning of a BRE-mode regexp,
although that depends on the initial contents of a stack-allocated
struct, so it's not guaranteed to fail.

Found by Alexander Lakhin using valgrind testing.  This bug seems
to be aboriginal in Spencer's code, so back-patch all the way.

Discussion: https://postgr.es/m/16814-6c5e3edd2bdf0d50@postgresql.org
2021-01-08 12:16:00 -05:00
Tom Lane
b8d0cda533 Further second thoughts about idle_session_timeout patch.
On reflection, the order of operations in PostgresMain() is wrong.
These timeouts ought to be shut down before, not after, we do the
post-command-read CHECK_FOR_INTERRUPTS, to guarantee that any
timeout error will be detected there rather than at some ill-defined
later point (possibly after having wasted a lot of work).

This is really an error in the original idle_in_transaction_timeout
patch, so back-patch to 9.6 where that was introduced.
2021-01-07 11:45:23 -05:00
Fujii Masao
0650ff2303 Add GUC to log long wait times on recovery conflicts.
This commit adds GUC log_recovery_conflict_waits that controls whether
a log message is produced when the startup process is waiting longer than
deadlock_timeout for recovery conflicts. This is useful in determining
if recovery conflicts prevent the recovery from applying WAL.

Note that currently a log message is produced only when recovery conflict
has not been resolved yet even after deadlock_timeout passes, i.e.,
only when the startup process is still waiting for recovery conflict
even after deadlock_timeout.

Author: Bertrand Drouvot, Masahiko Sawada
Reviewed-by: Alvaro Herrera, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
2021-01-08 00:47:03 +09:00
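A sketch of enabling the new GUC on a standby; assuming it is reloadable (i.e. not requiring a restart), a configuration reload should pick it up:
  ALTER SYSTEM SET log_recovery_conflict_waits = on;
  SELECT pg_reload_conf();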
Tom Lane
9486e7b666 Improve commentary in timeout.c.
On re-reading I realized that I'd missed one race condition in the new
timeout code.  It's safe, but add a comment explaining it.

Discussion: https://postgr.es/m/CA+hUKG+o6pbuHBJSGnud=TadsuXySWA7CCcPgCt2QE9F6_4iHQ@mail.gmail.com
2021-01-06 22:09:16 -05:00
Tom Lane
9877374bef Add idle_session_timeout.
This GUC variable works much like idle_in_transaction_session_timeout,
in that it kills sessions that have waited too long for a new client
query.  But it applies when we're not in a transaction, rather than
when we are.

Li Japin, reviewed by David Johnston and Hayato Kuroda, some
fixes by me

Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
2021-01-06 18:28:52 -05:00
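A sketch of using the new GUC; the role name is hypothetical:
  -- terminate sessions that sit idle outside a transaction for more than 5 minutes
  ALTER SYSTEM SET idle_session_timeout = '5min';
  SELECT pg_reload_conf();
  -- or scope it to a particular role:
  ALTER ROLE reporting_ro SET idle_session_timeout = '1min';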
Tom Lane
09cf1d5226 Improve timeout.c's handling of repeated timeout set/cancel.
A very common usage pattern is that we set a timeout that we don't
expect to reach, cancel it after a little bit, and later repeat.
With the original implementation of timeout.c, this results in one
setitimer() call per timeout set or cancel.  We can do a lot better
by being lazy about changing the timeout interrupt request, namely:
(1) never cancel the outstanding interrupt, even when we have no
active timeout events;
(2) if we need to set an interrupt, but there already is one pending
at or before the required time, leave it alone.  When the interrupt
happens, the signal handler will reschedule it at whatever time is
then needed.

For example, with a one-second setting for statement_timeout, this
method results in having to interact with the kernel only a little
more than once a second, no matter how many statements we execute
in between.  The mainline code might never call setitimer() at all
after the first time, while each time the signal handler fires,
it sees that the then-pending request is most of a second away,
and that's when it sets the next interrupt request for.  Each
mainline timeout-set request after that will observe that the time
it wants is past the pending interrupt request time, and do nothing.

This also works pretty well for cases where a few different timeout
lengths are in use, as long as none of them are very short.  But
that describes our usage well.

Idea and original patch by Thomas Munro; I fixed a race condition
and improved the comments.

Discussion: https://postgr.es/m/CA+hUKG+o6pbuHBJSGnud=TadsuXySWA7CCcPgCt2QE9F6_4iHQ@mail.gmail.com
2021-01-06 18:28:52 -05:00
Tomas Vondra
8a4f618e7a Report progress of COPY commands
This commit introduces a view pg_stat_progress_copy, reporting progress
of COPY commands.  This allows rough estimates of how far a running COPY
has progressed, with the caveat that the total number of bytes may not be
available in some cases (e.g. when the input comes from the client).

Author: Josef Šimánek
Reviewed-by: Fujii Masao, Bharath Rupireddy, Vignesh C, Matthias van de Meent
Discussion: https://postgr.es/m/CAFp7QwqMGEi4OyyaLEK9DR0+E+oK3UtA4bEjDVCa4bNkwUY2PQ@mail.gmail.com
Discussion: https://postgr.es/m/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg+FEKet4XaTGSW=LitKQ@mail.gmail.com
2021-01-06 21:51:06 +01:00
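A sketch of monitoring a running COPY with the new view, using the column set as of this commit (later commits may add more columns):
  SELECT pid, relid::regclass AS relation,
         bytes_processed, bytes_total, tuples_processed, tuples_excluded
    FROM pg_stat_progress_copy;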
Peter Eisentraut
4656e3d668 Replace CLOBBER_CACHE_ALWAYS with run-time GUC
Forced cache invalidation (CLOBBER_CACHE_ALWAYS) has been impractical
to use for testing in PostgreSQL because it's so slow and because it's
toggled on/off only at build time.  It is helpful when hunting bugs in
any code that uses the syscache/relcache because it causes cache
invalidations to be injected whenever it would be possible for an
invalidation to occur, whether or not one was really pending.

Address this by providing run-time control over cache clobber
behaviour using the new debug_invalidate_system_caches_always GUC.
Support is not compiled in at all unless assertions are enabled or
CLOBBER_CACHE_ENABLED is explicitly defined at compile time.  It
defaults to 0 if compiled in, so it has negligible effect on assert
build performance by default.

When support is compiled in, test code can now set
debug_invalidate_system_caches_always=1 locally to a backend to test
specific queries, functions, extensions, etc.  Or tests can toggle it
globally for a specific test case while retaining normal performance
during test setup and teardown.

For backwards compatibility with existing test harnesses and scripts,
debug_invalidate_system_caches_always defaults to 1 if
CLOBBER_CACHE_ALWAYS is defined, and to 3 if CLOBBER_CACHE_RECURSIVE
is defined.

CLOBBER_CACHE_ENABLED is now visible in pg_config_manual.h, as is the
related RECOVER_RELATION_BUILD_MEMORY setting for the relcache.

Author: Craig Ringer <craig.ringer@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMsr+YF=+ctXBZj3ywmvKNUjWpxmuTuUKuv-rgbHGX5i5pLstQ@mail.gmail.com
2021-01-06 10:46:44 +01:00
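A sketch of how a test might exercise a single query under cache clobbering; this assumes a build with the support compiled in (assert-enabled or CLOBBER_CACHE_ENABLED) and a superuser session:
  SET debug_invalidate_system_caches_always = 1;
  SELECT count(*) FROM pg_class;   -- any query under test, now run with clobbering
  RESET debug_invalidate_system_caches_always;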
Fujii Masao
8900b5a9d5 Detect the deadlocks between backends and the startup process.
The deadlocks that the recovery conflict on lock is involved in can
happen between hot-standby backends and the startup process.
If a backend takes an access exclusive lock on a table and that
finally triggers the deadlock, the deadlock can be detected
as expected. On the other hand, previously, if the startup process
took an access exclusive lock and that finally triggered the deadlock,
the deadlock could not be detected and could remain even after
deadlock_timeout passed. This is a bug.

The cause of this bug was that the code for handling the recovery
conflict on lock didn't take care of deadlock case at all. It assumed
that deadlocks involving the startup process and backends were able
to be detected by the deadlock detector invoked within backends.
But this assumption was incorrect. The startup process also should
have invoked the deadlock detector if necessary.

To fix this bug, this commit makes the startup process invoke
the deadlock detector if deadlock_timeout is reached while handling
the recovery conflict on lock. Specifically, in that case, the startup
process requests all the backends holding the conflicting locks to
check themselves for deadlocks.

Back-patch to v9.6. v9.5 also has this bug, but per discussion we decided
not to back-patch the fix to v9.5, because v9.5 lacks some infrastructure
code (e.g., 37c54863cf) that this bug fix depends on. We could apply that
code for the back-patch, but since the next minor release is the final one
for v9.5, it's risky to do that. If we unexpectedly introduced a new bug
into v9.5 via the back-patch, there would be no chance to fix it. We
determined that a back-patch to v9.5 would carry more risk than gain.

Author: Fujii Masao
Reviewed-by: Bertrand Drouvot, Masahiko Sawada, Kyotaro Horiguchi
Discussion: https://postgr.es/m/4041d6b6-cf24-a120-36fa-1294220f8243@oss.nttdata.com
2021-01-06 12:39:18 +09:00
Amit Kapila
e02e840ff7 Fix typos in decode.c and logical.c.
Per report by Ajin Cherian in email:
https://postgr.es/m/CAFPTHDYnRKDvzgDxoMn_CKqXA-D0MtrbyJvfvjBsO4G=UHDXkg@mail.gmail.com
2021-01-06 08:56:19 +05:30
Tom Lane
bf8a662c9a Introduce a new GUC_REPORT setting "in_hot_standby".
Aside from being queriable via SHOW, this value is sent to the client
immediately at session startup, and again later on if the server gets
promoted to primary during the session.  The immediate report will be
used in an upcoming patch to avoid an extra round trip when trying to
connect to a primary server.

Haribabu Kommi, Greg Nancarrow, Tom Lane; reviewed at various times
by Laurenz Albe, Takayuki Tsunakawa, Peter Smith.

Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
2021-01-05 16:18:05 -05:00
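A sketch of the client-visible behavior:
  -- on a standby:
  SHOW in_hot_standby;   -- on
  -- on a primary:
  SHOW in_hot_standby;   -- off
  -- since the setting is GUC_REPORT, libpq clients can also read it without
  -- issuing a query, via PQparameterStatus(conn, "in_hot_standby")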
Dean Rasheed
fead67c24a Add an explicit cast to double when using fabs().
Commit bc43b7c2c0 used fabs() directly on an int variable, which
apparently requires an explicit cast on some platforms.

Per buildfarm.
2021-01-05 11:52:42 +00:00
Dean Rasheed
bc43b7c2c0 Fix numeric_power() when the exponent is INT_MIN.
In power_var_int(), the computation of the number of significant
digits to use in the computation used log(Abs(exp)), which isn't safe
because Abs(exp) returns INT_MIN when exp is INT_MIN. Use fabs()
instead of Abs(), so that the exponent is cast to a double before the
absolute value is taken.

Back-patch to 9.6, where this was introduced (by 7d9a4737c2).

Discussion: https://postgr.es/m/CAEZATCVd6pMkz=BrZEgBKyqqJrt2xghr=fNc8+Z=5xC6cgWrWA@mail.gmail.com
2021-01-05 11:15:28 +00:00
Peter Geoghegan
83e3239ee7 Standardize one aspect of rmgr desc output.
Bring heap and hash rmgr desc output in line with nbtree and GiST desc
output by using the name latestRemovedXid for all fields that output the
contents of the latestRemovedXid field from the WAL record's C struct
(stop using local variants).

This seems like a clear improvement because latestRemovedXid is a symbol
name that already appears across many different source files, and so is
probably much more recognizable.

Discussion: https://postgr.es/m/CAH2-Wzkt_Rs4VqPSCk87nyjPAAEmWL8STU9zgET_83EF5YfrLw@mail.gmail.com
2021-01-04 19:46:11 -08:00
Amit Kapila
cd357c7629 Fix typo in origin.c.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+PsReyuvww_Fn1NN_Vsv0wBP1bnzuhzRFr_2=y1nNZrG7w@mail.gmail.com
2021-01-05 08:05:08 +05:30
Amit Kapila
9da2224ea2 Fix typo in reorderbuffer.c.
Author: Zhijie Hou
Reviewed-by: Sawada Masahiko
Discussion: https://postgr.es/m/ba88bb58aaf14284abca16aec04bf279@G08CNEXMBPEKD05.g08.fujitsu.local
2021-01-05 07:56:40 +05:30
Thomas Munro
034510c820 Replace remaining uses of "whitelist".
Instead describe the action that the list effects, or just use "list"
where the meaning is obvious from context.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue%40alap3.anarazel.de
2021-01-05 14:00:16 +13:00
Thomas Munro
c0d4f6d897 Rename "enum blacklist" to "uncommitted enums".
We agreed to remove this terminology and use something more descriptive.

Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue%40alap3.anarazel.de
2021-01-05 12:38:48 +13:00
Tom Lane
4bd3fad80e Fix integer-overflow corner cases in substring() functions.
If the substring start index and length overflow when added together,
substring() misbehaved, either throwing a bogus "negative substring
length" error on a case that should succeed, or failing to complain that
a negative length is negative (and instead returning the whole string,
in most cases).  Unsurprisingly, the text, bytea, and bit variants of
the function all had this issue.  Rearrange the logic to ensure that
negative lengths are always rejected, and add an overflow check to
handle the other case.

Also install similar guards into detoast_attr_slice() (nee
heap_tuple_untoast_attr_slice()), since it's far from clear that
no other code paths leading to that function could pass it values
that would overflow.

Patch by myself and Pavel Stehule, per bug #16804 from Rafi Shamim.

Back-patch to v11.  While these bugs are old, the common/int.h
infrastructure for overflow-detecting arithmetic didn't exist before
commit 4d6ad3125, and it doesn't seem like these misbehaviors are bad
enough to justify developing a standalone fix for the older branches.

Discussion: https://postgr.es/m/16804-f4eeeb6c11ba71d4@postgresql.org
2021-01-04 18:32:44 -05:00
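A sketch of the corner cases in question:
  -- start + length overflows int4; previously this threw a bogus
  -- "negative substring length" error, now it returns the expected suffix
  SELECT substring('abcdef' FROM 2 FOR 2147483646);   -- bcdef
  -- a genuinely negative length is now always rejected
  SELECT substring('abcdef' FROM 3 FOR -1);           -- ERROR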
Tom Lane
1c1cbe279b Rethink the "read/write parameter" mechanism in pl/pgsql.
Performance issues with the preceding patch to re-implement array
element assignment within pl/pgsql led me to realize that the read/write
parameter mechanism is misdesigned.  Instead of requiring the assignment
source expression to be such that *all* its references to the target
variable could be passed as R/W, we really want to identify *one*
reference to the target variable to be passed as R/W, allowing any other
ones to be passed read/only as they would be by default.  As long as the
R/W reference is a direct argument to the top-level (hence last to be
executed) function in the expression, there is no harm in R/O references
being passed to other lower parts of the expression.  Nor is there any
use-case for more than one argument of the top-level function being R/W.

Hence, rewrite that logic to identify one single Param that references
the target variable, and make only that Param pass a read/write
reference, not any other Params referencing the target variable.

Discussion: https://postgr.es/m/4165684.1607707277@sss.pgh.pa.us
2021-01-04 12:39:27 -05:00
Tom Lane
c9d5298485 Re-implement pl/pgsql's expression and assignment parsing.
Invent new RawParseModes that allow the core grammar to handle
pl/pgsql expressions and assignments directly, and thereby get rid
of a lot of hackery in pl/pgsql's parser.  This moves a good deal
of knowledge about pl/pgsql into the core code: notably, we have to
invent a CoercionContext that matches pl/pgsql's (rather dubious)
historical behavior for assignment coercions.  That's getting away
from the original idea of pl/pgsql as an arm's-length extension of
the core, but really we crossed that bridge a long time ago.

The main advantage of doing this is that we can now use the core
parser to generate FieldStore and/or SubscriptingRef nodes to handle
assignments to pl/pgsql variables that are records or arrays.  That
fixes a number of cases that had never been implemented in pl/pgsql
assignment, such as nested records and array slicing, and it allows
pl/pgsql assignment to support the datatype-specific subscripting
behaviors introduced in commit c7aba7c14.

There are cosmetic benefits too: when a syntax error occurs in a
pl/pgsql expression, the error report no longer includes the confusing
"SELECT" keyword that used to get prefixed to the expression text.
Also, there seem to be some small speed gains.

Discussion: https://postgr.es/m/4165684.1607707277@sss.pgh.pa.us
2021-01-04 11:52:00 -05:00
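A sketch of one newly supported assignment form mentioned above, array slicing, assuming the behavior described in this commit:
  DO $$
  DECLARE
    a int[] := ARRAY[1, 2, 3, 4, 5];
  BEGIN
    -- array-slice assignment in pl/pgsql, previously unimplemented
    a[2:4] := ARRAY[20, 30, 40];
    RAISE NOTICE 'a = %', a;   -- {1,20,30,40,5}
  END
  $$;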
Tom Lane
844fe9f159 Add the ability for the core grammar to have more than one parse target.
This patch essentially allows gram.y to implement a family of related
syntax trees, rather than necessarily always parsing a list of SQL
statements.  raw_parser() gains a new argument, enum RawParseMode,
to say what to do.  As proof of concept, add a mode that just parses
a TypeName without any other decoration, and use that to greatly
simplify typeStringToTypeName().

In addition, invent a new SPI entry point SPI_prepare_extended() to
allow SPI users (particularly plpgsql) to get at this new functionality.
In hopes of making this the last variant of SPI_prepare(), set up its
additional arguments as a struct rather than direct arguments, and
promise that future additions to the struct can default to zero.
SPI_prepare_cursor() and SPI_prepare_params() can perhaps go away at
some point.

Discussion: https://postgr.es/m/4165684.1607707277@sss.pgh.pa.us
2021-01-04 11:03:22 -05:00
Michael Paquier
b49154b3b7 Simplify some comments in xml.c
Author: Justin Pryzby
Discussion: https://postgr.es/m/X/Ff7jfnvJUab013@paquier.xyz
2021-01-04 19:47:58 +09:00
Amit Kapila
a271a1b50e Allow decoding at prepare time in ReorderBuffer.
This patch allows PREPARE-time decoding of two-phase transactions (if the
output plugin supports this capability), in which case the transactions
are replayed at PREPARE and then committed later when COMMIT PREPARED
arrives.

Now that we decode the changes before the commit, the concurrent aborts
may cause failures when the output plugin consults catalogs (both system
and user-defined).

We detect such failures with a special sqlerrcode
ERRCODE_TRANSACTION_ROLLBACK introduced by commit 7259736a6e and stop
decoding the remaining changes. Then we rollback the changes when rollback
prepared is encountered.

Author: Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich
Reviewed-by: Amit Kapila, Peter Smith, Sawada Masahiko, Arseny Sher, and Dilip Kumar
Tested-by: Takamichi Osumi
Discussion:
https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru
https://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
2021-01-04 08:34:50 +05:30
Bruce Momjian
ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Peter Geoghegan
32d6287d2e Get heap page max offset with buffer lock held.
On further reflection it seems better to call PageGetMaxOffsetNumber()
after acquiring a buffer lock on the page.  This shouldn't really
matter, but doing it this way is cleaner.

Follow-up to commit 42288174.

Backpatch: 12-, just like commit 42288174
2020-12-30 17:21:42 -08:00
Peter Geoghegan
4228817449 Fix index deletion latestRemovedXid bug.
The logic for determining the latest removed XID for the purposes of
generating recovery conflicts in REDO routines was subtly broken.  It
failed to follow links from HOT chains, and so failed to consider all
relevant heap tuple headers in some cases.

To fix, expand the loop that deals with LP_REDIRECT line pointers to
also deal with HOT chains.  The new version of the loop is loosely based
on a similar loop from heap_prune_chain().

The impact of this bug is probably quite limited, since the horizon code
necessarily deals with heap tuples that are pointed to by LP_DEAD-set
index tuples.  The process of setting LP_DEAD index tuples (e.g. within
the kill_prior_tuple mechanism) is highly correlated with opportunistic
pruning of pointed-to heap tuples.  Plus the question of generating a
recovery conflict usually comes up some time after index tuple LP_DEAD
bits were initially set, unlike heap pruning, where a latestRemovedXid
is generated at the point of the pruning operation (heap pruning has no
deferred "would-be page split" style processing that produces conflicts
lazily).

Only backpatch to Postgres 12, the first version where this logic runs
during original execution (following commit 558a9165e0).  The index
latestRemovedXid mechanism has had the same bug since it first appeared
over 10 years ago (in commit a760893d), but backpatching to all
supported versions now seems like a bad idea on balance.  Running the
new improved code during recovery seems risky, especially given the lack
of complaints from the field.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wz=Eib393+HHcERK_9MtgNS7Ew1HY=RDC_g6GL46zM5C6Q@mail.gmail.com
Backpatch: 12-
2020-12-30 16:29:05 -08:00
Alexander Korotkov
16d531a30a Refactor multirange_in()
This commit preserves the logic of multirange_in() but makes it more clear
what's going on.  Also, this commit fixes the compiler warning spotted by the
buildfarm.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/2246043.1609290699%40sss.pgh.pa.us
2020-12-30 21:17:34 +03:00
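For context, a sketch of the multirange input syntax that multirange_in() parses:
  SELECT '{}'::int4multirange;              -- empty multirange
  SELECT '{[1,3), [5,7)}'::int4multirange;  -- two disjoint ranges
  SELECT '{[1,3), [2,7)}'::int4multirange;  -- overlapping inputs are normalized to {[1,7)}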
Tom Lane
7ca37fb040 Use setenv() in preference to putenv().
Since at least 2001 we've used putenv() and avoided setenv(), on the
grounds that the latter was unportable and not in POSIX.  However,
POSIX added it that same year, and by now the situation has reversed:
setenv() is probably more portable than putenv(), since POSIX now
treats the latter as not being a core function.  And setenv() has
cleaner semantics too.  So, let's reverse that old policy.

This commit adds a simple src/port/ implementation of setenv() for
any stragglers (we have one in the buildfarm, but I'd not be surprised
if that code is never used in the field).  More importantly, extend
win32env.c to also support setenv().  Then, replace usages of putenv()
with setenv(), and get rid of some ad-hoc implementations of setenv()
wannabees.

Also, adjust our src/port/ implementation of unsetenv() to follow the
POSIX spec that it returns an error indicator, rather than returning
void as per the ancient BSD convention.  I don't feel a need to make
all the call sites check for errors, but the portability stub ought
to match real-world practice.

Discussion: https://postgr.es/m/2065122.1609212051@sss.pgh.pa.us
2020-12-30 12:56:06 -05:00
Alexander Korotkov
62097a4cc8 Fix selectivity estimation @> (anymultirange, anyrange) operator
Attempt to get selectivity estimation for @> (anymultirange, anyrange) operator
caused an error in buildfarm, because this operator was missed in switch()
of calc_hist_selectivity().  Fix that and also make regression tests reliably
check that selectivity estimation for (multi)ranges doesn't fail.  Previously,
whether we tested selectivity estimation for (multi)ranges depended on
whether autovacuum managed to gather statistics concurrently with the test.

Reported-by: Michael Paquier
Discussion: https://postgr.es/m/X%2BwmgjRItuvHNBeV%40paquier.xyz
2020-12-30 20:31:15 +03:00
Tom Lane
860fe27ee1 Fix up usage of krb_server_keyfile GUC parameter.
secure_open_gssapi() installed the krb_server_keyfile setting as
KRB5_KTNAME unconditionally, so long as it's not empty.  However,
pg_GSS_recvauth() only installed it if KRB5_KTNAME wasn't set already,
leading to a troubling inconsistency: in theory, clients could see
different sets of server principal names depending on whether they
use GSSAPI encryption.  Always using krb_server_keyfile seems like
the right thing, so make both places do that.  Also fix up
secure_open_gssapi()'s lack of a check for setenv() failure ---
it's unlikely, surely, but security-critical actions are no place
to be sloppy.

Also improve the associated documentation.

This patch does nothing about secure_open_gssapi()'s use of setenv(),
and indeed causes pg_GSS_recvauth() to use it too.  That's nominally
against project portability rules, but since this code is only built
with --with-gssapi, I do not feel a need to do something about this
in the back branches.  A fix will be forthcoming for HEAD though.

Back-patch to v12 where GSSAPI encryption was introduced.  The
dubious behavior in pg_GSS_recvauth() goes back further, but it
didn't have anything to be inconsistent with, so let it be.

Discussion: https://postgr.es/m/2187460.1609263156@sss.pgh.pa.us
2020-12-30 11:38:42 -05:00
Michael Paquier
e665769e6d Sanitize IF NOT EXISTS in EXPLAIN for CTAS and matviews
IF NOT EXISTS was ignored when specified in an EXPLAIN query for CREATE
MATERIALIZED VIEW or CREATE TABLE AS.  Hence, if this clause was
specified, the caller would get a failure if the relation already
exists instead of a success with a NOTICE message.

This commit makes the behavior of IF NOT EXISTS in EXPLAIN consistent
with the non-EXPLAIN'd DDL queries, preventing a failure with IF NOT
EXISTS if the relation to-be-created already exists.  The skip is done
before the SELECT query used for the relation is planned or executed,
and a "dummy" plan is generated instead depending on the format used by
EXPLAIN.

Author: Bharath Rupireddy
Reviewed-by: Zhijie Hou, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACVa3oJ9O_wcGd+FtHWZds04dEKcakxphGz5POVgD4wC7Q@mail.gmail.com
2020-12-30 21:23:24 +09:00
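As a quick sketch of the fixed behavior (assuming a build with this commit; the table name is illustrative):
    CREATE TABLE t AS SELECT 1 AS x;
    -- Before this commit the EXPLAIN below failed because t already exists;
    -- with IF NOT EXISTS honored, it now issues a NOTICE and a dummy plan,
    -- matching the non-EXPLAIN'd DDL.
    EXPLAIN (COSTS OFF) CREATE TABLE IF NOT EXISTS t AS SELECT 1 AS x;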
Amit Kapila
0aa8a01d04 Extend the output plugin API to allow decoding of prepared xacts.
This adds six methods to the output plugin API, adding support for
streaming changes of two-phase transactions at prepare time.

* begin_prepare
* filter_prepare
* prepare
* commit_prepared
* rollback_prepared
* stream_prepare

Most of this is a simple extension of the existing methods, with the
semantic difference that the transaction is not yet committed and may be
aborted later.

Until now two-phase transactions were translated into regular transactions
on the subscriber, and the GID was not forwarded to it. None of the
two-phase commands were communicated to the subscriber.

This patch provides the infrastructure for logical decoding plugins to be
informed of two-phase commands like PREPARE TRANSACTION, COMMIT PREPARED
and ROLLBACK PREPARED, along with the corresponding GID.

This also extends the 'test_decoding' plugin, implementing these new
methods.

This commit simply adds these new APIs; the upcoming patch to "allow
the decoding at prepare time in ReorderBuffer" will use them.

Author: Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich
Reviewed-by: Amit Kapila, Peter Smith, Sawada Masahiko, and Dilip Kumar
Discussion:
https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru
https://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
2020-12-30 16:17:26 +05:30
Tom Lane
1f9158ba48 Suppress log spam from multiple reports of SIGQUIT shutdown.
When the postmaster sends SIGQUIT to its children, there's no real
need for all the children to log that fact; the postmaster already
made a log entry about it, so adding perhaps dozens or hundreds of
child-process log entries adds nothing of value.  So, let's introduce
a new ereport level to specify "WARNING, but never send to log" and
use that for these messages.

Such a change wouldn't have been desirable before commit 7e784d1dc,
because if someone manually SIGQUIT's a backend, we *do* want to log
that.  But now we can tell the difference between a signal that was
issued by the postmaster and one that was not with reasonable
certainty.

While we're here, also clear error_context_stack before ereport'ing,
to prevent error callbacks from being invoked in the signal-handler
context.  This should reduce the odds of getting hung up while trying
to notify the client.

Per a suggestion from Andres Freund.

Discussion: https://postgr.es/m/20201225230331.hru3u6obyy6j53tk@alap3.anarazel.de
2020-12-29 18:02:38 -05:00
Alexander Korotkov
db6335b5b1 Add support of multirange matching to the existing range GiST indexes
6df7a9698b has introduced a set of operators between ranges and multiranges.
Existing GiST indexes for ranges could easily support majority of them.
This commit adds support for new operators to the existing range GiST indexes.
New operators use the same strategy numbers as existing ones.  The appropriate
check function is determined using the subtype.

Catversion is bumped.
2020-12-29 23:36:43 +03:00
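A small sketch of the index support described above, assuming a PostgreSQL 14 build with multiranges (table and data are made up):
    CREATE TABLE booking (room int, during int4range);
    CREATE INDEX booking_during_gist ON booking USING gist (during);
    -- The existing range GiST opclass can now also serve range-vs-multirange
    -- operators such as overlap:
    SELECT * FROM booking
    WHERE during && '{[10,20),[30,40)}'::int4multirange;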
Alexander Korotkov
d1d61a8b23 Improve the signature of internal multirange functions
There is a set of *_internal() functions exposed in
include/utils/multirangetypes.h.  This commit improves the signatures of these
functions in two ways.
 * Add const qualifiers where applicable.
 * Replace multirange typecache argument with range typecache argument.
   Multirange typecache was used solely to find the range typecache.  At the
   same time, range typecache is easier for the caller to find.
2020-12-29 23:35:38 +03:00
Alexander Korotkov
4d7684cc75 Implement operators for checking if the range contains a multirange
We have operators for checking if the multirange contains a range but don't
have the opposite.  This commit improves completeness of the operator set by
adding two new operators: @> (anyrange,anymultirange) and
<@ (anymultirange,anyrange).

Catversion is bumped.
2020-12-29 23:35:33 +03:00
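A minimal sketch of the two new operators (assuming a PostgreSQL 14 build with multiranges; the results shown are the expected ones):
    -- range @> multirange: does the range contain the whole multirange?
    SELECT int4range(1, 20) @> int4multirange(int4range(2, 4), int4range(6, 8));  -- true
    -- multirange <@ range: the commuted form
    SELECT int4multirange(int4range(2, 4)) <@ int4range(1, 20);                   -- true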
Alexander Korotkov
a5b81b6f00 Fix bugs in comparison functions for multirange_bsearch_match()
Two functions multirange_range_overlaps_bsearch_comparison() and
multirange_range_contains_bsearch_comparison() contain bugs of returning -1
instead of 1.  This commit fixes these bugs and adds corresponding regression
tests.
2020-12-29 23:35:26 +03:00
Tom Lane
3995c42498 Improve log messages related to pg_hba.conf not matching a connection.
Include details on whether GSS encryption has been activated;
since we added "hostgssenc" type HBA entries, that's relevant info.

Kyotaro Horiguchi and Tom Lane.  Back-patch to v12 where
GSS encryption was introduced.

Discussion: https://postgr.es/m/e5b0b6ed05764324a2f3fe7acfc766d5@smhi.se
2020-12-28 17:58:58 -05:00
Tom Lane
622ae4621e Fix assorted issues in backend's GSSAPI encryption support.
Unrecoverable errors detected by GSSAPI encryption can't just be
reported with elog(ERROR) or elog(FATAL), because attempting to
send the error report to the client is likely to lead to infinite
recursion or loss of protocol sync.  Instead make this code do what
the SSL encryption code has long done, which is to just report any
such failure to the server log (with elevel COMMERROR), then pretend
we've lost the connection by returning errno = ECONNRESET.

Along the way, fix confusion about whether message translation is done
by pg_GSS_error() or its callers (the latter should do it), and make
the backend version of that function work more like the frontend
version.

Avoid allocating the port->gss struct until it's needed; we surely
don't need to allocate it in the postmaster.

Improve logging of "connection authorized" messages with GSS enabled.
(As part of this, I back-patched the code changes from dc11f31a1.)

Make BackendStatusShmemSize() account for the GSS-related space that
will be allocated by CreateSharedBackendStatus().  This omission
could possibly cause out-of-shared-memory problems with very high
max_connections settings.

Remove arbitrary, pointless restriction that only GSS authentication
can be used on a GSS-encrypted connection.

Improve documentation; notably, document the fact that libpq now
prefers GSS encryption over SSL encryption if both are possible.

Per report from Mikael Gustavsson.  Back-patch to v12 where
this code was introduced.

Discussion: https://postgr.es/m/e5b0b6ed05764324a2f3fe7acfc766d5@smhi.se
2020-12-28 17:44:17 -05:00
Michael Paquier
643428c54b Fix inconsistent code with shared invalidations of snapshots
The code in charge of processing a single invalidation message has been
using the structure for relation mapping messages since 568d413.  This
fortunately had no consequence, as both locate the database ID at the
same location, but it could become a problem in the future if this area
of the code changes.

Author: Konstantin Knizhnik
Discussion: https://postgr.es/m/8044c223-4d3a-2cdb-42bf-29940840ce94@postgrespro.ru
Backpatch-through: 9.5
2020-12-28 22:16:49 +09:00
Bruce Momjian
3187ef7c46 Revert "Add key management system" (978f869b99) & later commits
The patch needs test cases, reorganization, and cfbot testing.
Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive)
and 08db7c63f3..ccbe34139b.

Reported-by: Tom Lane, Michael Paquier

Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
2020-12-27 21:37:42 -05:00
Jeff Davis
05c0258966 Fix bug #16784 in Disk-based Hash Aggregation.
Before processing tuples, agg_refill_hash_table() was setting all
pergroup pointers to NULL to signal to advance_aggregates() that it
should not attempt to advance groups that had spilled.

The problem was that it also set the pergroups for sorted grouping
sets to NULL, which caused rescanning to fail.

Instead, change agg_refill_hash_table() to only set the pergroups for
hashed grouping sets to NULL; and when compiling the expression, pass
doSort=false.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/16784-7ff169bf2c3d1588%40postgresql.org
Backpatch-through: 13
2020-12-26 17:25:30 -08:00
Bruce Momjian
ba6725df36 auth commands: list specific commands to install in Makefile
Previously I used Makefile functions.

Backpatch-through: master
2020-12-26 09:25:05 -05:00
Bruce Momjian
d7602afa2e Add scripts for retrieving the cluster file encryption key
Scripts are passphrase, direct, AWS, and two Yubikey ones.

Backpatch-through: master
2020-12-26 01:19:09 -05:00
Bruce Momjian
300e430c76 Allow ssl_passphrase_command to prompt the terminal
Previously the command could not access the terminal for a passphrase.

Backpatch-through: master
2020-12-25 20:41:06 -05:00
Noah Misch
08db7c63f3 Invalidate acl.c caches when pg_authid changes.
This makes existing sessions reflect "ALTER ROLE ... [NO]INHERIT" as
quickly as they have been reflecting "GRANT role_name".  Back-patch to
9.5 (all supported versions).

Reviewed by Nathan Bossart.

Discussion: https://postgr.es/m/20201221095028.GB3777719@rfd.leadboat.com
2020-12-25 10:41:59 -08:00
Bruce Momjian
978f869b99 Add key management system
This adds a key management system that stores (currently) two data
encryption keys of length 128, 192, or 256 bits.  The data keys are
AES256 encrypted using a key encryption key, and validated via GCM
cipher mode.  A command to obtain the key encryption key must be
specified at initdb time, and will be run at every database server
start.  New parameters allow a file descriptor open to the terminal to
be passed.  pg_upgrade support has also been added.

Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com
Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us

Author: Masahiko Sawada, me, Stephen Frost
2020-12-25 10:19:44 -05:00
Bruce Momjian
c3826f831e move hex_decode() to /common so it can be called from frontend
This allows removal of a copy of hex_decode() from ecpg, and will be
used by the soon-to-be added pg_alterckey command.

Backpatch-through: master
2020-12-24 17:25:48 -05:00
Tom Lane
7519bd16d1 Fix race condition between shutdown and unstarted background workers.
If a database shutdown (smart or fast) is commanded between the time
some process decides to request a new background worker and the time
that the postmaster can launch that worker, then nothing happens
because the postmaster won't launch any bgworkers once it's exited
PM_RUN state.  This is fine ... unless the requesting process is
waiting for that worker to finish (or even for it to start); in that
case the requestor is stuck, and only manual intervention will get us
to the point of being able to shut down.

To fix, cancel pending requests for workers when the postmaster sends
shutdown (SIGTERM) signals, and similarly cancel any new requests that
arrive after that point.  (We can optimize things slightly by only
doing the cancellation for workers that have waiters.)  To fit within
the existing bgworker APIs, the "cancel" is made to look like the
worker was started and immediately stopped, causing deregistration of
the bgworker entry.  Waiting processes would have to deal with
premature worker exit anyway, so this should introduce no bugs that
weren't there before.  We do have a side effect that registration
records for restartable bgworkers might disappear when theoretically
they should have remained in place; but since we're shutting down,
that shouldn't matter.

Back-patch to v10.  There might be value in putting this into 9.6
as well, but the management of bgworkers is a bit different there
(notably see 8ff518699) and I'm not convinced it's worth the effort
to validate the patch for that branch.

Discussion: https://postgr.es/m/661570.1608673226@sss.pgh.pa.us
2020-12-24 17:00:43 -05:00
Tom Lane
7e784d1dc1 Improve client error messages for immediate-stop situations.
Up to now, if the DBA issued "pg_ctl stop -m immediate", the message
sent to clients was the same as for a crash-and-restart situation.
This is confusing, not least because the message claims that the
database will soon be up again, something we have no business
predicting.

Improve things so that we can generate distinct messages for the two
cases (and also recognize an ad-hoc SIGQUIT, should somebody try that).
To do that, add a field to pmsignal.c's shared memory data structure
that the postmaster sets just before broadcasting SIGQUIT to its
children.  No interlocking seems to be necessary; the intervening
signal-sending and signal-receipt should sufficiently serialize accesses
to the field.  Hence, this isn't any riskier than the existing usages
of pmsignal.c.

We might in future extend this idea to improve other
postmaster-to-children signal scenarios, although none of them
currently seem to be as badly overloaded as SIGQUIT.

Discussion: https://postgr.es/m/559291.1608587013@sss.pgh.pa.us
2020-12-24 12:58:32 -05:00
Michael Paquier
90fbf7c57d Fix typos and grammar in docs and comments
This fixes several areas of the documentation and some comments in
matters of style, grammar, or even format.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20201222041153.GK30237@telsasoft.com
2020-12-24 17:05:49 +09:00
Michael Paquier
6db27037b9 Fix portability issues with parsing of recovery_target_xid
The parsing of this parameter has been using strtoul(), which is not
portable across platforms.  On most Unix platforms, unsigned long has a
size of 64 bits, while on Windows it is 32 bits.  It is common in
recovery scenarios to rely on the output of txid_current() or even the
newer pg_current_xact_id() to get a transaction ID for setting up
recovery_target_xid.  The value returned by those functions includes the
epoch in the computed result, which would cause strtoul() to fail where
unsigned long has a size of 32 bits once the epoch is incremented.

WAL records and 2PC data include only information about 32-bit XIDs and
it is not possible to have XIDs across more than one epoch, so
discarding the high bits from the transaction ID set has no impact on
recovery.  On the contrary, the use of strtoul() prevents a consistent
behavior across platforms depending on the size of unsigned long.

This commit changes the parsing of recovery_target_xid to use
pg_strtouint64() instead, available down to 9.6.  There is one TAP test
stressing recovery with recovery_target_xid, where a tweak based on
pg_reset{xlog,wal} is added to bump the XID epoch so as this change gets
tested, as per an idea from Alexander Lakhin.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/16780-107fd0c0385b1035@postgresql.org
Backpatch-through: 9.6
2020-12-23 12:51:22 +09:00
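For context, a sketch of how such a value arises (the numbers are illustrative):
    -- pg_current_xact_id() returns an xid8 that includes the epoch, so its
    -- value can exceed what a 32-bit unsigned long can hold:
    SELECT pg_current_xact_id();   -- e.g. 4294967496
    -- With the parameter now parsed via pg_strtouint64(), such a value can be
    -- used directly, e.g. in postgresql.conf:
    --   recovery_target_xid = '4294967496'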
Tomas Vondra
1ca2eb1031 Improve find_em_expr_usable_for_sorting_rel comment
Clarify the relationship between find_em_expr_usable_for_sorting_rel and
prepare_sort_from_pathkeys, i.e. what restrictions need to be shared
between those two places.

Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com
2020-12-22 02:00:51 +01:00
Tomas Vondra
9aff4dc01f Don't search for volatile expr in find_em_expr_usable_for_sorting_rel
While prepare_sort_from_pathkeys has to be concerned about matching up
a volatile expression to the proper tlist entry, we don't need to do
that in find_em_expr_usable_for_sorting_rel because such a sort will
have to be postponed anyway.

Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com
2020-12-21 20:05:11 +01:00
Tomas Vondra
fac1b470a9 Disallow SRFs when considering sorts below Gather Merge
While we do allow SRFs in ORDER BY, scan/join processing should not
consider such cases - such sorts should only happen via final Sort atop
a ProjectSet. So make sure we don't try adding such sorts below Gather
Merge, just like we do for expressions that are volatile and/or not
parallel safe.

Backpatch to PostgreSQL 13, where this code was introduced as part of
the Incremental Sort patch.

Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
Discussion: https://postgr.es/m/295524.1606246314%40sss.pgh.pa.us
2020-12-21 19:36:22 +01:00
Tom Lane
ff5d5611c0 Remove "invalid concatenation of jsonb objects" error case.
The jsonb || jsonb operator arbitrarily rejected certain combinations
of scalar and non-scalar inputs, while being willing to concatenate
other combinations.  This was of course quite undocumented.  Rather
than trying to document it, let's just remove the restriction,
creating a uniform rule that unless we are handling an object-to-object
concatenation, non-array inputs are converted to one-element arrays,
resulting in an array-to-array concatenation.  (This does not change
the behavior for any case that didn't throw an error before.)

Per complaint from Joel Jacobson.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/163099.1608312033@sss.pgh.pa.us
2020-12-21 13:11:50 -05:00
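A short sketch of the resulting rule (outputs are the expected ones under the rule above; exact formatting may differ):
    -- object || object still merges the two objects:
    SELECT '{"a":1}'::jsonb || '{"b":2}'::jsonb;   -- {"a": 1, "b": 2}
    -- any other non-array input is wrapped in a one-element array first:
    SELECT '3'::jsonb || '[4,5]'::jsonb;           -- [3, 4, 5]
    SELECT '{"a":1}'::jsonb || '"x"'::jsonb;       -- [{"a": 1}, "x"]  (one of the cases that used to error)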
Tomas Vondra
86b7cca72d Check parallel safety in generate_useful_gather_paths
Commit ebb7ae839d ensured we ignore pathkeys with volatile expressions
when considering adding a sort below a Gather Merge. Turns out we need
to care about parallel safety of the pathkeys too, otherwise we might
try sorting e.g. on results of a correlated subquery (as demonstrated
by a report from Luis Roberto).

Initial investigation by Tom Lane, patch by James Coleman. Backpatch
to 13, where the code was introduced (as part of Incremental Sort).

Reported-by: Luis Roberto
Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/622580997.37108180.1604080457319.JavaMail.zimbra%40siscobra.com.br
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
2020-12-21 18:29:49 +01:00
Tomas Vondra
f4a3c0b062 Consider unsorted paths in generate_useful_gather_paths
generate_useful_gather_paths used to skip unsorted paths (without any
pathkeys), but that is unnecessary - the later code actually can handle
such paths just fine by adding a Sort node. This is clearly a thinko,
preventing construction of useful plans.

Backpatch to 13, where Incremental Sort was introduced.

Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
2020-12-21 18:10:20 +01:00
Alexander Korotkov
29f8f54676 Fix compiler warning in multirange_constructor0()
Discussion: https://postgr.es/m/X%2BBP8XE0UpIB6Yvh%40paquier.xyz
Author: Michael Paquier
2020-12-21 14:25:32 +03:00
Michael Paquier
93e8ff8701 Refactor logic to check for ASCII-only characters in string
The same logic was present for collation commands, SASLprep and
pgcrypto, so this removes some code.

Author: Michael Paquier
Reviewed-by: Stephen Frost, Heikki Linnakangas
Discussion: https://postgr.es/m/X9womIn6rne6Gud2@paquier.xyz
2020-12-21 09:37:11 +09:00
Alexander Korotkov
4e1ee79e31 Fix typalign in rangetypes statistics
6df7a9698b introduced multirange types, whose typanalyze function shares
infrastructure with the range types' typanalyze function.  Since 6df7a9698b,
the type information gathered for statistics is filled in from the typcache,
but typalign was mistakenly always set to double.  This commit fixes this
oversight.
2020-12-21 00:31:11 +03:00
Tom Lane
ed6329cfa9 Avoid memcpy() with same source and destination in pgstat_recv_replslot.
Same type of issue as in commit 53d4f5fef and earlier fixes; also
found by apparently-more-picky-than-the-buildfarm valgrind testing.
This one is an oversight in commit 986816750.  Since that's new in
HEAD, no need for a back-patch.
2020-12-20 12:38:32 -05:00
Alexander Korotkov
11072e8693 Fix compiler warning introduced in 6df7a9698b 2020-12-20 16:27:01 +03:00
Alexander Korotkov
6df7a9698b Multirange datatypes
Multiranges are basically sorted arrays of non-overlapping ranges with
set-theoretic operations defined over them.

Since v14, each range type automatically gets a corresponding multirange
datatype.  There are both manual and automatic mechanisms for naming multirange
types.  One can specify a multirange type name using the multirange_type_name
attribute in CREATE TYPE.  Otherwise, a multirange type name is generated
automatically.  If the range type name contains "range" then we change that to
"multirange".  Otherwise, we add "_multirange" to the end.

Implementation of multiranges comes with a space-efficient internal
representation format, which evades extra paddings and duplicated storage of
oids.  Altogether this format allows fetching a particular range by its index
in O(n).

Statistics gathering and selectivity estimation are implemented for multiranges.
For this purpose, a stored multirange is approximated as a union range without gaps.
This area will likely need improvements in the future.

Catversion is bumped.

Discussion: https://postgr.es/m/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com
Discussion: https://postgr.es/m/a0b8026459d1e6167933be2104a6174e7d40d0ab.camel%40j-davis.com#fe7218c83b08068bfffb0c5293eceda0
Author: Paul Jungwirth, revised by me
Reviewed-by: David Fetter, Corey Huinker, Jeff Davis, Pavel Stehule
Reviewed-by: Alvaro Herrera, Tom Lane, Isaac Morland, David G. Johnston
Reviewed-by: Zhihong Yu, Alexander Korotkov
2020-12-20 07:20:33 +03:00
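A small sketch of the naming mechanisms described above (assuming a PostgreSQL 14 build; the type names are made up):
    -- Automatic naming: "inetrange" contains "range", so the generated
    -- multirange type is named "inetmultirange".
    CREATE TYPE inetrange AS RANGE (subtype = inet);
    -- Manual naming via the multirange_type_name attribute:
    CREATE TYPE textrange2 AS RANGE (subtype = text, collation = "C",
                                     multirange_type_name = textmr);
    SELECT '{[a,c),[e,g)}'::textmr;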
Amit Kapila
20659fd8e5 Update comment atop of ReorderBufferQueueMessage().
The comments atop of this function describes behaviour in case of a
transactional WAL message only, but it accepts both transactional and
non-transactional WAL messages. Update the comments to describe
behaviour in case of non-transactional WAL message as well.

Ashutosh Bapat, rephrased by Amit Kapila
Discussion: https://postgr.es/m/CAGEoWWTTzNzHOi8bj0wfAo1siGi-YEh6wqH1oaz4DrkTJ6HbTQ@mail.gmail.com
2020-12-19 10:08:46 +05:30
Tom Lane
53d4f5fef0 Avoid memcpy() with same source and destination during relmapper init.
A narrow reading of the C standard says that memcpy(x,x,n) is undefined,
although it's hard to envision an implementation that would really
misbehave.  However, analysis tools such as valgrind might whine about
this; accordingly, let's band-aid relmapper.c to not do it.

See also 5b630501e, d3f4e8a8a, ad7b48ea0, and other similar fixes.
Apparently, none of those folk tried valgrinding initdb?  This has been
like this for long enough that I'm surprised it hasn't been reported
before.

Back-patch, just in case anybody wants to use a back branch on a platform
that complains about this; we back-patched those earlier fixes too.

Discussion: https://postgr.es/m/161790.1608310142@sss.pgh.pa.us
2020-12-18 15:46:44 -05:00
Fujii Masao
00f690a239 Revert "Get rid of the dedicated latch for signaling the startup process".
Revert ac22929a26, as well as the followup fix 113d3591b8, because they broke
the assumption that the startup process waiting for a recovery conflict
on buffer pin should be woken up only by a buffer unpin or by the timeout enabled
in ResolveRecoveryConflictWithBufferPin(). They caused, for example, the
SIGHUP signal handler or the walreceiver process to wake that startup process
up unnecessarily frequently.

Additionally, add the comments about why that dedicated latch that
the reverted patch tried to get rid of should not be removed.

Thanks to Kyotaro Horiguchi for the discussion.

Author: Fujii Masao
Discussion: https://postgr.es/m/d8c0c608-021b-3c73-fffd-3240829ee986@oss.nttdata.com
2020-12-17 18:06:51 +09:00
Peter Geoghegan
41ddc27f66 Remove obsolete btrescan() comment.
"Ordering stuff" refered to a _bt_first() call to _bt_orderkeys().
However, the _bt_orderkeys() function was renamed to
_bt_preprocess_keys() by commit fa5c8a055a.

_bt_preprocess_keys() is directly referenced just after the removed
comment already, which seems sufficient.
2020-12-15 15:55:07 -08:00
Alvaro Herrera
a18422a3ad
Remove useless variable stores
Mistakenly introduced in 4cbe3ac3e867; the bug was repaired in 148e632c05, but
the useless stores were accidentally left behind.
2020-12-15 19:51:16 -03:00
Tomas Vondra
6bc2769832 Error out when Gather Merge input is not sorted
To build Gather Merge path, the input needs to be sufficiently sorted.
Ensuring this is the responsibility of the code constructing the paths,
but create_gather_merge_plan tried to handle unsorted paths by adding
an explicit Sort. In light of the recent issues related to Incremental
Sort, this is rather fragile. Some of the expressions may be volatile
or parallel unsafe, in which case we can't add the Sort here.

We could do more checks and add the Sort in at least some cases, but
it seems cleaner to just error out and make it clear this is a bug in
code constructing those paths.

Author: James Coleman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com
Discussion: https://postgr.es/m/CAJGNTeNaxpXgBVcRhJX%2B2vSbq%2BF2kJqGBcvompmpvXb7pq%2BoFA%40mail.gmail.com
2020-12-15 23:19:41 +01:00
Tom Lane
b3817f5f77 Improve hash_create()'s API for some added robustness.
Invent a new flag bit HASH_STRINGS to specify C-string hashing, which
was formerly the default; and add assertions insisting that exactly
one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set.
This is in hopes of preventing recurrences of the type of oversight
fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS).

Also, when HASH_STRINGS is specified, insist that the keysize be
more than 8 bytes.  This is a heuristic, but it should catch
accidental use of HASH_STRINGS for integer or pointer keys.
(Nearly all existing use-cases set the keysize to NAMEDATALEN or
more, so there's little reason to think this restriction should
be problematic.)

Tweak hash_create() to insist that the HASH_ELEM flag be set, and
remove the defaults it had for keysize and entrysize.  Since those
defaults were undocumented and basically useless, no callers
omitted HASH_ELEM anyway.

Also, remove memset's zeroing the HASHCTL parameter struct from
those callers that had one.  This has never been really necessary,
and while it wasn't a bad coding convention it was confusing that
some callers did it and some did not.  We might as well save a few
cycles by standardizing on "not".

Also improve the documentation for hash_create().

In passing, improve reinit.c's usage of a hash table by storing
the key as a binary Oid rather than a string; and, since that's
a temporary hash table, allocate it in CurrentMemoryContext for
neatness.

Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us
2020-12-15 11:38:53 -05:00
Jeff Davis
a58db3aa10 Revert "Cannot use WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE."
This reverts commit 3a9e64aa0d.

Commit 4bad60e3 fixed the root of the problem that 3a9e64aa worked
around.

This enables proper pipelining of commands after terminating
replication, eliminating an undocumented limitation.

Discussion: https://postgr.es/m/3d57bc29-4459-578b-79cb-7641baf53c57%40iki.fi
Backpatch-through: 9.5
2020-12-14 23:47:30 -08:00
Michael Paquier
df9274adf3 Add some checkpoint/restartpoint status to ps display
This is done for end-of-recovery and shutdown checkpoints/restartpoints
(end-of-recovery restartpoints don't exist) rather than all types of
checkpoints, in cases where it may not be possible to rely on
pg_stat_activity to get a status from the startup or checkpointer
processes.

For example, at the end of crash recovery, this is useful to know whether a
checkpoint is running in the startup process, whereas previously the ps
display might only show some information about "recovering" something,
which can be confusing while a checkpoint runs.

Author: Justin Pryzby
Reviewed-by: Nathan Bossart, Kirk Jamison, Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/20200818225238.GP17022@telsasoft.com
2020-12-14 11:53:58 +09:00
Noah Misch
a1b8aa1e4e Use HASH_BLOBS for xidhash.
This caused BufFile errors on buildfarm member sungazer, and SIGSEGV was
possible.  Conditions for reaching those symptoms were more frequent on
big-endian systems.

Discussion: https://postgr.es/m/20201129214441.GA691200@rfd.leadboat.com
2020-12-12 21:38:36 -08:00
Noah Misch
73aae4522b Correct behavior descriptions in comments, and correct a test name. 2020-12-12 20:12:25 -08:00
Tom Lane
8c15a29745 Allow ALTER TYPE to update an existing type's typsubscript value.
This is essential if we'd like to allow existing extension data types
to support subscripting in future, since dropping and recreating the
type isn't a practical thing for an extension upgrade script, and
direct manipulation of pg_type isn't a great answer either.

There was some discussion about also allowing alteration of typelem,
but it's less clear whether that's a good idea or not, so for now
I forbore.

Discussion: https://postgr.es/m/3724341.1607551174@sss.pgh.pa.us
2020-12-11 18:58:21 -05:00
Tom Lane
653aa603f5 Provide an error cursor for "can't subscript" error messages.
Commit c7aba7c14 didn't add this, but after more fooling with the
feature I feel that it'd be useful.  To make this possible, refactor
getSubscriptingRoutines() so that the caller is responsible for
throwing any error.  (In clauses.c, I just chose to make the
most conservative assumption rather than throwing an error.  We don't
expect failures there anyway really, so the code space for an error
message would be a poor investment.)
2020-12-11 18:58:21 -05:00
Tom Lane
c7aba7c14e Support subscripting of arbitrary types, not only arrays.
This patch generalizes the subscripting infrastructure so that any
data type can be subscripted, if it provides a handler function to
define what that means.  Traditional variable-length (varlena) arrays
all use array_subscript_handler(), while the existing fixed-length
types that support subscripting use raw_array_subscript_handler().
It's expected that other types that want to use subscripting notation
will define their own handlers.  (This patch provides no such new
features, though; it only lays the foundation for them.)

To do this, move the parser's semantic processing of subscripts
(including coercion to whatever data type is required) into a
method callback supplied by the handler.  On the execution side,
replace the ExecEvalSubscriptingRef* layer of functions with direct
calls to callback-supplied execution routines.  (Thus, essentially
no new run-time overhead should be caused by this patch.  Indeed,
there is room to remove some overhead by supplying specialized
execution routines.  This patch does a little bit in that line,
but more could be done.)

Additional work is required here and there to remove formerly
hard-wired assumptions about the result type, collation, etc
of a SubscriptingRef expression node; and to remove assumptions
that the subscript values must be integers.

One useful side-effect of this is that we now have a less squishy
mechanism for identifying whether a data type is a "true" array:
instead of wiring in weird rules about typlen, we can look to see
if pg_type.typsubscript == F_ARRAY_SUBSCRIPT_HANDLER.  For this
to be bulletproof, we have to forbid user-defined types from using
that handler directly; but there seems no good reason for them to
do so.

This patch also removes assumptions that the number of subscripts
is limited to MAXDIM (6), or indeed has any hard-wired limit.
That limit still applies to types handled by array_subscript_handler
or raw_array_subscript_handler, but to discourage other dependencies
on this constant, I've moved it from c.h to utils/array.h.

Dmitry Dolgov, reviewed at various times by Tom Lane, Arthur Zakirov,
Peter Eisentraut, Pavel Stehule

Discussion: https://postgr.es/m/CA+q6zcVDuGBv=M0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w@mail.gmail.com
Discussion: https://postgr.es/m/CA+q6zcVovR+XY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA@mail.gmail.com
2020-12-09 12:40:37 -05:00
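At the SQL level, the "true array" test described above can be sketched as a catalog query (assuming the new pg_type.typsubscript column):
    -- Types using array_subscript_handler are genuine varlena arrays; other
    -- handlers mean "subscriptable but not an array", and 0 means not subscriptable.
    SELECT typname
    FROM pg_type
    WHERE typsubscript = 'array_subscript_handler'::regproc
    ORDER BY typname
    LIMIT 5;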
Peter Eisentraut
8b069ef5dc Change get_constraint_index() to use pg_constraint.conindid
It was still using a scan of pg_depend instead of using the conindid
column that has been added since.

Since it is now just a catalog lookup wrapper and not related to
pg_depend, move from pg_depend.c to lsyscache.c.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/4688d55c-9a2e-9a5a-d166-5f24fe0bf8db%40enterprisedb.com
2020-12-09 15:41:45 +01:00
Andres Freund
df99ddc70b jit: Reference function pointer types via llvmjit_types.c.
It is error prone (see 5da871bfa1) and verbose to manually create function
types. Add a helper that can reference a function pointer type via
llvmjit_types.c and and convert existing instances of manual creation.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20201207212142.wz5tnbk2jsaqzogb@alap3.anarazel.de
2020-12-08 16:55:20 -08:00
Tom Lane
62ee703313 Teach contain_leaked_vars that assignment SubscriptingRefs are leaky.
array_get_element and array_get_slice qualify as leakproof, since
they will silently return NULL for bogus subscripts.  But
array_set_element and array_set_slice throw errors for such cases,
making them clearly not leakproof.  contain_leaked_vars was evidently
written with only the former case in mind, as it gave the wrong answer
for assignment SubscriptingRefs (nee ArrayRefs).

This would be a live security bug, were it not that assignment
SubscriptingRefs can only occur in INSERT and UPDATE target lists,
while we only care about leakproofness for qual expressions; so the
wrong answer can't occur in practice.  Still, that's a rather shaky
answer for a security-related question; and maybe in future somebody
will want to ask about leakproofness of a tlist.  So it seems wise to
fix and even back-patch this correction.

(We would need some change here anyway for the upcoming
generic-subscripting patch, since extensions might make different
tradeoffs about whether to throw errors.  Commit 558d77f20 attempted
to lay groundwork for that by asking check_functions_in_node whether a
SubscriptingRef contains leaky functions; but that idea fails now that
the implementation methods of a SubscriptingRef are not SQL-visible
functions that could be marked leakproof or not.)

Back-patch to 9.6.  While 9.5 has the same issue, the code's a bit
different.  It seems quite unlikely that we'd introduce any actual bug
in the short time 9.5 has left to live, so the work/risk/reward balance
isn't attractive for changing 9.5.

Discussion: https://postgr.es/m/3143742.1607368115@sss.pgh.pa.us
2020-12-08 17:50:54 -05:00
Tom Lane
a676386b58 Remove operator_precedence_warning.
This GUC was always intended as a temporary solution to help with
finding 9.4-to-9.5 migration issues.  Now that all pre-9.5 branches
are out of support, and 9.5 will be too before v14 is released,
it seems like it's okay to drop it.  Doing so allows removal of
several hundred lines of poorly-tested code in parse_expr.c,
which have been a fertile source of bugs when people did use this.

Discussion: https://postgr.es/m/2234320.1607117945@sss.pgh.pa.us
2020-12-08 16:29:52 -05:00
Dean Rasheed
4f5760d4af Improve estimation of ANDs under ORs using extended statistics.
Formerly, extended statistics only handled clauses that were
RestrictInfos. However, the restrictinfo machinery doesn't create
sub-AND RestrictInfos for AND clauses underneath OR clauses.
Therefore teach extended statistics to handle bare AND clauses,
looking for compatible RestrictInfo clauses underneath them.

Dean Rasheed, reviewed by Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com
2020-12-08 20:10:11 +00:00
Dean Rasheed
88b0898fe3 Improve estimation of OR clauses using multiple extended statistics.
When estimating an OR clause using multiple extended statistics
objects, treat the estimates for each set of clauses for each
statistics object as independent of one another. The overlap estimates
produced for each statistics object do not apply to clauses covered by
other statistics objects.

Dean Rasheed, reviewed by Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com
2020-12-08 19:39:24 +00:00
Fujii Masao
e2ac3fed3b Speed up rechecking if relation needs to be vacuumed or analyzed in autovacuum.
After autovacuum collects the relations to vacuum or analyze, it rechecks
whether each relation still needs to be vacuumed or analyzed before actually
doing that. Previously this recheck could be a significant overhead
especially when there were a very large number of relations. This was
because each recheck forced the statistics to be refreshed, and the refresh
of the statistics for a very large number of relations could cause heavy
overhead. There was a report that this issue caused autovacuum workers
to get “stuck” in a tight loop of table_recheck_autovac(), which
rechecks whether a relation needs to be vacuumed or analyzed.

This commit speeds up the recheck by making autovacuum worker reuse
the previously-read statistics for the recheck if possible. Then if that
"stale" statistics says that a relation still needs to be vacuumed or analyzed,
autovacuum refreshes the statistics and does the recheck again.

The benchmark shows that the more relations exist and autovacuum workers
are running concurrently, the more this change reduces the autovacuum
execution time. For example, when there are 20,000 tables and 10 autovacuum
workers are running, the benchmark showed that the change improved
the performance of autovacuum more than three times. On the other hand,
even when there are only 1000 tables and only a single autovacuum worker
is running, the benchmark didn't show any big performance regression by
the change.

A POC patch was first proposed by Jim Nasby. As a result of the discussion,
we used Tatsuhito Kasahara's version of the patch, which takes the approach
suggested by Tom Lane.

Reported-by: Jim Nasby
Author: Tatsuhito Kasahara
Reviewed-by: Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/3FC6C2F2-8A47-44C0-B997-28830B5716D0@amazon.com
2020-12-08 23:59:39 +09:00
Andres Freund
5da871bfa1 jit: Correct parameter type for generated expression evaluation functions.
clang only uses the 'i1' type for scalar booleans, not for pointers to
booleans (as the pointer might be pointing into a larger memory
allocation). Therefore a pointer-to-bool needs to use the "storage" boolean type.

There's no known case of wrong code generation due to this, but it seems quite
possible that it could cause problems (see e.g. 72559438f9).

Author: Andres Freund
Discussion: https://postgr.es/m/20201207212142.wz5tnbk2jsaqzogb@alap3.anarazel.de
Backpatch: 11-, where jit support was added
2020-12-07 19:34:13 -08:00
Michael Paquier
947789f1f5 Avoid using tuple from syscache for update of pg_database.datfrozenxid
pg_database.datfrozenxid gets updated using an in-place update at the
end of vacuum or autovacuum.  Since 96cdeae, as pg_database has a toast
relation, it is possible for a pg_database tuple to have toast values
if there is a large set of ACLs in place.  In such a case, the in-place
update would fail because of the flattening of the toast values done for
the catcache entry fetched.  Instead of using a copy from the catcache,
this changes the logic to fetch the copy of the tuple by directly
scanning pg_database.

Per the lack of complaints on the matter, no backpatch is done.  Note
that before 96cdeae, attempting to insert such a tuple to pg_database
would cause a "row is too big" error, so the end-of-vacuum problem was
not reachable.

Author: Ashwin Agrawal, Junfeng Yang
Discussion: https://postgr.es/m/DM5PR0501MB38800D9E4605BCA72DD35557CCE10@DM5PR0501MB3880.namprd05.prod.outlook.com
2020-12-08 12:13:19 +09:00
Tom Lane
e98c900993 Fix missed step in removal of useless RESULT RTEs in the planner.
Commit 4be058fe9 forgot that the append_rel_list would already be
populated at the time we remove useless result RTEs, and it might contain
PlaceHolderVars that need to be adjusted like the ones in the main parse
tree.  This could lead to "no relation entry for relid N" failures later
on, when the planner tries to do something with an unadjusted PHV.

Per report from Tom Ellis.  Back-patch to v12 where the bug came in.

Discussion: https://postgr.es/m/20201205173056.GF30712@cloudinit-builder
2020-12-05 16:16:13 -05:00
Peter Eisentraut
eb93f3a0b6 Convert elog(LOG) calls to ereport() where appropriate
User-visible log messages should go through ereport(), so they are
subject to translation.  Many remaining elog(LOG) calls are really
debugging calls.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com
2020-12-04 14:25:23 +01:00
Peter Eisentraut
a6964bc1bb Remove unnecessary grammar symbols
Instead of publication_name_list, we can use name_list.  We already
refer to publications everywhere else by the 'name' or 'name_list'
symbols, so this only improves consistency.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/3e3ccddb-41bd-ecd8-29fe-195e34d9886f%40enterprisedb.com
2020-12-04 11:16:26 +01:00
Amit Kapila
8ae4ef4fb0 Remove incorrect assertion in reorderbuffer.c.
We start recording changes in ReorderBufferTXN even before we reach
SNAPBUILD_CONSISTENT state so that if the commit is encountered after
reaching that we should be able to send the changes of the entire transaction.
Now, while recording changes if the reorder buffer memory has exceeded
logical_decoding_work_mem then we can start streaming if it is allowed and
we haven't yet streamed that data. However, we must not allow streaming to
start unless the snapshot has reached SNAPBUILD_CONSISTENT state.

In passing, improve the comments atop ReorderBufferResetTXN to mention the
case when we need to continue streaming after getting an error.

Author: Amit Kapila
Reviewed-by: Dilip Kumar
Discussion: https://postgr.es/m/CAA4eK1KoOH0byboyYY40NBcC7Fe812trwTa+WY3jQF7WQWZbQg@mail.gmail.com
2020-12-04 13:54:50 +05:30
Michael Paquier
bd94a9c04e Rename cryptohashes.c to cryptohashfuncs.c
87ae969 has created two new files called cryptohash{_openssl}.c in
src/common/, whose names overlap with the existing backend file called
cryptohashes.c dedicated to the SQL wrappers for SHA2 and MD5.  This
file is renamed to cryptohashfuncs.c to be more consistent with the
surroundings and reduce the confusion with the new cryptohash interface
of src/common/.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/X8hHhaQgbMbW+aGU@paquier.xyz
2020-12-04 12:58:44 +09:00
Michael Paquier
4f48a6fbe2 Change SHA2 implementation based on OpenSSL to use EVP digest routines
The use of low-level hash routines is not recommended by upstream
OpenSSL since 2000, and pgcrypto already switched to EVP as of 5ff4a67.
This takes advantage of the refactoring done in 87ae969 that has
introduced the allocation and free routines for cryptographic hashes.

Since 1.1.0, OpenSSL does not publish the contents of the cryptohash
contexts, forcing any consumers to rely on OpenSSL for all allocations.
Hence, the resource owner callback mechanism gains a new set of routines
to track and free cryptohash contexts when using OpenSSL, preventing any
risks of leaks in the backend.  Nothing is needed in the frontend thanks
to the refactoring of 87ae969, and the resowner knowledge is isolated
into cryptohash_openssl.c.

Note that this also fixes a failure with SCRAM authentication when using
FIPS in OpenSSL, but as there have been few complaints about this
problem and as this causes an ABI breakage, no backpatch is done.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson, Heikki Linnakangas
Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
Discussion: https://postgr.es/m/20180911030250.GA27115@paquier.xyz
2020-12-04 10:49:23 +09:00
Peter Eisentraut
6114040711 Small code simplifications
strVal() can be used in a couple of places instead of coding the same
thing by hand.
2020-12-03 11:44:13 +01:00
Dean Rasheed
25a9e54d2d Improve estimation of OR clauses using extended statistics.
Formerly we only applied extended statistics to an OR clause as part
of the clauselist_selectivity() code path for an OR clause appearing
in an implicitly-ANDed list of clauses. This meant that it could only
use extended statistics if all sub-clauses of the OR clause were
covered by a single extended statistics object.

Instead, teach clause_selectivity() how to apply extended statistics
to an OR clause by handling its ORed list of sub-clauses in a similar
manner to an implicitly-ANDed list of sub-clauses, but with different
combination rules. This allows one or more extended statistics objects
to be used to estimate all or part of the list of sub-clauses. Any
remaining sub-clauses are then treated as if they are independent.

Additionally, to avoid double-application of extended statistics, this
introduces "extended" versions of clause_selectivity() and
clauselist_selectivity(), which include an option to ignore extended
statistics. This replaces the old clauselist_selectivity_simple()
function which failed to completely ignore extended statistics when
called from the extended statistics code.

A known limitation of the current infrastructure is that an AND clause
under an OR clause is not treated as compatible with extended
statistics (because we don't build RestrictInfos for such sub-AND
clauses). Thus, for example, "(a=1 AND b=1) OR (a=2 AND b=2)" will
currently be treated as two independent AND clauses (each of which may
be estimated using extended statistics), but extended statistics will
not currently be used to account for any possible overlap between
those clauses. Improving that is left as a task for the future.

Original patch by Tomas Vondra, with additional improvements by me.

Discussion: https://postgr.es/m/20200113230008.g67iyk4cs3xbnjju@development
2020-12-03 10:03:49 +00:00
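A sketch of the kind of clause this helps with (table, data, and statistics object names are made up):
    CREATE TABLE t (a int, b int);
    INSERT INTO t SELECT i % 100, i % 100 FROM generate_series(1, 10000) AS i;
    CREATE STATISTICS t_ab (dependencies, mcv) ON a, b FROM t;
    ANALYZE t;
    -- The OR clause below can now be estimated using the extended statistics
    -- object covering (a, b) instead of treating the sub-clauses as independent.
    EXPLAIN SELECT * FROM t WHERE a = 1 OR b = 2;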
Michael Paquier
b5913f6120 Refactor CLUSTER and REINDEX grammar to use DefElem for option lists
This changes CLUSTER and REINDEX so that a parenthesized grammar becomes
possible for options, while unifying the grammar parsing rules for
option lists with the existing ones.

This is a follow-up of the work done in 873ea9e for VACUUM, ANALYZE and
EXPLAIN.  This benefits REINDEX for a potential backend-side filtering
for collatable-sensitive indexes and TABLESPACE, while CLUSTER would
benefit from the latter.

Author: Alexey Kondratov, Justin Pryzby
Discussion: https://postgr.es/m/8a8f5f73-00d3-55f8-7583-1375ca8f6a91@postgrespro.ru
2020-12-03 10:13:21 +09:00
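With the unified rules, option lists for these commands can be written in the parenthesized style already used by VACUUM and EXPLAIN; a sketch, showing only the VERBOSE option (index and table names are made up):
    REINDEX (VERBOSE) TABLE my_table;
    CLUSTER (VERBOSE) my_table USING my_table_pkey;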
Stephen Frost
dc11f31a1a Add GSS information to connection authorized log message
GSS information (if used) such as if the connection was authorized using
GSS or if it was encrypted using GSS, and perhaps most importantly, what
the GSS principal used for the authentication was, is extremely useful
but wasn't being included in the connection authorized log message.

Therefore, add to the connection authorized log message that
information, in a similar manner to how we log SSL information when SSL
is used for a connection.

Author: Vignesh C
Reviewed-by: Bharath Rupireddy
Discussion: https://www.postgresql.org/message-id/CALDaNm2N1385_Ltoo%3DS7VGT-ESu_bRQa-sC1wg6ikrM2L2Z49w%40mail.gmail.com
2020-12-02 14:41:53 -05:00
Fujii Masao
01469241b2 Track total number of WAL records, FPIs and bytes generated in the cluster.
Commit 6b466bf5f2 allowed pg_stat_statements to track the number of
WAL records, full page images and bytes that each statement generated.
Similarly this commit allows us to track the cluster-wide WAL statistics
counters.

New columns wal_records, wal_fpi and wal_bytes are added to the
pg_stat_wal view, and report the total number of WAL records,
full page images and bytes generated in the cluster, respectively.

Author: Masahiro Ikeda
Reviewed-by: Amit Kapila, Movead Li, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/35ef960128b90bfae3b3fdf60a3a860f@oss.nttdata.com
2020-12-02 13:00:15 +09:00
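The new counters can be read with a simple query against the updated view, for example:
    SELECT wal_records, wal_fpi, wal_bytes
    FROM pg_stat_wal;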
Fujii Masao
942305a363 Allow restore_command parameter to be changed with reload.
This commit changes restore_command from PGC_POSTMASTER to PGC_SIGHUP.

As the side effect of this commit, restore_command can be reset to
empty during archive recovery. In this setting, archive recovery
tries to replay only WAL files available in pg_wal directory. This is
the same behavior as when the command that always fails is specified
in restore_command.

Note that restore_command still must be specified (not empty) when
starting archive recovery, even after applying this commit. This is
necessary as the safeguard to prevent users from forgetting to
specify restore_command and starting archive recovery.

Thanks to Peter Eisentraut, Michael Paquier, Andres Freund,
Robert Haas and Anastasia Lubennikova for discussion.

Author: Sergei Kornilov
Reviewed-by: Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/2317771549527294@sas2-985f744271ca.qloud-c.yandex.net
2020-12-02 11:00:15 +09:00
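With this change, restore_command can be adjusted on a running server along these lines (a sketch; the archive command itself is site-specific):
    ALTER SYSTEM SET restore_command = 'cp /mnt/archive/%f "%p"';
    SELECT pg_reload_conf();
    -- Resetting it to '' during archive recovery makes the server replay only
    -- the WAL files already present in pg_wal, as described above.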
Michael Paquier
87ae9691d2 Move SHA2 routines to a new generic API layer for crypto hashes
Two new routines to allocate a hash context and to free it are created,
as these become necessary for the goal behind this refactoring: switching
all the cryptohash implementations for OpenSSL to use EVP (for FIPS, and
also because upstream has not recommended the use of low-level cryptohash
functions for 20 years).  Note that OpenSSL has hidden the internals of
cryptohash contexts since 1.1.0, so it is necessary to leave the
allocation to OpenSSL itself, explaining the need for those two new
routines.  This part is going to require more work to properly track
hash contexts with resource owners, but that is not introduced here.
Still, this refactoring makes the move possible.

This reduces the number of routines for all SHA2 implementations from
twelve (SHA{224,256,386,512} with init, update and final calls) to five
(create, free, init, update and final calls) by incorporating the hash
type directly into the hash context data.

The new cryptohash routines are moved to a new file, called cryptohash.c
for the fallback implementations, with SHA2 specifics becoming a part
internal to src/common/.  OpenSSL specifics are part of
cryptohash_openssl.c.  This infrastructure is usable for more hash
types, like MD5 or HMAC.

Any code paths using the internal SHA2 routines are adapted to report
correctly errors, which are most of the changes of this commit.  The
zones mostly impacted are checksum manifests, libpq and SCRAM.

Note that e21cbb4 was a first attempt to switch SHA2 to EVP, but it
lacked the refactoring needed for libpq, as done here.

This patch has been tested on Linux and Windows, with and without
OpenSSL, and down to 1.0.1, the oldest version supported on HEAD.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
2020-12-02 10:37:20 +09:00
Tom Lane
f7f83a55bf Ensure that expandTableLikeClause() re-examines the same table.
As it stood, expandTableLikeClause() re-did the same relation_openrv
call that transformTableLikeClause() had done.  However there are
scenarios where this would not find the same table as expected.
We hold lock on the LIKE source table, so it can't be renamed or
dropped, but another table could appear before it in the search path.
This explains the odd behavior reported in bug #16758 when cloning a
table as a temp table of the same name.  This case worked as expected
before commit 502898192 introduced the need to open the source table
twice, so we should fix it.

To make really sure we get the same table, let's re-open it by OID not
name.  That requires adding an OID field to struct TableLikeClause,
which is a little nervous-making from an ABI standpoint, but as long
as it's at the end I don't think there's any serious risk.

Per bug #16758 from Marc Boeren.  Like the previous patch,
back-patch to all supported branches.

Discussion: https://postgr.es/m/16758-840e84a6cfab276d@postgresql.org
2020-12-01 14:02:27 -05:00
Alvaro Herrera
677f74e5bb
Avoid memcpy() with a NULL source pointer and count == 0
When memcpy() is called on a pointer, the compiler is entitled to assume
that the pointer is not null, which can lead to optimizing nearby code
in potentially undesirable ways.  We still want such optimizations
(gcc's -fdelete-null-pointer-checks) in cases where they're valid.

Related: commit 13bba02271.

Backpatch to pg11, where this particular instance appeared.

Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Reported-by: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/CAEudQApUndmQkr5fLrCKXQ7+ib44i7S+Kk93pyVThS85PnG3bQ@mail.gmail.com
Discussion: https://postgr.es/m/CALNJ-vSdhwSM5f4tnNn1cdLHvXMVe_S+V3nR5GwNrmCPNB2VtQ@mail.gmail.com
2020-12-01 11:46:56 -03:00
Thomas Munro
57faaf376e Use truncate(2) where appropriate.
When truncating files by name, use truncate(2).  Windows hasn't got it,
so keep our previous coding based on ftruncate(2) as a fallback.

Discussion: https://postgr.es/m/16663-fe97ccf9932fc800%40postgresql.org
2020-12-01 15:42:22 +13:00
Thomas Munro
9f35f94373 Free disk space for dropped relations on commit.
When committing a transaction that dropped a relation, we previously
truncated only the first segment file to free up disk space (the one
that won't be unlinked until the next checkpoint).

Truncate higher numbered segments too, even though we unlink them on
commit.  This frees the disk space immediately, even if other backends
have open file descriptors and might take a long time to get around to
handling shared invalidation events and closing them.  Also extend the
same behavior to the first segment, in recovery.

Back-patch to all supported releases.

Bug: #16663
Reported-by: Denis Patron <denis.patron@previnet.it>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Reviewed-by: David Zhang <david.zhang@highgo.ca>
Discussion: https://postgr.es/m/16663-fe97ccf9932fc800%40postgresql.org
2020-12-01 13:21:03 +13:00
Tom Lane
8286223f3d Fix missing outfuncs.c support for IncrementalSortPath.
For debugging purposes, Path nodes are supposed to have outfuncs
support, but this was overlooked in the original incremental sort patch.

While at it, clean up a couple other minor oversights, as well as
bizarre choice of return type for create_incremental_sort_path().
(All the existing callers just cast it to "Path *" immediately, so
they don't care, but some future caller might care.)

outfuncs.c fix by Zhijie Hou, the rest by me

Discussion: https://postgr.es/m/324c4d81d8134117972a5b1f6cdf9560@G08CNEXMBPEKD05.g08.fujitsu.local
2020-11-30 16:33:09 -05:00
Tom Lane
275b3411d9 Prevent parallel index build in a standalone backend.
This can't work if there's no postmaster, and indeed the code got an
assertion failure trying.  There should be a check on IsUnderPostmaster
gating the use of parallelism, as the planner has for ordinary
parallel queries.

Commit 40d964ec9 got this right, so follow its model of checking
IsUnderPostmaster at the same place where we check for
max_parallel_maintenance_workers == 0.  In general, new code
implementing parallel utility operations should do the same.

Report and patch by Yulin Pei, cosmetically adjusted by me.
Back-patch to v11 where this code came in.

Discussion: https://postgr.es/m/HK0PR01MB22747D839F77142D7E76A45DF4F50@HK0PR01MB2274.apcprd01.prod.exchangelabs.com
2020-11-30 14:38:00 -05:00
Tom Lane
b1738ff6ab Fix miscomputation of direct_lateral_relids for join relations.
If a PlaceHolderVar is to be evaluated at a join relation, but
its value is only needed there and not at higher levels, we neglected
to update the joinrel's direct_lateral_relids to include the PHV's
source rel.  This causes problems because join_is_legal() then won't
allow joining the joinrel to the PHV's source rel at all, leading
to "failed to build any N-way joins" planner failures.

Per report from Andreas Seltenreich.  Back-patch to 9.5
where the problem originated.

Discussion: https://postgr.es/m/87blfgqa4t.fsf@aurora.ydns.eu
2020-11-30 12:22:43 -05:00
Michael Paquier
873ea9ee69 Refactor parsing rules for option lists of EXPLAIN, VACUUM and ANALYZE
Those three commands have been using the same grammar rules to handle
a list of parenthesized options.  This refactors the code so that they use
the same parsing rules, shaving some code.  A future commit will make
use of those option parsing rules for more utility commands, like
REINDEX and CLUSTER.

Author: Alexey Kondratov, Justin Pryzby
Discussion: https://postgr.es/m/8a8f5f73-00d3-55f8-7583-1375ca8f6a91@postgrespro.ru
2020-11-30 20:27:37 +09:00
Heikki Linnakangas
2bc588798b Remove leftover comments, left behind by removal of WITH OIDS.
Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGaRoF3XrhPW-Y7P%2BG7bKo84Z_h%3DkQHvMh-80%3Dav3wmOw%40mail.gmail.com
2020-11-30 10:26:43 +02:00
Fujii Masao
98e2d58d66 Improve log message about termination of background workers.
Previously the shutdown of a background worker that uses die() as
SIGTERM signal handler produced the log message "terminating
connection due to administrator command". This log message was
confusing because a background worker is not a connection.
This commit improves that log message to "terminating background
worker XXX due to administrator command" (XXX is replaced with
the name of the background worker). This is the same log message
that bgworker_die(), the other SIGTERM signal handler for background
workers, reports.

Author: Bharath Rupireddy
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/3f292fbb-f155-9a01-7cb2-7ccc9007ab3f@oss.nttdata.com
2020-11-30 11:05:19 +09:00
Tom Lane
9c83b54a9c Fix a recently-introduced race condition in LISTEN/NOTIFY handling.
Commit 566372b3d fixed some race conditions involving concurrent
SimpleLruTruncate calls, but it introduced new ones in async.c.
A newly-listening backend could attempt to read Notify SLRU pages that
were in process of being truncated, possibly causing an error.  Also,
the QUEUE_TAIL pointer could become set to a value that's not equal to
the queue position of any backend.  While that's fairly harmless in
v13 and up (thanks to commit 51004c717), in older branches it resulted
in near-permanent disabling of the queue truncation logic, so that
continued use of NOTIFY led to queue-fill warnings and eventual
inability to send any more notifies.  (A server restart is enough to
make that go away, but it's still pretty unpleasant.)

The core of the problem is confusion about whether QUEUE_TAIL
represents the "logical" tail of the queue (i.e., the oldest
still-interesting data) or the "physical" tail (the oldest data we've
not yet truncated away).  To fix, split that into two variables.
QUEUE_TAIL regains its definition as the logical tail, and we
introduce a new variable to track the oldest un-truncated page.

Per report from Mikael Gustavsson.  Like the previous patch,
back-patch to all supported branches.

Discussion: https://postgr.es/m/1b8561412e8a4f038d7a491c8b922788@smhi.se
2020-11-28 14:03:40 -05:00
Fujii Masao
3df51ca8b3 Fix CLUSTER progress reporting of number of blocks scanned.
Previously pg_stat_progress_cluster view reported the current block
number in heap scan as the number of heap blocks scanned (i.e.,
heap_blks_scanned). This reported number could be incorrect when
synchronize_seqscans is enabled, because it allowed the heap scan to
start at a block in the middle of the table. This could result in
wraparounds in the heap_blks_scanned column when the heap scan wrapped around.
This commit fixes the bug by calculating the number of blocks from
the block that the heap scan starts at to the current block in scan,
and reporting that number in the heap_blks_scanned column.
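
A hedged sketch of the wraparound-aware arithmetic described above
(variable names are hypothetical, not the actual code):

    BlockNumber blocks_done;

    if (current_block >= start_block)
        blocks_done = current_block - start_block;
    else    /* the synchronized scan wrapped past the end of the table */
        blocks_done = (total_blocks - start_block) + current_block;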

Also, in pg_stat_progress_cluster view, previously heap_blks_scanned
could not reach heap_blks_total at the end of heap scan phase
if the last pages scanned were empty. This commit fixes the bug by
manually updating heap_blks_scanned to the same value as
heap_blks_total when the heap scan phase finishes.

Back-patch to v12 where pg_stat_progress_cluster view was introduced.

Reported-by: Matthias van de Meent
Author: Matthias van de Meent
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CAEze2WjCBWSGkVfYag001Rc4+-nNLDpWM7QbyD6yPvuhKs-gYQ@mail.gmail.com
2020-11-27 20:16:44 +09:00
Amit Kapila
0926e96c49 Fix replication of in-progress transactions in tablesync worker.
The tablesync worker runs under a single transaction, but in streaming mode we
were committing the transaction on stream_stop, stream_abort, and
stream_commit. We need to avoid committing the transaction in a streaming
mode in tablesync worker.

In passing move the call to process_syncing_tables in
apply_handle_stream_commit after clean up of stream files. This will
allow clean up of files to happen before the exit of tablesync worker
which would otherwise be handled by one of the proc exit routines.

Author: Dilip Kumar
Reviewed-by: Amit Kapila and Peter Smith
Tested-by: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pt4PyKQCwqzQ=EFF=bpKKJD7XKt_S23F6L20ayQNxg77A@mail.gmail.com
2020-11-27 07:43:34 +05:30
Alvaro Herrera
dcfff74fb1
Restore lock level to update statusFlags
Reverts 27838981be (some comments are kept).  Per discussion, it does
not seem safe to relax the lock level used for this; in order for it to
be safe, there would have to be memory barriers between the point we set
the flag and the point we set the transaction Xid, which perhaps would
not be so bad; but there would also have to be barriers at the readers'
side, which from a performance perspective might be bad.

Now maybe this analysis is wrong and it *is* safe for some reason, but
proof of that is not trivial.

Discussion: https://postgr.es/m/20201118190928.vnztes7c2sldu43a@alap3.anarazel.de
2020-11-26 12:30:48 -03:00
Amit Kapila
f3a8f73ec2 Use Enums for logical replication message types at more places.
Commit 644f0d7cc9 added logical replication message type enums to use
instead of character literals but some char substitutions were overlooked.

Author: Peter Smith
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAHut+PsTG=Vrv8hgrvOnAvCNR21jhqMdPk2n0a1uJPoW0p+UfQ@mail.gmail.com
2020-11-26 09:21:14 +05:30
Alvaro Herrera
c98763bf51
Avoid spurious waits in concurrent indexing
In the various waiting phases of CREATE INDEX CONCURRENTLY (CIC) and
REINDEX CONCURRENTLY (RC), we wait for other processes to release their
snapshots; this is necessary in general for correctness.  However,
processes doing CIC in other tables cannot possibly affect CIC or RC
done in "this" table, so we don't need to wait for those.  This commit
adds a flag in MyProc->statusFlags to indicate that the current process
is doing CIC, so that other processes doing CIC or RC can ignore it when
waiting.

Note that this logic is only valid if the index does not access other
tables.  For simplicity we avoid setting the flag if the index has a
column that's an expression, or has a WHERE predicate.  (It is possible
to have expressional or partial indexes that do not access other tables,
but figuring that out would require more work.)

This flag can potentially also be used by processes doing REINDEX
CONCURRENTLY to be skipped; and by VACUUM to ignore processes in CIC or
RC for the purposes of computing an Xmin.  That's left for future
commits.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Dimitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20200810233815.GA18970@alvherre.pgsql
2020-11-25 18:22:57 -03:00
Tom Lane
2432b1a040 Avoid spamming the client with multiple ParameterStatus messages.
Up to now, we sent a ParameterStatus message to the client immediately
upon any change in the active value of any GUC_REPORT variable.  This
was only barely okay when the feature was designed; now that we have
things like function SET clauses, there are very plausible use-cases
where a GUC_REPORT variable might change many times within a query
--- and even end up back at its original value, perhaps.  Fortunately
most of our GUC_REPORT variables are unlikely to be changed often;
but there are proposals in play to enlarge that set, or even make it
user-configurable.

Hence, let's fix things to not generate more than one ParameterStatus
message per variable per query, and to not send any message at all
unless the end-of-query value is different from what we last reported.
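
A conceptual sketch of that rule in plain C (not the actual GUC code;
send_parameter_status() is a hypothetical stand-in, and <string.h> plus
<stdlib.h> are assumed):

    typedef struct ReportedGUC
    {
        const char *name;
        char       *last_reported;      /* value the client last saw */
        char       *current;            /* value now, at end of query */
    } ReportedGUC;

    static void
    report_changed_gucs(ReportedGUC *vars, int nvars)
    {
        for (int i = 0; i < nvars; i++)
        {
            if (strcmp(vars[i].last_reported, vars[i].current) != 0)
            {
                /* at most one message per variable, only if the value moved */
                send_parameter_status(vars[i].name, vars[i].current);
                free(vars[i].last_reported);
                vars[i].last_reported = strdup(vars[i].current);
            }
        }
    }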

Discussion: https://postgr.es/m/5708.1601145259@sss.pgh.pa.us
2020-11-25 11:40:44 -05:00
Peter Eisentraut
d5d91acdcc Make error hint from bind() failure more accurate
The hint "Is another postmaster already running ..." should only be
printed for errors that are really about something else already using
the address.  In other cases it is misleading.  So only show that hint
if errno == EADDRINUSE.

Also, since Unix-domain sockets in the file-system namespace never
report EADDRINUSE for an existing file (they would just overwrite it),
the part of the hint saying "If not, remove socket file \"%s\" and
retry." can never happen, so remove it.  Unix-domain sockets in the
abstract namespace can report EADDRINUSE, but in that case there is no
file to remove, so the hint doesn't work there either.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/6dee8574-b0ad-fc49-9c8c-2edc796f0033@2ndquadrant.com
2020-11-25 08:33:57 +01:00
Peter Eisentraut
c9f0624bc2 Add support for abstract Unix-domain sockets
This is a variant of the normal Unix-domain sockets that don't use the
file system but a separate "abstract" namespace.  At the user
interface, such sockets are represented by names starting with "@".
Supported on Linux and Windows right now.
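
For illustration, a standalone sketch of how an abstract-namespace socket
is bound at the OS level on Linux (the leading NUL byte in sun_path selects
the abstract namespace); this is not PostgreSQL's actual code:

    #include <stddef.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    static int
    bind_abstract(int fd, const char *name)    /* name without the leading "@" */
    {
        struct sockaddr_un addr;

        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        /* addr.sun_path[0] stays '\0': that selects the abstract namespace */
        strncpy(addr.sun_path + 1, name, sizeof(addr.sun_path) - 2);

        return bind(fd, (struct sockaddr *) &addr,
                    offsetof(struct sockaddr_un, sun_path) + 1 + strlen(name));
    }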

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/6dee8574-b0ad-fc49-9c8c-2edc796f0033@2ndquadrant.com
2020-11-25 08:33:57 +01:00
Thomas Munro
a7e65dc88b Fix WaitLatch(NULL) on Windows.
Further to commit 733fa9aa, on Windows when a latch is triggered but we
aren't currently waiting for it, we need to locate the latch's HANDLE
rather than calling ResetEvent(NULL).

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Discussion: https://postgr.es/m/CAEudQArTPi1YBc%2Bn1fo0Asy3QBFhVjp_QgyKG-8yksVn%2ByRTiw%40mail.gmail.com
2020-11-25 17:55:49 +13:00
Amit Kapila
805b816305 Remove obsolete comment atop ri_PlanCheck.
Commit 5b7ba75f7f removed the unused parameter but forgot to update the
nearby comments.

Author: Li Japin
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/0E2F62A2-B2F1-4052-83AE-F0BEC8A75789@hotmail.com
2020-11-25 09:14:45 +05:30
Michael Paquier
7b94e99960 Remove catalog function currtid()
currtid() and currtid2() are an undocumented set of functions whose sole
known user is the Postgres ODBC driver; they retrieve the latest TID
version for a tuple given by the caller.

As used by Postgres ODBC, currtid() is a shortcut able to retrieve the
last TID loaded into a backend by passing an OID of 0 (magic value)
after a tuple insertion.  This is removed in this commit, as it became
obsolete after the driver began using "RETURNING ctid" with inserts, a
clause supported since Postgres 8.2 (using RETURNING is better for
performance anyway as it reduces the number of round-trips to the
backend).

currtid2() is still used by the driver, so this remains around for now.
Note that this function is kept in its original shape for backward
compatibility reasons.

Per discussion with many people, including Andres Freund, Peter
Eisentraut, Álvaro Herrera, Hiroshi Inoue, Tom Lane and myself.

Bump catalog version.

Discussion: https://postgr.es/m/20200603021448.GB89559@paquier.xyz
2020-11-25 12:18:26 +09:00
Andrew Gierth
660b89928d Properly check index mark/restore in ExecSupportsMarkRestore.
Previously this code assumed that all IndexScan nodes supported
mark/restore, which is not true since it depends on optional index AM
support functions. This could lead to errors about missing support
functions in rare edge cases of mergejoins with no sort keys, where an
unordered non-btree index scan was placed on the inner path without a
protecting Materialize node. (Normally, the fact that merge join
requires ordered input would avoid this error.)

Backpatch all the way since this bug is ancient.

Per report from Eugen Konkov on irc.

Discussion: https://postgr.es/m/87o8jn50be.fsf@news-spur.riddles.org.uk
2020-11-24 21:58:32 +00:00
Tom Lane
ec05bafdbb Put "inline" marker on declarations of inline functions.
I'm having a hard time telling whether the letter of the C standard
requires this, but we do have a couple of buildfarm members that
throw warnings when this is not done.  Oversight in c532d15dd.
2020-11-24 15:43:01 -05:00
Heikki Linnakangas
0a2bc5d61e Move per-agg and per-trans duplicate finding to the planner.
This has the advantage that the cost estimates for aggregates can count
the number of calls to transition and final functions correctly.

Bump catalog version, because views can contain Aggrefs.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/b2e3536b-1dbc-8303-c97e-89cb0b4a9a48%40iki.fi
2020-11-24 10:45:00 +02:00
Michael Paquier
d03d7549b2 Use macros instead of hardcoded offsets for LWLock initialization
This makes the code slightly easier to follow, as the initialization
relied on a hardcoded offset that duplicated an equivalent set of macros
already defined and used in other places.

Author: Japin Li
Discussion: https://postgr.es/m/MEYP282MB1669FB410006758402F2C3A2B6E00@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2020-11-24 12:39:58 +09:00
Tom Lane
789b938bf2 Centralize logic for skipping useless ereport/elog calls.
While ereport() and elog() themselves are quite cheap when the
error message level is too low to be printed, some places need to do
substantial work before they can call those macros at all.  To allow
optimizing away such setup work when nothing is to be printed, make
elog.c export a new function message_level_is_interesting(elevel)
that reports whether ereport/elog will do anything.  Make use of that
in various places that had ad-hoc direct tests of log_min_messages etc.
Also teach ProcSleep to use it to avoid some work.  (There may well
be other places that could usefully use this; I didn't search hard.)
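
A sketch of the intended caller pattern (the helper that builds the
expensive string is hypothetical):

    if (message_level_is_interesting(DEBUG2))
    {
        char   *details = build_expensive_debug_string();   /* hypothetical setup work */

        elog(DEBUG2, "current state: %s", details);
        pfree(details);
    }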

Within elog.c, refactor a little bit to avoid having duplicate copies
of the policy-setting logic.  When that code was written, we weren't
relying on the availability of inline functions; so it had some
duplications in the name of efficiency, which I got rid of.

Alvaro Herrera and Tom Lane

Discussion: https://postgr.es/m/129515.1606166429@sss.pgh.pa.us
2020-11-23 19:10:46 -05:00
David Rowley
913ec71d68 Improve compiler code layout in elog/ereport ERROR calls
Here we use a bit of preprocessor trickery to coax supporting compilers
into laying out their generated code so that the code that's in the same
branch as elog(ERROR)/ereport(ERROR) calls is moved away from the hot
path.  Effectively, this reduces the size of the hot code meaning that it
can sit on fewer cache lines.

Performance improvements of between 10-15% have been seen on highly CPU
bound workloads using pgbench's TPC-b benchmark.

What's achieved here is very similar to putting the error condition inside
an unlikely() macro. For example:

if (unlikely(x < 0))
    elog(ERROR, "invalid x value");

now there's no need to make use of unlikely() here as the common macro
used by elog and ereport will now see that elevel is >= ERROR and make use
of a pg_attribute_cold marked version of errstart().

When elevel < ERROR or if it cannot be determined to be constant, the
original behavior is maintained.
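
A simplified, standalone illustration of the trick (these are not the real
ereport/elog macros; names and the error-level constant are made up):

    extern void report_error(int elevel, const char *msg);
    extern void report_error_cold(int elevel, const char *msg)
                __attribute__((cold));  /* compiler lays the calling branch out of the hot path */

    #define MY_ERROR 20

    #define my_report(elevel, msg) \
        do { \
            if (__builtin_constant_p(elevel) && (elevel) >= MY_ERROR) \
                report_error_cold((elevel), (msg)); \
            else \
                report_error((elevel), (msg)); \
        } while (0)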

Author: David Rowley
Reviewed-by: Andres Freund, Peter Eisentraut
Discussion: https://postgr.es/m/CAApHDvrVpasrEzLL2er7p9iwZFZ%3DJj6WisePcFeunwfrV0js_A%40mail.gmail.com
2020-11-24 12:04:42 +13:00
Alvaro Herrera
450c8230b1
Don't hold ProcArrayLock longer than needed in rare cases
While cancelling an autovacuum worker, we hold ProcArrayLock while
formatting a debugging log string.  We can make this shorter by saving
the data we need to produce the message and doing the formatting outside
the locked region.
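
The general shape of the change, as a hedged sketch (field names and the
message are illustrative, not the actual code):

    int     saved_pid;

    LWLockAcquire(ProcArrayLock, LW_SHARED);
    saved_pid = blocking_proc->pid;     /* copy only what the message needs */
    LWLockRelease(ProcArrayLock);

    /* the comparatively slow string formatting now happens unlocked */
    elog(LOG, "sending cancel to blocking autovacuum PID %d", saved_pid);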

This isn't terribly critical, as it only occurs pretty rarely: when a
backend runs deadlock detection and it happens to be blocked by a
autovacuum running autovacuum.  Still, there's no need to cause a hiccup
in ProcArrayLock processing, which can be very high-traffic in some
cases.

While at it, rework code so that we only print the string when it is
really going to be used, as suggested by Michael Paquier.

Discussion: https://postgr.es/m/20201118214127.GA3179@alvherre.pgsql
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2020-11-23 18:55:23 -03:00
Tom Lane
0cc9932788 Rename the "point is strictly above/below point" comparison operators.
Historically these were called >^ and <^, but that is inconsistent
with the similar box, polygon, and circle operators, which are named
|>> and <<| respectively.  Worse, the >^ and <^ names are used for
*not* strict above/below tests for the box type.

Hence, invent new operators following the more common naming.  The
old operators remain available for now, and are still accepted by
the relevant index opclasses too.  But there's a deprecation notice,
so maybe we can get rid of them someday.

Emre Hasegeli, reviewed by Pavel Borisov

Discussion: https://postgr.es/m/24348.1587444160@sss.pgh.pa.us
2020-11-23 11:38:37 -05:00
Tom Lane
d36228a9fc Improve wording of two error messages related to generated columns.
Clarify that you can "insert" into a generated column as long as what
you're inserting is a DEFAULT placeholder.

Also, use ERRCODE_GENERATED_ALWAYS in place of ERRCODE_SYNTAX_ERROR;
there doesn't seem to be any reason to use the less specific errcode.

Discussion: https://postgr.es/m/9q0sgcr416t.fsf@gmx.us
2020-11-23 11:15:12 -05:00
Alvaro Herrera
fe05129155
Make some sanity-check elogs more verbose
A few sanity checks in funcapi.c were not mentioning all the possible
clauses for failure, confusing developers who fat-fingered catalog data
additions.  Make the errors more detailed to avoid wasting time in
pinpointing mistakes.

Per complaint from Craig Ringer.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMsr+YH7Kd87A3cU5m_wKo46HPQ46zFv5wesFNL0YWxkGhGv3g@mail.gmail.com
2020-11-23 13:10:03 -03:00
Heikki Linnakangas
68b1a4877e Fix a few comments that referred to copy.c.
Missed these in the previous commit.
2020-11-23 11:36:13 +02:00
Heikki Linnakangas
c532d15ddd Split copy.c into four files.
Copy.c has grown really large. Split it into more manageable parts:

- copy.c now contains only a few functions that are common to COPY FROM
  and COPY TO.

- copyto.c contains code for COPY TO.

- copyfrom.c contains code for initializing COPY FROM, and inserting the
  tuples to the correct table.

- copyfromparse.c contains code for reading from the client/file/program,
  and parsing the input text/CSV/binary format into tuples.

All of these parts are fairly complicated, and fairly independent of each
other. There is a patch being discussed to implement parallel COPY FROM,
which will add a lot of new code to the COPY FROM path, and another patch
which would allow INSERTs to use the same multi-insert machinery as COPY
FROM, both of which will require refactoring that code. With those two
patches, there's going to be a lot of code churn in copy.c anyway, so now
seems like a good time to do this refactoring.

The CopyStateData struct is also split. All the formatting options, like
FORMAT, QUOTE, ESCAPE, are put in a new CopyFormatOption struct, which
is used by both COPY FROM and TO. Other state data are kept in separate
CopyFromStateData and CopyToStateData structs.

Reviewed-by: Soumyadeep Chakraborty, Erik Rijkers, Vignesh C, Andres Freund
Discussion: https://www.postgresql.org/message-id/8e15b560-f387-7acc-ac90-763986617bfb%40iki.fi
2020-11-23 10:50:50 +02:00
Tom Lane
17958972fe Allow a multi-row INSERT to specify DEFAULTs for a generated column.
One can say "INSERT INTO tab(generated_col) VALUES (DEFAULT)" and not
draw an error.  But the equivalent case for a multi-row VALUES list
always threw an error, even if one properly said DEFAULT in each row.
Fix that.  While here, improve the test cases for nearby logic about
OVERRIDING SYSTEM/USER values.

Dean Rasheed

Discussion: https://postgr.es/m/9q0sgcr416t.fsf@gmx.us
2020-11-22 15:48:32 -05:00
Tom Lane
9fe649ea29 In geo_ops.c, represent infinite slope as Infinity, not DBL_MAX.
Since we're assuming IEEE floats these days, there seems little
reason not to do this.  It has the advantage that when the slope is
computed as infinite due to the presence of Inf coordinates, we get
saner behavior than before from line_construct(), and thence also
in some dependent operations such as finding the closest point.

Also fix line_construct() to special-case slope zero.  The previous
coding got the right answer in most cases, but it could compute
C as NaN when the point has Inf coordinates.

Discussion: https://postgr.es/m/CAGf+fX70rWFOk5cd00uMfa__0yP+vtQg5ck7c2Onb-Yczp0URA@mail.gmail.com
2020-11-21 17:24:07 -05:00
Tom Lane
8597a48d01 Fix FPeq() and friends to get the right answers for infinities.
"FPeq(infinity, infinity)" returned false, on account of getting NaN
when it subtracts the two inputs.  Fix that by adding a separate
check for exact equality.

FPle() and FPge() similarly got the wrong answer for two like-signed
infinities.  In those cases, we can just rearrange the comparisons
to avoid potentially subtracting infinities.

While the sibling functions FPne() etc accidentally gave the right
answers even with the internal NaN results, it seems best to make
similar adjustments to them to avoid depending on this.

FPeq() has to be converted to an inline function to avoid double
evaluations of its arguments, and I did the same for the others
just for consistency.
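
The FPeq() fix amounts to roughly the following (standalone sketch using
double rather than float8; the EPSILON value is shown only for illustration):

    #include <math.h>
    #include <stdbool.h>

    #define EPSILON 1.0E-06

    static inline bool
    FPeq(double A, double B)
    {
        /* exact equality first, so Inf == Inf is true despite Inf - Inf being NaN */
        return A == B || fabs(A - B) <= EPSILON;
    }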

In passing, make the handling of NaN cases in line_eq() and
point_eq_point() simpler and easier to reason about, and perhaps
faster.

This results in just one visible regression test change: slope()
now gives DBL_MAX for two inputs of (inf,1e300), which is consistent
with what it does for (1e300,inf), so that seems like a bug fix.

Discussion: https://postgr.es/m/CAGf+fX70rWFOk5cd00uMfa__0yP+vtQg5ck7c2Onb-Yczp0URA@mail.gmail.com
2020-11-21 16:46:43 -05:00
Michael Paquier
878f3a19c6 Remove INSERT privilege check at table creation of CTAS and matview
As per discussion with Peter Eisentraut, the SQL standard specifies
that any tuple insertion done as part of CREATE TABLE AS happens without
any extra ACL check, so it makes little sense to keep a check for INSERT
privileges when using WITH DATA.  Materialized views are not part of the
standard, but similarly, this check can be confusing as this refers to
an access check on a table created within the same command as the one
that would insert data into this table.

This commit removes the INSERT privilege check for WITH DATA, the
default, that 846005e removed partially, but only for WITH NO DATA.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/d049c272-9a47-d783-46b0-46665b011598@enterprisedb.com
2020-11-21 19:45:30 +09:00
Peter Eisentraut
b5acf10cfc Replace a macro by a function
Using a macro is ugly and not justified here.

Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
2020-11-20 11:25:25 +01:00
Thomas Munro
ca051d8b10 Add collation versions for FreeBSD.
On FreeBSD 13, use querylocale() to read the current version of libc
collations.  Similar to commits 352f6f2d for Windows and d5ac14f9 for
GNU/Linux.

Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-20 21:49:57 +13:00
Fujii Masao
a4ef0329c2 Emit log when restore_command succeeds but archived file fails to be restored.
Previously, when restore_command claimed to succeed but failed to restore
the file with the right name, for example, due to mis-configuration of
restore_command, no log message was reported. Then the recovery failed
later with an error message not directly related to the issue.

This commit changes the recovery so that a log message is emitted in
this error case. This would enable us to investigate what happened in
this case more easily.

Author: Jeff Janes, Fujii Masao
Reviewed-by: Pavel Borisov, Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAMkU=1xkFs3Omp4JR4wMYWdam_KLuj6LXnTYfU8u3T0h=PLLMQ@mail.gmail.com
2020-11-20 15:42:47 +09:00
Tom Lane
926fa801ac Remove undocumented IS [NOT] OF syntax.
This feature was added a long time ago, in 7c1e67bd5 and eb121ba2c,
but never documented in any user-facing way.  (Documentation added
in 6126d3e70 was commented out almost immediately, in 8272fc3f7.)
That's because, while this syntax is defined by SQL:99, our
implementation is only vaguely related to the standard's semantics.
The standard appears to intend a run-time not parse-time test, and
it definitely intends that the test should understand subtype
relationships.

No one has stepped up to fix that in the intervening years, but
people keep coming across the code and asking why it's not documented.
Let's just get rid of it: if anyone ever wants to make it work per
spec, they can easily recover whatever parts of this code are still
of value from our git history.

If there's anyone out there who's actually using this despite its
undocumented status, they can switch to using pg_typeof() instead,
eg. "pg_typeof(something) = 'mytype'::regtype".  That gives
essentially the same semantics as what our IS OF code did.
(We didn't have that function last time this was discussed, or
we would have ripped out IS OF then.)

Discussion: https://postgr.es/m/CAKFQuwZ2pTc-DSkOiTfjauqLYkNREeNZvWmeg12Q-_69D+sYZA@mail.gmail.com
Discussion: https://postgr.es/m/BAY20-F23E9F2B4DAB3E4E88D3623F99B0@phx.gbl
Discussion: https://postgr.es/m/3E7CF81D.1000203@joeconway.com
2020-11-19 17:39:39 -05:00
Tom Lane
97390fe8a6 Further fixes for CREATE TABLE LIKE: cope with self-referential FKs.
Commit 502898192 was too careless about the order of execution of the
additional ALTER TABLE operations generated by expandTableLikeClause.
It just stuck them all at the end, which seems okay for most purposes.
But it falls down in the case where LIKE is importing a primary key
or unique index and the outer CREATE TABLE includes a FOREIGN KEY
constraint that needs to depend on that index.  Weird as that is,
it used to work, so we ought to keep it working.

To fix, make parse_utilcmd.c insert LIKE clauses between index-creation
and FK-creation commands in the transformed list of commands, and change
utility.c so that the commands generated by expandTableLikeClause are
executed immediately not at the end.  One could imagine scenarios where
this wouldn't work either; but currently expandTableLikeClause only
makes column default expressions, CHECK constraints, and indexes, and
this ordering seems fine for those.

Per bug #16730 from Sofoklis Papasofokli.  Like the previous patch,
back-patch to all supported branches.

Discussion: https://postgr.es/m/16730-b902f7e6e0276b30@postgresql.org
2020-11-19 15:03:17 -05:00
Peter Eisentraut
01e658fa74 Hash support for row types
Add hash functions for the record type as well as a hash operator
family and operator class for the record type.  This enables all the
hash functionality for the record type such as hash-based plans for
UNION/INTERSECT/EXCEPT DISTINCT, recursive queries using UNION
DISTINCT, hash joins, and hash partitioning.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/38eccd35-4e2d-6767-1b3c-dada1eac3124%402ndquadrant.com
2020-11-19 09:32:47 +01:00
Thomas Munro
7888b09994 Add BarrierArriveAndDetachExceptLast().
Provide a way for one process to continue the remaining phases of a
(previously) parallel computation alone.  Later patches will use this to
extend Parallel Hash Join.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BA6ftXPz4oe92%2Bx8Er%2BxpGZqto70-Q_ERwRaSyA%3DafNg%40mail.gmail.com
2020-11-19 18:13:46 +13:00
Alvaro Herrera
27838981be
Relax lock level for setting PGPROC->statusFlags
We don't actually need a lock to set PGPROC->statusFlags itself; what we
do need is a shared lock on either XidGenLock or ProcArrayLock in order to
ensure MyProc->pgxactoff keeps still while we modify the mirror array in
ProcGlobal->statusFlags.  Some places were using an exclusive lock for
that, which is excessive.  Relax those to use shared lock only.

procarray.c has a couple of places with somewhat brittle assumptions
about PGPROC changes: ProcArrayEndTransaction uses only shared lock, so
it's permissible to change MyProc only.  On the other hand,
ProcArrayEndTransactionInternal also changes other procs, so it must
hold exclusive lock.  Add asserts to ensure those assumptions continue
to hold.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20201117155501.GA13805@alvherre.pgsql
2020-11-18 13:24:22 -03:00
Heikki Linnakangas
2cccb627f1 Skip allocating hash table in EXPLAIN-only mode.
Author: Alexey Bashtanov
Discussion: https://www.postgresql.org/message-id/36823f65-050d-ae24-aa4d-a37726998240%40imap.cc
2020-11-18 12:39:15 +02:00
Peter Geoghegan
cf2acaf4dc Deprecate nbtree's BTP_HAS_GARBAGE flag.
Streamline handling of the various strategies that we have to avoid a
page split in nbtinsert.c.  When it looks like a leaf page is about to
overflow, we now perform deleting LP_DEAD items and deduplication in one
central place.  This greatly simplifies _bt_findinsertloc().

This has an independently useful consequence: nbtree no longer relies on
the BTP_HAS_GARBAGE page level flag/hint for anything important.  We
still set and unset the flag in the same way as before, but it's no
longer treated as a gating condition when considering if we should check
for already-set LP_DEAD bits.  This happens at the point where the page
looks like it might have to be split anyway, so simply checking the
LP_DEAD bits in passing is practically free.  This avoids missing
LP_DEAD bits just because the page-level hint is unset, which is
probably reasonably common (e.g. it happens when VACUUM unsets the
page-level flag without actually removing index tuples whose LP_DEAD-bit
was set recently, after the VACUUM operation began but before it reached
the leaf page in question).

Note that this isn't a big behavioral change compared to PostgreSQL 13.
We were already checking for set LP_DEAD bits regardless of whether the
BTP_HAS_GARBAGE page level flag was set before we considered doing a
deduplication pass.  This commit only goes slightly further by doing the
same check for all indexes, even indexes where deduplication won't be
performed.

We don't completely remove the BTP_HAS_GARBAGE flag.  We still rely on
it as a gating condition with pg_upgrade'd indexes from before B-tree
version 4/PostgreSQL 12.  That makes sense because we sometimes have to
make a choice among pages full of duplicates when inserting a tuple with
pre version 4 indexes.  It probably still pays to avoid accessing the
line pointer array of a page there, since it won't yet be clear whether
we'll insert on to the page in question at all, let alone split it as a
result.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz%3DYpc1PDdk8OVJDChGJBjT06%3DA0Mbv9HyTLCsOknGcUFg%40mail.gmail.com
2020-11-17 09:45:56 -08:00
Alvaro Herrera
7684b6fbed
indexcmds.c: reorder function prototypes
... out of an overabundance of neatnikism, perhaps.
2020-11-17 14:22:26 -03:00
Peter Geoghegan
a034f8b60c nbtree: Rename nbtinsert.c variables for consistency.
Stop naming special area/opaque pointer variables 'lpageop' in contexts
where it doesn't make sense.  This is a holdover from a time when logic
that performs tasks that are now spread across _bt_insertonpg(),
_bt_findinsertloc(), and _bt_split() was more centralized.  'lpageop'
denotes "left page", which doesn't make sense outside of contexts in
which there isn't also a right page.

Also acquire page flag variables up front within _bt_insertonpg().  This
makes it closer to _bt_split() following refactoring commit bc3087b626.
This allows the page split and retail insert paths to both make use of
the same variables.
2020-11-17 09:01:14 -08:00
Amit Kapila
9653f24ad8 Fix 'skip-empty-xacts' option in test_decoding for streaming mode.
In streaming mode, the transaction can be decoded in multiple streams and
those streams can be interleaved with streams of other transactions. So,
we can't remember the transaction's write status in the logical decoding
context because that might get changed due to some other transactions and
lead to wrong answers for 'skip-empty-xacts' option. We decided to keep
each transaction's write status in the ReorderBufferTxn to avoid
interleaved streams changing the status of some unrelated transactions.

Diagnosed-by: Amit Kapila
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1LR7=XNM_TLmpZMFuV8ZQpoxkem--NZJYf8YXmesbvwLA@mail.gmail.com
2020-11-17 12:14:53 +05:30
Tom Lane
2bd49b493a Don't Insert() a VFD entry until it's fully built.
Otherwise, if FDDEBUG is enabled, the debugging output fails because
it tries to read the fileName, which isn't set up yet (and should in
fact always be NULL).

AFAICT, this has been wrong since Berkeley.  Before 96bf88d52,
it would accidentally fail to crash on platforms where snprintf()
is forgiving about being passed a NULL pointer for %s; but the
file name intended to be included in the debug output wouldn't
ever have shown up.

Report and fix by Greg Nancarrow.  Although this is only visibly
broken in custom-made builds, it still seems worth back-patching
to all supported branches, as the FDDEBUG code is pretty useless
as it stands.

Discussion: https://postgr.es/m/CAJcOf-cUDgm9qYtC_B6XrC6MktMPNRby2p61EtSGZKnfotMArw@mail.gmail.com
2020-11-16 20:32:55 -05:00
Alvaro Herrera
cd9c1b3e19
Rename PGPROC->vacuumFlags to statusFlags
With more flags associated with a PGPROC entry that are not related to
vacuum (currently existing or planned), the name "statusFlags" describes
its purpose better.

(The same is done to the mirroring PROC_HDR->vacuumFlags.)

No functional changes in this commit.

This was suggested first by Hari Babu Kommi in [1] and then by Michael
Paquier at [2].

[1] https://postgr.es/m/CAJrrPGcsDC-oy1AhqH0JkXYa0Z2AgbuXzHPpByLoBGMxfOZMEQ@mail.gmail.com
[2] https://postgr.es/m/20200820060929.GB3730@paquier.xyz

Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20201116182446.qcg3o6szo2zookyr@localhost
2020-11-16 19:42:55 -03:00
Tom Lane
4025e6c466 Do not return NULL for error cases in satisfies_hash_partition().
Since this function is used as a CHECK constraint condition,
returning NULL is tantamount to returning TRUE, which would have the
effect of letting in a row that doesn't satisfy the hash condition.
Admittedly, the cases for which this is done should be unreachable
in practice, but that doesn't make it any less a bad idea.  It also
seems like a dartboard was used to decide which error cases should
throw errors as opposed to returning NULL.

For the checks for NULL input values, I just switched it to returning
false.  There's some argument that an error would be better; but the
case really should be can't-happen in a generated hash constraint,
so it's likely not worth adding more code for.

For the parent-relation-open-failure case, it seems like we might
as well let relation_open throw an error, instead of having an
impossible-to-diagnose constraint failure.

Back-patch to v11 where this code came in.

Discussion: https://postgr.es/m/24067.1605134819@sss.pgh.pa.us
2020-11-16 16:39:59 -05:00
Tom Lane
ad84ecc98d Use "true" not "TRUE" in one ICU function call.
This was evidently missed in commit 6337865f3, which generally did
s/TRUE/true/ everywhere.  It escaped notice up to now because ICU
versions before ICU 68 provided definitions of "TRUE" and "FALSE"
regardless.  With ICU 68, it fails to compile.

Per report from Condor.  Back-patch to v11 where 6337865f3 came in.
(I've not tested v10, where this call originated, but I imagine
it's fine since we defined TRUE in c.h back then.)

Discussion: https://postgr.es/m/7a6f3336165bfe3ca66abcda7966f9d0@stz-bg.com
2020-11-16 15:16:39 -05:00
Peter Eisentraut
d93ccdea1d Remove unused and deprecated strategy numbers from BRIN code
These were dead code.

Discussion: https://www.postgresql.org/message-id/flat/20201027032511.GF9241@telsasoft.com
2020-11-16 17:25:41 +01:00
Peter Eisentraut
5664b7be5b Normalize comment in empty grammar rules
Change lower case /* empty */ to /* EMPTY */ for consistency with the
majority.

Discussion: https://www.postgresql.org/message-id/flat/e9eed669-e32d-6919-fed4-acc0daea857b%40enterprisedb.com
2020-11-16 11:54:52 +01:00
Peter Eisentraut
591d282e8d Remove code handling removed deprecated containment operators
This removes the code that was there for handling the operators
removed by 2f70fdb064.

Discussion: https://www.postgresql.org/message-id/flat/20201027032511.GF9241@telsasoft.com
2020-11-16 11:51:12 +01:00
Fujii Masao
2945a488a3 Make the standby server promptly handle interrupt signals.
This commit changes the startup process in the standby server so that
it handles the interrupt signals after waiting for wal_retrieve_retry_interval
on the latch and resetting it, before entering another wait on the latch.
This change causes the standby server to promptly handle interrupt signals.

Previously, there was a case where the standby needed to
wait an extra five seconds to shut down when the shutdown request arrived
while the startup process was waiting for wal_retrieve_retry_interval
on the latch.
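
Roughly, the wait loop now has this shape (a simplified sketch; the real
loop in xlog.c uses additional wait flags and error handling):

    for (;;)
    {
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                         wal_retrieve_retry_interval,
                         WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
        ResetLatch(MyLatch);

        /* handle pending interrupts (e.g. a shutdown request) before waiting again */
        HandleStartupProcInterrupts();

        /* ... try again to retrieve WAL ... */
    }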

Author: Fujii Masao, but implementation idea is from Soumyadeep Chakraborty
Reviewed-by: Soumyadeep Chakraborty
Discussion: https://postgr.es/m/9d7e6ab0-8a53-ddb9-63cd-289bcb25fe0e@oss.nttdata.com
2020-11-16 18:27:51 +09:00
Michael Paquier
846005e4f3 Relax INSERT privilege requirement for CTAS and matviews WITH NO DATA
When specified, WITH NO DATA does not insert any data into the relation
created, so skip checking for the insert permissions.  With WITH DATA or
WITH NO DATA, it is always required for the user to have CREATE
privileges on the schema targeted for the relation.

Note that plain CREATE TABLE AS or CREATE MATERIALIZED VIEW queries have
begun to work accidentally without INSERT privilege checks as of
874fe3ae, while using EXECUTE or EXPLAIN ANALYZE would fail with the ACL
check, so this makes the behavior for all the command flavors consistent
with each other.  This is arguably a bug fix, but there have been no
complaints about the current behavior either so stable branches are not
changed.

While on it, document properly the privilege requirements for each
command, with more tests for all the possible scenarios, and avoid a
useless bulk-insert allocation when using WITH NO DATA.

Author: Bharath Rupireddy
Reviewed-by: Anastasia Lubennikova, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACWc3N8j0_9nMPz9wcAUnVcdKHzFdDZJ3hVFNEbqtcyG9w@mail.gmail.com
2020-11-16 11:52:40 +09:00
Tom Lane
29d29d652f Fix fuzzy thinking about amcanmulticol versus amcaninclude.
These flags should be independent: in particular an index AM should
be able to say that it supports include columns without necessarily
supporting multiple key columns.  The included-columns patch got
this wrong, possibly aided by the fact that it didn't bother to
update the documentation.

While here, clarify some text about amcanreturn, which was a little
vague about what should happen when amcanreturn reports that only
some of the index columns are returnable.

Noted while reviewing the SP-GiST included-columns patch, which
quite incorrectly (and unsafely) changed SP-GiST to claim
amcanmulticol = true as a workaround for this bug.

Backpatch to v11 where included columns were introduced.
2020-11-15 16:10:58 -05:00
Peter Geoghegan
46cf3c72c3 nbtree: Demote incomplete split "can't happen" error.
Only a basic logic bug in a _bt_insertonpg() caller could lead to a
violation of this invariant (index corruption won't do it).  A "can't
happen" error seems inappropriate (it is arbitrary at best).

Demote the error to a simple assertion.  This matches similar nearby
sanity checks.
2020-11-15 11:53:37 -08:00
Tom Lane
ff94205787 Suppress "warning: variable 'collcollate' set but not used".
Buildfarm members that lack both HAVE_LOCALE_T and USE_ICU have been
complaining about pg_newlocale_from_collation's collcollate variable.
This is evidently fallout from commit 7d1297df0, which removed the
only usage outside those two #ifdef'd code paths.  Mark the variable
pg_attribute_unused(), like its sibling collctype, which has been that
way for a long time.
2020-11-15 12:39:49 -05:00
Tom Lane
92bf7e2d02 Provide the OR REPLACE option for CREATE TRIGGER.
This is mostly straightforward.  However, we disallow replacing
constraint triggers or changing the is-constraint property; perhaps
that can be added later, but the complexity versus benefit tradeoff
doesn't look very good.

Also, no special thought is taken here for whether replacing an
existing trigger should result in changes to queued-but-not-fired
trigger actions.  We just document that if you're surprised by the
results, too bad, don't do that.  (Note that any such pending trigger
activity would have to be within the current session.)

Takamichi Osumi, reviewed at various times by Surafel Temesgen,
Peter Smith, and myself

Discussion: https://postgr.es/m/0DDF369B45A1B44B8A687ED43F06557C010BC362@G01JPEXMBYT03
2020-11-14 17:05:34 -05:00
Michael Paquier
788dd0b839 Fix some typos
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/C36ADFDF-D09A-4EE5-B186-CB46C3653F4C@yesql.se
2020-11-14 11:43:10 +09:00
Tom Lane
ec0294fb2c Support negative indexes in split_part().
This provides a handy way to get, say, the last field of the string.
Use of a negative index in this way has precedent in the nearby
left() and right() functions.

The implementation scans the string twice when N < -1, but it seems
likely that N = -1 will be the huge majority of actual use cases,
so I'm not really excited about adding complexity to avoid that.

Nikhil Benesch, reviewed by Jacob Champion; cosmetic tweakage by me

Discussion: https://postgr.es/m/cbb7f861-6162-3a51-9823-97bc3aa0b638@gmail.com
2020-11-13 13:49:48 -05:00
Bruce Momjian
66a8f09048 change wire protocol data type for history file content
This was marked as BYTEA, but is more like TEXT, which is how we already
pass the history timeline file name.  Internally, we don't do any
encoding or bytea escape handling, but TEXT seems closest.  This should
cause no behavioral change.

Reported-by: Brar Piening

Discussion: https://postgr.es/m/6a1b9cd9-17e3-df67-be55-86102af6bdf5@gmx.de

Backpatch-through: master
2020-11-12 14:08:59 -05:00
Peter Eisentraut
1b2b19f758 Clean up optional rules in grammar
Various rules for optional keywords contained unnecessary rules and
type declarations.  Remove those, thus making the output a tiny bit
smaller.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/e9eed669-e32d-6919-fed4-acc0daea857b%40enterprisedb.com
2020-11-12 08:06:08 +01:00
Fujii Masao
1a2ae7c50f Use standard SIGHUP and SIGTERM handlers in walreceiver.
Commit 1e53fe0e70 changed background processes so that they use
standard SIGHUP handler. Like that, this commit makes walreceiver
use standard SIGHUP and SIGTERM handlers, to simplify the code.

As a side effect of this commit, walreceiver can wake up and process
the configuration files promptly when receiving SIGHUP, because the
standard SIGHUP handler sets the latch. Previously there could be a time
lag between the receipt of SIGHUP and the processing of the configuration
files, since the dedicated handler didn't set the latch.

Author: Bharath Rupireddy, tweaked by Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/CALj2ACXPorUqePswDtOeM_s82v9RW32E1fYmOPZ5NuE+TWKj_A@mail.gmail.com
2020-11-12 13:25:23 +09:00
Fujii Masao
b62e6056a0 pg_stat_statements: track number of rows processed by REFRESH MATERIALIZED VIEW.
Commit 6023b7ea71 allowed pg_stat_statements to track the number
of rows retrieved or affected by some utility commands including
CREATE MATERIALIZED VIEW. However it did not track the rowcount
of REFRESH MATERIALIZED VIEW. This commit allows pg_stat_statements
to track that.

To track that, this commit changes the query completion for
REFRESH MATERIALIZED VIEW so that it saves the rowcount. But note that
the rowcount is still not displayed in the command completion tag output.
That is, the display_rowcount flag of CMDTAG_REFRESH_MATERIALIZED_VIEW
command tag is left false in cmdtaglist.h. Otherwise, the change of
completion tag output might break applications using it.

Author: Katsuragi Yuta, Seino Yuki
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/71f6bc72f8bbaa06e701f8bd2562c347@oss.nttdata.com
Discussion: https://postgr.es/m/aadbfba9-e4bb-9531-6b3a-d13c31c8f4fe@oss.nttdata.com
2020-11-12 11:26:55 +09:00
Michael Paquier
03f9cd93ea Remove useless SHA256 initialization when not using backup manifests
Attempting to take a base backup with Postgres linking to a build of
OpenSSL with FIPS enabled currently fails with or even without a backup
manifest requested because of this mandatory SHA256 initialization used
for the manifest file itself.  However, there is no need to do this
initialization at all if backup manifests are not needed because there
is no data to append to the manifest.

Note that being able to use backup manifests with OpenSSL+FIPS requires
a switch of the SHA2 implementation to use EVP, which would cause an ABI
breakage so this cannot be backpatched to 13 as it has been already
released, but at least avoiding this SHA256 initialization gives users
the possibility to take a base backup even when specifying --no-manifest
with pg_basebackup.

Author: Michael Paquier
Discussion: https://postgr.es/m/20201110020014.GE1887@paquier.xyz
Backpatch-through: 13
2020-11-12 10:56:33 +09:00
Tomas Vondra
42c63ab6e2 Remove duplicate code in brin_memtuple_initialize
Commit 8bf74967da moved some of the code from brin_new_memtuple to
brin_memtuple_initialize, but this resulted in some of the code being
duplicated. Fix by removing the duplicate lines and backpatch to 10.

Author: Tomas Vondra
Backpatch-through: 10
Discussion: https://postgr.es/m/5eb50c97-9a8e-b691-8c40-1b2a55611c4c%40enterprisedb.com
2020-11-11 18:37:59 +01:00
Peter Eisentraut
0af302af40 Fix some stray whitespace in parser files 2020-11-11 17:37:18 +01:00
Peter Eisentraut
c77f6f50e4 Fix cases of discarding result from list API functions
Two cases violated list APIs by throwing away the return value.  While
the code was technically correct, it relied on internal knowledge of
the list implementation, and the code wasn't really gaining anything
that way.  It is planned to make this a compiler warning in the
future, so just fix these cases by assigning the return value
properly.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/e3753562-99cd-b65f-5aca-687dfd1ec2fc@2ndquadrant.com
2020-11-11 08:03:51 +01:00
Tom Lane
ec29427ce2 Fix and simplify some usages of TimestampDifference().
Introduce TimestampDifferenceMilliseconds() to simplify callers
that would rather have the difference in milliseconds, instead of
the select()-oriented seconds-and-microseconds format.  This gets
rid of at least one integer division per call, and it eliminates
some apparently-easy-to-mess-up arithmetic.
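
For example, a caller that wants a millisecond delay goes from the old
pattern to the new one roughly like this (sketch only; start_time is a
TimestampTz obtained earlier):

    long    secs;
    int     microsecs;
    long    delay_ms;

    /* old: two output parameters plus easy-to-botch arithmetic */
    TimestampDifference(start_time, GetCurrentTimestamp(), &secs, &microsecs);
    delay_ms = secs * 1000 + microsecs / 1000;

    /* new: one call, already in milliseconds */
    delay_ms = TimestampDifferenceMilliseconds(start_time, GetCurrentTimestamp());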

Two of these call sites were in fact wrong:

* pg_prewarm's autoprewarm_main() forgot to multiply the seconds
by 1000, thus ending up with a delay 1000X shorter than intended.
That doesn't quite make it a busy-wait, but close.

* postgres_fdw's pgfdw_get_cleanup_result() thought it needed to compute
microseconds not milliseconds, thus ending up with a delay 1000X longer
than intended.  Somebody along the way had noticed this problem but
misdiagnosed the cause, and imposed an ad-hoc 60-second limit rather
than fixing the units.  This was relatively harmless in context, because
we don't care that much about exactly how long this delay is; still,
it's wrong.

There are a few more callers of TimestampDifference() that don't
have a direct need for seconds-and-microseconds, but can't use
TimestampDifferenceMilliseconds() either because they do need
microsecond precision or because they might possibly deal with
intervals long enough to overflow 32-bit milliseconds.  It might be
worth inventing another API to improve that, but that seems outside
the scope of this patch; so those callers are untouched here.

Given the fact that we are fixing some bugs, and the likelihood
that future patches might want to back-patch code that uses this
new API, back-patch to all supported branches.

Alexey Kondratov and Tom Lane

Discussion: https://postgr.es/m/3b1c053a21c07c1ed5e00be3b2b855ef@postgrespro.ru
2020-11-10 22:51:54 -05:00
Magnus Hagander
3f16cb505d Fix out of date comment 2020-11-10 13:15:44 +01:00
Magnus Hagander
d2e4bf688e Remove -o option to postmaster
This option was declared obsolete many years ago.

Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/CABUevEyOE=9CQwZm2j=vwP5+6OLCSoxn9pBjK8gyRdkTzMfqtQ@mail.gmail.com
2020-11-10 13:15:01 +01:00
Andres Freund
6c57f2ed16 jit: Add support for LLVM 12.
LLVM 12, to be released in a few months, made some breaking changes to
the Orc JIT interface. OrcV2 eventually will make it easier to support
features like concurrent JIT compilation, but this commit only allows
to compile against LLVM 12.

This commit is a bit bigger than desirable. That partially is because
the V2 interface is more granular than V1 interface, but also because
I chose to make some minor changes to < LLVM 12 code to keep the code
somewhat readable.

The LLVM 12 support will need to be backpatched. I plan to do so after
the patch stewed on the buildfarm for a few days.

Author: Andres Freund
Discussion: https://postgr.es/m/20201016011244.pmyvr3ee2gbzplq4@alap3.anarazel.de
2020-11-09 20:01:33 -08:00
Peter Geoghegan
180cf876d4 Remove ineffective heapam CHECK_FOR_INTERRUPTS().
Remove a CHECK_FOR_INTERRUPTS() call that could never actually handle an
interrupt.  We always have a heap page buffer lock at this point.
Having a useless CHECK_FOR_INTERRUPTS() call is harmless but misleading.

It is probably possible to work around the immediate problem by moving
the CHECK_FOR_INTERRUPTS() to before the heap page buffer lock is
acquired.  That isn't enough to make the function responsive to
interrupts, though.  The index AM caller will still hold an exclusive
buffer lock of its own.
2020-11-09 09:00:12 -08:00
Noah Misch
0c3185e963 In security-restricted operations, block enqueue of at-commit user code.
Specifically, this blocks DECLARE ... WITH HOLD and firing of deferred
triggers within index expressions and materialized view queries.  An
attacker having permission to create non-temp objects in at least one
schema could execute arbitrary SQL functions under the identity of the
bootstrap superuser.  One can work around the vulnerability by disabling
autovacuum and not manually running ANALYZE, CLUSTER, REINDEX, CREATE
INDEX, VACUUM FULL, or REFRESH MATERIALIZED VIEW.  (Don't restore from
pg_dump, since it runs some of those commands.)  Plain VACUUM (without
FULL) is safe, and all commands are fine when a trusted user owns the
target object.  Performance may degrade quickly under this workaround,
however.  Back-patch to 9.5 (all supported versions).

Reviewed by Robert Haas.  Reported by Etienne Stalmans.

Security: CVE-2020-25695
2020-11-09 07:32:09 -08:00
Thomas Munro
d50e3b1f8d Fix assertion in collation version lookup.
Commit 257836a7 included an assertion that a version lookup routine is
not trying to look up "C" or "POSIX", but that case is reachable with
the user-facing SQL function pg_collation_actual_version().  Remove the
assertion.
2020-11-08 20:45:29 +13:00
Peter Geoghegan
5a2f154a2e Improve nbtree README's LP_DEAD section.
The description of how LP_DEAD bit setting by index scans works
following commit 2ed5b87f was rather unclear.  Clean that up a bit.

Also refer to LP_DEAD bit setting within _bt_check_unique() at the start
of the same section.  This mechanism may actually be more important than
the generic kill_prior_tuple mechanism that the section focuses on, so
it at least deserves to be mentioned in passing.
2020-11-07 18:51:12 -08:00
Alvaro Herrera
52eec1c53a
Message style improvements
* Avoid pointlessly highlighting that an index vacuum was executed by a
  parallel worker; user doesn't care.

* Don't give the impression that a non-concurrent reindex of an invalid
  index on a TOAST table would work, because it wouldn't.

* Add a "translator:" comment for a mysterious message.

Discussion: https://postgr.es/m/20201107034943.GA16596@alvherre.pgsql
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2020-11-07 19:33:43 -03:00
Peter Eisentraut
bdc4edbea6 Move catalog index declarations
Move the system catalog index declarations from catalog/indexing.h to
the respective parent tables' catalog/pg_*.h files.  The original
reason for having it split was that the old genbki system produced the
output in the order of the catalog files it read, so all the indexing
stuff needed to come separately.  But this is no longer the case, and
keeping it together makes more sense.

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/c7cc82d6-f976-75d6-2e3e-b03d2cab26bb@2ndquadrant.com
2020-11-07 12:26:24 +01:00
Peter Eisentraut
b4c9695e79 Move catalog toast table declarations
Move the system catalog toast table declarations from
catalog/toasting.h to the respective parent tables' catalog/pg_*.h
files.  The original reason for having it split was that the old
genbki system produced the output in the order of the catalog files it
read, so all the toasting stuff needed to come separately.  But this
is no longer the case, and keeping it together makes more sense.

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/c7cc82d6-f976-75d6-2e3e-b03d2cab26bb@2ndquadrant.com
2020-11-07 12:26:24 +01:00
Alvaro Herrera
623644f02c
Plug memory leak in index_get_partition
The list of indexes was being leaked when asked for an index that
doesn't have an index partition in the table partition.  Not a common
case admittedly --and in most cases where it occurs, caller throws an
error anyway-- but worth fixing for cleanliness and in case any
third-party code is calling this function.

While at it, remove use of lfirst_oid() to obtain a value we already
have.

Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20201105203606.GF22691@telsasoft.com
2020-11-06 22:52:16 -03:00
Michael Paquier
a05dbf477b Add GUC_LIST_INPUT and GUC_LIST_QUOTE to unix_socket_directories
This should have been done in the initial commit that made
unix_socket_directories a list as of c9b0cbe.  This change makes it
possible to correctly support ALTER SYSTEM, where multiple paths can be
specified as a list, like the following pattern where flattening is
applied to each item:
ALTER SYSTEM SET unix_socket_directories = '/path1', '/path2';

Any parameters specified in postgresql.conf are parsed the same way, so
there is no compatibility change.  pg_dump has a hardcoded list of
parameters marked with GUC_LIST_QUOTE, which gets updated accordingly.
These parameters are reordered alphabetically for clarity.

Author: Ian Lawrence Barwick
Reviewed-by: Peter Eisentraut, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CAB8KJ=iMOtNY6_sUwV=LQVCJ2zgYHBDyNzVfvE5GN3WQ3v9kQg@mail.gmail.com
2020-11-07 10:30:22 +09:00
Tomas Vondra
7577dd8480 Properly detoast data in brin_form_tuple
brin_form_tuple failed to consider that the values may be toasted, inserting
the toast pointer into the index. This may easily result in index
corruption, as the toast data may be deleted and cleaned up by vacuum.
The cleanup however does not care about indexes, leaving invalid toast
pointers behind, which triggers errors like this:

  ERROR:  missing chunk number 0 for toast value 16433 in pg_toast_16426

A less severe consequence is inconsistent failures due to the index row
being too large, depending on whether brin_form_tuple operated on plain
or toasted version of the row. For example

    CREATE TABLE t (val TEXT);
    INSERT INTO t VALUES ('... long value ...')
    CREATE INDEX idx ON t USING brin (val);

would likely succeed, as the row would likely include a toast pointer.
Switching the order of INSERT and CREATE INDEX would likely fail:

    ERROR:  index row size 8712 exceeds maximum 8152 for index "idx"

because this happens before the row values are toasted.

The bug has existed since PostgreSQL 9.5, where BRIN indexes were
introduced, so backpatch all the way back.

Author: Tomas Vondra
Reviewed-by: Alvaro Herrera
Backpatch-through: 9.5
Discussion: https://postgr.es/m/20201001184133.oq5uq75sb45pu3aw@development
Discussion: https://postgr.es/m/20201104010544.zexj52mlldagzowv%40development
2020-11-07 00:39:19 +01:00
Tom Lane
eeda7f6338 Revert "Accept relations of any kind in LOCK TABLE".
Revert 59ab4ac32, as well as the followup fix 33862cb9c, in all
branches.  We need to think a bit harder about what the behavior
of LOCK TABLE on views should be, and there's no time for that
before next week's releases.  We'll take another crack at this
later.

Discussion: https://postgr.es/m/16703-e348f58aab3cf6cc@postgresql.org
2020-11-06 16:17:56 -05:00
Magnus Hagander
5ee180a394 Add pg_strong_random_init function to initialize random number generator
Currently only OpenSSL requires this initialization, but in the future
other SSL implementations are likely to need it as well. Abstracting
this functionality out into a separate function makes this cleaner and
clearer, and also removes the dependency on OpenSSL headers from
fork_process.c.

OpenSSL is special in that we need to initialize this random number
generator even if we're not going to use it directly, until we drop
support for everything prior to OpenSSL 1.1.1. (And of course also if we
actually use it). All other implementations are left empty at this time,
but more are expected to be added in the future.

Author: Daniel Gustafsson <daniel@yesql.se>, Michael Paquier <michael@paquier.xyz>
Reviewed-By: Magnus Hagander <magnus@hagander.net>
Discussion: https://postgr.es/m/F6291C3C-747C-4C93-BCE0-28BB420B1FF5@yesql.se
2020-11-06 13:21:28 +01:00
Amit Kapila
4f841ce3f7 Use strlcpy instead of memcpy for copying the slot name in pgstat.c.
There is no outright bug here but it is better to be consistent with the
usage at other places in the same file. In passing, fix a wrong
assertion in pgstat_recv_replslot.

Author: Kyotaro Horiguchi
Reviewed-by: Sawada Masahiko and Amit Kapila
Discussion: https://postgr.es/m/20201104.175523.1704166915688949637.horikyota.ntt@gmail.com
2020-11-06 08:12:48 +05:30
Peter Geoghegan
efc5dcfd8a Fix wal_consistency_checking nbtree bug.
wal_consistency_checking indicated an inconsistency in certain cases
involving nbtree page deletion.  The underlying issue is that there was
a minor difference between the page image produced after a REDO routine
ran and the corresponding page image following original execution.
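
For context, the check that flagged this is enabled with a setting
along these lines ('all' covers every resource manager; individual
ones can also be listed):

    ALTER SYSTEM SET wal_consistency_checking = 'all';
    SELECT pg_reload_conf();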

This harmless inconsistency has been around forever.  We more or less
expect total consistency among even deleted nbtree pages these days,
though, so this won't do anymore.

To fix, tweak the REDO routine to match original execution.

Oversight in commit f47b5e13.
2020-11-05 15:01:40 -08:00
Tom Lane
5b7bfc3972 Don't throw an error for LOCK TABLE on a self-referential view.
LOCK TABLE has complained about "infinite recursion" when applied
to a self-referential view, ever since we made it recurse into views
in v11.  However, that breaks pg_dump's new assumption that it's
okay to lock every relation.  There doesn't seem to be any good
reason to throw an error: if we just abandon the recursion, we've
still satisfied the requirement of locking every referenced relation.
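
As a rough illustration (relation names here are hypothetical), a view
can be made self-referential and then locked without error:

    CREATE TABLE base (a int);
    CREATE VIEW v AS SELECT a FROM base;
    CREATE OR REPLACE VIEW v AS SELECT a FROM v;
    BEGIN;
    LOCK TABLE v;  -- previously failed with an "infinite recursion" error
    COMMIT;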

Per bug #16703 from Andrew Bille (via Alexander Lakhin).

Discussion: https://postgr.es/m/16703-e348f58aab3cf6cc@postgresql.org
2020-11-05 11:44:32 -05:00
Peter Geoghegan
48e1291342 Fix nbtree cleanup-only VACUUM stats inaccuracies.
Logic for counting heap TIDs from posting list tuples (added by commit
0d861bbb) was faulty.  It didn't count any TIDs/index tuples when no
callback was set.  This meant that we incorrectly counted no index
tuples in cleanup-only VACUUMs, which could lead to pg_class.reltuples
being spuriously set to 0 in affected indexes.
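
For instance (relation names hypothetical), the symptom could be
observed after a VACUUM that only needs an index cleanup pass:

    VACUUM t;  -- nothing to delete, so only cleanup runs on index "idx"
    SELECT reltuples FROM pg_class WHERE relname = 'idx';
    -- affected versions could report 0 despite the index holding tuples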

To fix, go back to counting items from the page in cases where there is
no callback.  This approach isn't very accurate, but it works well
enough in practice while avoiding the expense of accessing every index
tuple during cleanup-only VACUUMs.

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Discussion: https://postgr.es/m/20201023174451.69e358f1@firost
Backpatch: 13-, where nbtree deduplication was introduced
2020-11-04 18:42:27 -08:00
Thomas Munro
c732c3f8c1 Fix unlinking of SLRU segments.
Commit dee663f7 intended to drop any queued up fsync requests before
unlinking segment files, but missed a code path.  Fix by centralizing
the forget-and-unlink code into a single function.

Reported-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Discussion: https://postgr.es/m/20201104013205.icogbi773przyny5%40development
2020-11-05 13:49:49 +13:00
Tom Lane
40c24bfef9 Improve our ability to regurgitate SQL-syntax function calls.
The SQL spec calls out nonstandard syntax for certain function calls,
for example substring() with numeric position info is supposed to be
spelled "SUBSTRING(string FROM start FOR count)".  We accept many
of these things, but up to now would not print them in the same format,
instead simplifying down to "substring"(string, start, count).
That's long annoyed me because it creates an interoperability
problem: we're gratuitously injecting Postgres-specific syntax into
what might otherwise be a perfectly spec-compliant view definition.
However, the real reason for addressing it right now is to support
a planned change in the semantics of EXTRACT() a/k/a date_part().
When we switch that to returning numeric, we'll have the parser
translate EXTRACT() to some new function name (might as well be
"extract" if you ask me) and then teach ruleutils.c to reverse-list
that per SQL spec.  In this way existing calls to date_part() will
continue to have the old semantics.
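
As a rough illustration of the reverse-listing change (view, table and
column names hypothetical):

    CREATE TABLE tab (f text);
    CREATE VIEW v AS SELECT SUBSTRING(f FROM 2 FOR 3) FROM tab;
    SELECT pg_get_viewdef('v'::regclass);
    -- before: roughly  SELECT "substring"(tab.f, 2, 3) FROM tab;
    -- after:  roughly  SELECT SUBSTRING(tab.f FROM 2 FOR 3) FROM tab;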

To implement this, invent a new CoercionForm value COERCE_SQL_SYNTAX,
and make the parser insert that rather than COERCE_EXPLICIT_CALL when
the input has SQL-spec decoration.  (But if the input has the form of
a plain function call, continue to mark it COERCE_EXPLICIT_CALL, even
if it's calling one of these functions.)  Then ruleutils.c recognizes
COERCE_SQL_SYNTAX as a cue to emit SQL call syntax.  It can know
which decoration to emit using hard-wired knowledge about the
functions that could be called this way.  (While this solution isn't
extensible without manual additions, neither is the grammar, so this
doesn't seem unmaintainable.)  Notice that this solution will
reverse-list a function call with SQL decoration only if it was
entered that way; so dump-and-reload will not by itself produce any
changes in the appearance of views.

This requires adding a CoercionForm field to struct FuncCall.
(I couldn't resist the temptation to rearrange that struct's
field order a tad while I was at it.)  FuncCall doesn't appear
in stored rules, so that change isn't a reason for a catversion
bump, but I did one anyway because the new enum value for
CoercionForm fields could confuse old backend code.

Possible future work:

* Perhaps CoercionForm should now be renamed to DisplayForm,
or something like that, to reflect its more general meaning.
This'd require touching a couple hundred places, so it's not
clear it's worth the code churn.

* The SQLValueFunction node type, which was invented partly for
the same goal of improving SQL-compatibility of view output,
could perhaps be replaced with regular function calls marked
with COERCE_SQL_SYNTAX.  It's unclear if this would be a net
code savings, however.

Discussion: https://postgr.es/m/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
2020-11-04 12:34:50 -05:00
Tom Lane
f21636e5d5 Remove useless entries for aggregate functions from fmgrtab.c.
Gen_fmgrtab.pl treated aggregate functions the same as other built-in
functions, which is wasteful because there is no real need to have
entries for them in the fmgr_builtins[] table.  Suppressing those
entries saves about 3KB in the compiled table on my machine, which
is not a lot but it's not nothing either, considering that that
table is pretty "hot".  The only outside code change needed is
that ExecInitWindowAgg() can't be allowed to call fmgr_info_cxt()
on a plain aggregate function.  But that saves a few cycles anyway.

Having done that, the aggregate_dummy() function is unreferenced
and might as well be dropped.  Using "aggregate_dummy" as the prosrc
value for an aggregate is now just a documentation convention not
something that matters.  There was some discussion of using NULL
instead to save a few bytes in pg_proc, but we'd have to remove
prosrc's BKI_FORCE_NOT_NULL marking which doesn't seem a great idea.
Anyway, it's possible there's client-side code that expects to
see "aggregate_dummy" there, so I'm loath to change it without a
strong reason.
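
The convention remains visible from SQL; as a rough sketch:

    SELECT proname FROM pg_proc WHERE prosrc = 'aggregate_dummy' LIMIT 3;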

Discussion: https://postgr.es/m/533989.1604263665@sss.pgh.pa.us
2020-11-04 11:25:56 -05:00
Fujii Masao
113d3591b8 Fix segmentation fault that commit ac22929a26 caused.
Commit ac22929a26 changed recoveryWakeupLatch so that it's reset to
NULL at the end of recovery. This change could cause a segmentation fault
in the buildfarm member 'elver'.

Previously the latch was reset to NULL after calling ShutdownWalRcv().
But there could be a window between ShutdownWalRcv() and the actual
exit of walreceiver. If walreceiver set the latch during that window,
the segmentation fault could happen.

To fix the issue, this commit changes walreceiver so that it sets
the latch only when the latch has not been reset to NULL yet.

Author: Fujii Masao
Discussion: https://postgr.es/m/5c1f8a85-747c-7bf9-241e-dd467d8a3586@iki.fi
2020-11-04 21:49:00 +09:00
Peter Eisentraut
560564d3ad Enable hash partitioning of text arrays
hash_array_extended() needs to pass PG_GET_COLLATION() to the hash
function of the element type.  Otherwise, the hash function of a
collation-aware data type such as text will error out, since the
introduction of nondeterministic collation made hash functions require
a collation, too.

The consequence of this is that before this change, hash partitioning
using an array over text in the partition key would not work.
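
A minimal sketch of the case this enables (table and partition names
hypothetical):

    CREATE TABLE t (keys text[]) PARTITION BY HASH (keys);
    CREATE TABLE t_p0 PARTITION OF t FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE t_p1 PARTITION OF t FOR VALUES WITH (MODULUS 2, REMAINDER 1);
    INSERT INTO t VALUES (ARRAY['abc', 'def']);  -- previously errored out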

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/32c1fdae-95c6-5dc6-058a-a90330a3b621%40enterprisedb.com
2020-11-04 12:46:28 +01:00
Fujii Masao
ac22929a26 Get rid of the dedicated latch for signaling the startup process.
This commit gets rid of the dedicated latch for signaling the startup
process in favor of using its procLatch, since that comports better
with possible generic signal handlers using that latch.

Commit 1e53fe0e70 changed background processes so that they use the
standard SIGHUP handler. Likewise, this commit also makes the startup
process use the standard SIGHUP handler to simplify the code.

Author: Fujii Masao
Reviewed-by: Bharath Rupireddy, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACXPorUqePswDtOeM_s82v9RW32E1fYmOPZ5NuE+TWKj_A@mail.gmail.com
2020-11-04 16:43:43 +09:00
Fujii Masao
02d332297f Use standard SIGHUP handler in syslogger.
Commit 1e53fe0e70 changed background processes so that they use the
standard SIGHUP handler. Likewise, this commit makes syslogger use the
standard SIGHUP handler to simplify the code.

Author: Bharath Rupireddy
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CALj2ACXPorUqePswDtOeM_s82v9RW32E1fYmOPZ5NuE+TWKj_A@mail.gmail.com
2020-11-04 14:48:02 +09:00
Thomas Munro
9f12a3b95d Tolerate version lookup failure for old style Windows locale names.
Accept that we can't get versions for such locale names for now.  Users
will need to specify the newer language tag format to enable the
collation versioning feature.  It's not clear that we can do automatic
conversion from the old style to the new style reliably enough for this
purpose.

Unfortunately, this means that collation versioning probably won't work
for the default collation unless you provide something like en-US at
initdb or CREATE DATABASE time (though, for reasons not yet understood,
it does seem to work on some systems).  It'd be nice to find a better
solution, or document this quirk if we settle on it, but this should
unbreak the 3 failing build farm animals in the meantime.
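
For example (database name hypothetical), providing the newer style tag
explicitly at CREATE DATABASE time should allow the version lookup to
work for that database's default collation:

    CREATE DATABASE mydb TEMPLATE template0
        LC_COLLATE 'en-US' LC_CTYPE 'en-US';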

Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-04 15:13:08 +13:00