Commit Graph

318 Commits

Author SHA1 Message Date
Alvaro Herrera eb43bdbf51
Make detach-partition-concurrently-4 less timing sensitive
Same as 5e0b1aeb2d, for the companion test file.

This one seems lower probability (only two failures in a month of runs);
I was hardly able to reproduce a failure without a patch, so the fact
that I was also unable to reproduce one with it doesn't say anything.
We'll have to wait for further buildfarm results to see if we need any
further adjustments.

Discussion: https://postgr.es/m/20210524090712.GA3771394@rfd.leadboat.com
2021-05-25 19:44:55 -04:00
Alvaro Herrera 5e0b1aeb2d
Make detach-partition-concurrently-3 less timing-sensitive
This recently added test has proven too sensitive to timing when
sending a cancel to a session waiting for a lock.

We fix this by running a no-op query in the blocked session immediately
after the cancel; this avoids the session that sent the cancel sending
another query immediately before the cancel has been reported.
Idea by Noah Misch.

With that fix, we sometimes see that the cancel error report is shown
only relative to the step that is cancelled, instead of together with
the step that sends the cancel.  To increase the probability that both
steps are shown together, add a 0.1s sleep to the cancel.  In normal
conditions this appears sufficient to silence most failures, but we'll
see what the slower buildfarm members say about it.

Reported-by: Takamichi Osumi <osumi.takamichi@fujitsu.com>
Discussion: https://postgr.es/m/OSBPR01MB4888C4ABA361C7E81094AC66ED269@OSBPR01MB4888.jpnprd01.prod.outlook.com
2021-05-25 12:53:57 -04:00
Tom Lane 84f5c2908d Restore the portal-level snapshot after procedure COMMIT/ROLLBACK.
COMMIT/ROLLBACK necessarily destroys all snapshots within the session.
The original implementation of intra-procedure transactions just
cavalierly did that, ignoring the fact that this left us executing in
a rather different environment than normal.  In particular, it turns
out that handling of toasted datums depends rather critically on there
being an outer ActiveSnapshot: otherwise, when SPI or the core
executor pop whatever snapshot they used and return, it's unsafe to
dereference any toasted datums that may appear in the query result.
It's possible to demonstrate "no known snapshots" and "missing chunk
number N for toast value" errors as a result of this oversight.
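
A minimal sketch of the failure mode, using hypothetical names (the
reproduction also depends on the old TOAST rows being vacuumed away, and
on the payload actually being stored out of line):

CREATE TABLE big (id int, payload text);
INSERT INTO big VALUES (1, repeat(md5('x'), 100000));

CREATE PROCEDURE show_payload() LANGUAGE plpgsql AS $$
DECLARE v text;
BEGIN
  SELECT payload INTO v FROM big WHERE id = 1;
  COMMIT;                       -- destroys every snapshot in the session
  RAISE NOTICE '%', length(v);  -- detoasting v here could fail before the fix
END;
$$;

CALL show_payload();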

Historically this outer snapshot has been held by the Portal code,
and that seems like a good plan to preserve.  So add infrastructure
to pquery.c to allow re-establishing the Portal-owned snapshot if it's
not there anymore, and add enough bookkeeping support that we can tell
whether it is or not.

We can't, however, just re-establish the Portal snapshot as part of
COMMIT/ROLLBACK.  As in normal transaction start, acquiring the first
snapshot should wait until after SET and LOCK commands.  Hence, teach
spi.c about doing this at the right time.  (Note that this patch
doesn't fix the problem for any PLs that try to run intra-procedure
transactions without using SPI to execute SQL commands.)

This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD,
rename that to allow_nonatomic.

replication/logical/worker.c also needs some fixes, because it wasn't
careful to hold a snapshot open around AFTER trigger execution.
That code doesn't use a Portal, which I suspect someday we're gonna
have to fix.  But for now, just rearrange the order of operations.
This includes back-patching the recent addition of finish_estate()
to centralize the cleanup logic there.

This also back-patches commit 2ecfeda3e into v13, to improve the
test coverage for worker.c (it was that test that exposed that
worker.c's snapshot management is wrong).

Per bug #15990 from Andreas Wicht.  Back-patch to v11 where
intra-procedure COMMIT was added.

Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org
2021-05-21 14:03:59 -04:00
Tom Lane f21fadafaf Avoid detoasting failure after COMMIT inside a plpgsql FOR loop.
exec_for_query() normally tries to prefetch a few rows at a time
from the query being iterated over, so as to reduce executor
entry/exit overhead.  Unfortunately this is unsafe if we have
COMMIT or ROLLBACK within the loop, because there might be
TOAST references in the data that we prefetched but haven't
yet examined.  Immediately after the COMMIT/ROLLBACK, we have
no snapshots in the session, meaning that VACUUM is at liberty
to remove recently-deleted TOAST rows.

This was originally reported as a case triggering the "no known
snapshots" error in init_toast_snapshot(), but even if you miss
hitting that, you can get "missing toast chunk", as illustrated
by the added isolation test case.
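
A hedged sketch of the problematic shape (hypothetical names):

CREATE PROCEDURE churn() LANGUAGE plpgsql AS $$
DECLARE r record;
BEGIN
  FOR r IN SELECT id, payload FROM big LOOP  -- a few rows may be prefetched
    DELETE FROM big WHERE id = r.id;
    COMMIT;  -- now VACUUM may remove TOAST rows that the still-unexamined
             -- prefetched tuples point at
  END LOOP;
END;
$$;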

To fix, just disable prefetching in non-atomic contexts.  Maybe
there will be performance complaints prompting us to work harder
later, but it's not clear at the moment that this really costs
much, and I doubt we'd want to back-patch any complicated fix.

In passing, adjust that error message in init_toast_snapshot()
to be a little clearer about the likely cause of the problem.

Patch by me, based on earlier investigation by Konstantin Knizhnik.

Per bug #15990 from Andreas Wicht.  Back-patch to v11 where
intra-procedure COMMIT was added.

Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org
2021-05-20 18:32:37 -04:00
Alvaro Herrera d6b8d29419
Allow a partdesc omitting detached partitions to be cached
Makes partition descriptor acquisition faster during the transient
period in which a partition is in the process of being detached.

This also adds the restriction that only one partition can be in
pending-detach state for a partitioned table.
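
In practice the restriction looks like this (hypothetical names, error
wording approximate):

ALTER TABLE parent DETACH PARTITION p1 CONCURRENTLY;  -- cancelled midway
ALTER TABLE parent DETACH PARTITION p2 CONCURRENTLY;
-- fails: "parent" already has a partition pending detach
ALTER TABLE parent DETACH PARTITION p1 FINALIZE;      -- clean up first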

While at it, return find_inheritance_children() API to what it was
before 71f4c8c6f7, and create a separate
find_inheritance_children_extended() that returns detailed info about
detached partitions.

(This incidentally fixes a bug in 8aba932251 whereby a memory context
holding a transient partdesc is reparented to a NULL PortalContext,
leading to a permanent leak of that memory.  The fix is to no longer rely
on reparenting contexts to PortalContext.  Reported by Amit Langote.)

Per gripe from Amit Langote
Discussion: https://postgr.es/m/CA+HiwqFgpP1LxJZOBYGt9rpvTjXXkg5qG2+Xch2Z1Q7KrqZR1A@mail.gmail.com
2021-04-28 15:44:35 -04:00
Alvaro Herrera 8aba932251
Fix relcache inconsistency hazard in partition detach
During queries coming from ri_triggers.c, we need to omit partitions
that are marked pending detach -- otherwise, the RI query is tricked
into allowing a row into the referencing table whose corresponding row
is in the detached partition.  Which is bogus: once the detach operation
completes, the row becomes an orphan.
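
A hedged sketch of the hazard (hypothetical names):

-- session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM pk_parent;    -- snapshot taken here
-- session 2: detaches the partition holding pk value 42, and commits
-- session 1: the RI check behind this insert must not see that partition
INSERT INTO fk_table VALUES (42);  -- before the fix, a stale cached
                                   -- partdesc could let this succeed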

However, the code was not doing that in repeatable-read transactions,
because relcache kept a copy of the partition descriptor that included
the partition, and used it in the RI query.  This commit changes the
partdesc cache code to only keep descriptors that aren't dependent on
a snapshot (namely: those where no detached partition exists, and those
where detached partitions are included).  When a partdesc-without-
detached-partitions is requested, we create one afresh each time; also,
those partdescs are stored in PortalContext instead of
CacheMemoryContext.

find_inheritance_children gets a new output *detached_exist boolean,
which indicates whether any partition marked pending-detach is found.
Its "include_detached" input flag is changed to "omit_detached", because
that name captures the desired semantics more naturally.
CreatePartitionDirectory() and RelationGetPartitionDesc() arguments are
identically renamed.

This was noticed because a buildfarm member that runs with relcache
clobbering, which would not keep the improperly cached partdesc, broke
one test, which led us to realize that the expected output of that test
was bogus.  This commit also corrects that expected output.

Author: Amit Langote <amitlangote09@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/3269784.1617215412@sss.pgh.pa.us
2021-04-22 15:13:25 -04:00
Noah Misch 455dbc010b Use "-I." in directories holding Bison parsers, for Oracle compilers.
With the Oracle Developer Studio 12.6 compiler, #line directives alter
the current source file location for purposes of #include "..."
directives.  Hence, a VPATH build failed with 'cannot find include file:
"specscanner.c"'.  With two exceptions, parser-containing directories
already add "-I. -I$(srcdir)"; eliminate the exceptions.  Back-patch to
9.6 (all supported versions).
2021-04-12 19:24:41 -07:00
Alvaro Herrera 71f4c8c6f7
ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY
Allow a partition to be detached from its partitioned table without
blocking concurrent queries, by running in two transactions and only
requiring ShareUpdateExclusive on the partitioned table.

Because it runs in two transactions, it cannot be used in a transaction
block.  This is the main reason to use dedicated syntax: so that users
can choose to use the original mode if they need it.  But also, it
doesn't work when a default partition exists (because an exclusive lock
would still need to be obtained on it, in order to change its partition
constraint).

In case the second transaction is cancelled or a crash occurs, there's
ALTER TABLE .. DETACH PARTITION .. FINALIZE, which executes the final
steps.
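
A usage sketch with hypothetical names:

CREATE TABLE meas (logdate date) PARTITION BY RANGE (logdate);
CREATE TABLE meas_2020 PARTITION OF meas
  FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
ALTER TABLE meas DETACH PARTITION meas_2020 CONCURRENTLY;
-- and, if that is interrupted between its two transactions:
ALTER TABLE meas DETACH PARTITION meas_2020 FINALIZE;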

The main trick to make this work is the addition of column
pg_inherits.inhdetachpending, initially false; it can only be set true in
the first part of this command.  Once that is committed, concurrent
transactions that use a PartitionDirectory will include or ignore
partitions so marked: in the optimizer they are ignored if the row is
marked committed for the snapshot; in the executor they are always
included.  As a
result, and because of the way PartitionDirectory caches partition
descriptors, queries that were planned before the detach will see the
rows in the detached partition and queries that are planned after the
detach, won't.

A CHECK constraint is created that duplicates the partition constraint.
This is probably not strictly necessary, and some users will prefer to
remove it afterwards, but if the partition is re-attached to a
partitioned table, the constraint needn't be rechecked.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
2021-03-25 18:00:28 -03:00
Heikki Linnakangas 6214e2b228 Fix permission checks on constraint violation errors on partitions.
If a cross-partition UPDATE violates a constraint on the target partition,
and the columns in the new partition are in different physical order than
in the parent, the error message can reveal columns that the user does not
have SELECT permission on. A similar bug was fixed earlier in commit
804b6b6db4.

The cause of the bug is that the callers of the
ExecBuildSlotValueDescription() function got confused when constructing
the list of modified columns. If the tuple was routed from a parent, we
converted the tuple to the parent's format, but the list of modified
columns was grabbed directly from the child's RTE entry.

ExecUpdateLockMode() had a similar issue.  That led to confusion about
which columns are key columns, leading to the wrong tuple lock being
taken on tables referenced by foreign keys when a row is updated with
INSERT ... ON CONFLICT DO UPDATE.  A new isolation test is added for that
corner case.

With this patch, the ri_RangeTableIndex field is no longer set for
partitions that don't have an entry in the range table. Previously, it was
set to the RTE entry of the parent relation, but that was confusing.

NOTE: This modifies the ResultRelInfo struct, replacing the
ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to
backpatch, because it breaks any extensions accessing the field. The
change that ri_RangeTableIndex is not set for partitions could potentially
break extensions, too. The ResultRelInfos are visible to FDWs at least,
and this patch required small changes to postgres_fdw. Nevertheless, this
seems like the least bad option.  I don't think these fields are widely
used in extensions; I don't think there are FDWs out there that use the
FDW "direct update" API, other than postgres_fdw.  If there are, you will get a
compilation error, so hopefully it is caught quickly.

Backpatch to 11, where support for both cross-partition UPDATEs, and unique
indexes on partitioned tables, were added.

Reviewed-by: Amit Langote
Security: CVE-2021-3393
2021-02-08 11:01:51 +02:00
Noah Misch 8a54e12a38 Fix CREATE INDEX CONCURRENTLY for simultaneous prepared transactions.
In a cluster having used CREATE INDEX CONCURRENTLY while having enabled
prepared transactions, queries that use the resulting index can silently
fail to find rows.  Fix this for future CREATE INDEX CONCURRENTLY by
making it wait for prepared transactions like it waits for ordinary
transactions.  This expands the VirtualTransactionId structure domain to
admit prepared transactions.  It may be necessary to reindex to recover
from past occurrences.  Back-patch to 9.5 (all supported versions).
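
A hedged sketch of the interaction (hypothetical names; requires
max_prepared_transactions > 0):

-- session 1
BEGIN;
INSERT INTO t VALUES (42);
PREPARE TRANSACTION 'px';
-- session 2: must now wait for 'px' as well
CREATE INDEX CONCURRENTLY t_idx ON t (id);
-- any session
COMMIT PREPARED 'px';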

Andrey Borodin, reviewed (in earlier versions) by Tom Lane and Michael
Paquier.

Discussion: https://postgr.es/m/2E712143-97F7-4890-B470-4A35142ABC82@yandex-team.ru
2021-01-30 00:00:27 -08:00
Tom Lane 58cd8dca3d Avoid redundantly prefixing PQerrorMessage for a connection failure.
libpq's error messages for connection failures pretty well stand on
their own, especially since commits 52a10224e/27a48e5a1.  Prefixing
them with 'could not connect to database "foo"' or the like is just
redundant, and perhaps even misleading if the specific database name
isn't relevant to the failure.  (When it is, we trust that the
backend's error message will include the DB name.)  Indeed, psql
hasn't used any such prefix in a long time.  So, make all our other
programs and documentation examples agree with psql's practice.

Discussion: https://postgr.es/m/1094524.1611266589@sss.pgh.pa.us
2021-01-22 16:52:31 -05:00
Tom Lane 800d93f314 Allow pg_regress.c wrappers to postprocess test result files.
Add an optional callback to regression_main() that, if provided,
is invoked on each test output file before we try to compare it
to the expected-result file.

The main and isolation test programs don't need this (yet).
In pg_regress_ecpg, add a filter that eliminates target-host
details from "could not connect" error reports.  This filter
doesn't do anything as of this commit, but it will be needed
by the next one.

In the long run we might want to provide some more general,
perhaps pattern-based, filtering mechanism for test output.
For now, this will solve the immediate problem.

Discussion: https://postgr.es/m/BN6PR05MB3492948E4FD76C156E747E8BC9160@BN6PR05MB3492.namprd05.prod.outlook.com
2021-01-11 13:43:19 -05:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Tom Lane 7ca37fb040 Use setenv() in preference to putenv().
Since at least 2001 we've used putenv() and avoided setenv(), on the
grounds that the latter was unportable and not in POSIX.  However,
POSIX added it that same year, and by now the situation has reversed:
setenv() is probably more portable than putenv(), since POSIX now
treats the latter as not being a core function.  And setenv() has
cleaner semantics too.  So, let's reverse that old policy.

This commit adds a simple src/port/ implementation of setenv() for
any stragglers (we have one in the buildfarm, but I'd not be surprised
if that code is never used in the field).  More importantly, extend
win32env.c to also support setenv().  Then, replace usages of putenv()
with setenv(), and get rid of some ad-hoc implementations of setenv()
wannabees.

Also, adjust our src/port/ implementation of unsetenv() to follow the
POSIX spec that it returns an error indicator, rather than returning
void as per the ancient BSD convention.  I don't feel a need to make
all the call sites check for errors, but the portability stub ought
to match real-world practice.

Discussion: https://postgr.es/m/2065122.1609212051@sss.pgh.pa.us
2020-12-30 12:56:06 -05:00
Andres Freund 94bc27b576 Centralize horizon determination for temp tables, fixing bug due to skew.
This fixes a bug in the edge case where, for a temp table, heap_page_prune()
can end up with a different horizon than heap_vacuum_rel(), which can trigger
errors like "ERROR: cannot freeze committed xmax ...".

The bug was introduced due to interaction of a7212be8b9 "Set cutoff xmin more
aggressively when vacuuming a temporary table." with dc7420c2c9 "snapshot
scalability: Don't compute global horizons while building snapshots.".

The problem is caused by lazy_scan_heap() assuming that the only reason its
HeapTupleSatisfiesVacuum() call would return HEAPTUPLE_DEAD is if the tuple is
a HOT tuple, or if the tuple's inserting transaction has aborted since the
heap_page_prune() call. But after a7212be8b9 that was also possible in other
cases for temp tables, because heap_page_prune() uses a different visibility
test after dc7420c2c9.

The fix is fairly simple: Move the special case logic for temp tables from
vacuum_set_xid_limits() to the infrastructure introduced in dc7420c2c9. That
ensures that the horizon used for pruning is at least as aggressive as the one
used by lazy_scan_heap(). The concrete horizon used for temp tables is
slightly different than the logic in dc7420c2c9, but should always be as
aggressive as before (see comments).

A significant benefit of centralizing the logic in procarray.c is that
the more aggressive horizons for temp tables now apply not just to VACUUM
but also to, e.g., HOT pruning and the nbtree killtuples logic.
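
A hedged illustration of the user-visible effect:

CREATE TEMP TABLE tt (a int);
INSERT INTO tt VALUES (1);
DELETE FROM tt;
VACUUM tt;  -- rows of a temp table are visible only to this session, so
            -- the horizon can ignore other backends' snapshots and the
            -- deleted row is removable immediately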

Because isTopLevel is not needed by vacuum_set_xid_limits() anymore, I
undid the related changes from a7212be8b9.

This commit also adds an isolation test ensuring that the more aggressive
vacuuming and pruning of temp tables keeps working.

Debugged-By: Amit Kapila <amit.kapila16@gmail.com>
Debugged-By: Tom Lane <tgl@sss.pgh.pa.us>
Debugged-By: Ashutosh Sharma <ashu.coek88@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20201014203103.72oke6hqywcyhx7s@alap3.anarazel.de
Discussion: https://postgr.es/m/20201015083735.derdzysdtqdvxshp@alap3.anarazel.de
2020-10-28 18:02:31 -07:00
Alvaro Herrera 2203ede9ae Install pg_isolation_regress and isolationtester
We already install assorted tools for testing extensions, but these two
were missing.  Having them installed helps third-party modules usefully
include isolation tests, now that ISOLATION support was added to PGXS's
makefiles by d3c09b9b13.  Compare c3a0818460.

Author: Craig Ringer <craig.ringer@enterprisedb.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAMsr+YFsCMH3B4uOPFE+2qWM6k=o=hf9LGiPNCfwqKdUPz_BsQ@mail.gmail.com
2020-10-15 13:09:29 -03:00
Alvaro Herrera f481d28232
Check default partitions constraints while descending
The partition tuple-routing code assumes that the partition chosen while
descending the partition hierarchy is always the correct one.  This is
true except when the partition is the default partition and another
partition has been added concurrently: the partition constraint changes
and we don't recheck it.  This can lead to tuples mistakenly being added
to the default partition that should have been rejected.
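
A hedged reproduction sketch (hypothetical names):

CREATE TABLE p (a int) PARTITION BY LIST (a);
CREATE TABLE p_def PARTITION OF p DEFAULT;
-- session 1 starts routing a tuple toward the default partition:
INSERT INTO p VALUES (1);
-- meanwhile session 2 adds a partition accepting that value:
CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);
-- before the fix, session 1's row could land in p_def even though the
-- default partition's constraint now excludes a = 1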

Fix by rechecking the default partition constraint while descending the
hierarchy.

An isolation test based on the reproduction steps described by Hao Wu
(with tweaks for extra coverage) is included.

Backpatch to 12, where this bug came in with 898e5e3290.

Reported by: Hao Wu <hawu@vmware.com>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CA+HiwqFqBmcSSap4sFnCBUEL_VfOMmEKaQ3gwUhyfa4c7J_-nA@mail.gmail.com
Discussion: https://postgr.es/m/DM5PR0501MB3910E97A9EDFB4C775CF3D75A42F0@DM5PR0501MB3910.namprd05.prod.outlook.com
2020-09-08 19:35:15 -03:00
Michael Paquier 6785974287 Remove isolation test reindex-partitions
The isolation test added by a6642b3 is proving to be unstable, as once
the first transaction holding a lock on the top-most partitioned table
or on a partition commits, the commit order of the follow-up DROP TABLE
and REINDEX could become reversed depending on the timing.

The only part of the test that could be entirely reliable is the one
using a SHARE lock, allowing REINDEX to commit first, but it is the
least interesting of the set.

Per buildfarm members rorqual and mylodon.

Discussion: https://postgr.es/m/E1kFSBj-00062c-Mu@gemulon.postgresql.org
2020-09-08 11:15:21 +09:00
Michael Paquier a6642b3ae0 Add support for partitioned tables and indexes in REINDEX
Until now, REINDEX was not able to work with partitioned tables and
indexes, forcing users to reindex partitions one by one.  This extends
REINDEX INDEX and REINDEX TABLE so that they can accept a partitioned
index or table as input, respectively, reindexing all the partitions
assigned to them that have physical storage (foreign tables, partitioned
tables and partitioned indexes are skipped).
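
A usage sketch with hypothetical names:

CREATE TABLE pt (id int) PARTITION BY RANGE (id);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (0) TO (100);
CREATE INDEX pt_id_idx ON pt (id);
REINDEX TABLE pt;                      -- now recurses to pt1's indexes
REINDEX INDEX CONCURRENTLY pt_id_idx;  -- one transaction per partition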

This shares some logic with schema and database REINDEX, as each
partition gets processed in its own transaction after building a list of
relations to work on.  This choice has the advantage of limiting the
number of invalid indexes to one partition with REINDEX CONCURRENTLY in
the event of an in-flight cancellation or failure, as the only indexes
handled at once in a single REINDEX CONCURRENTLY loop are the ones from
the partition being worked on.

Isolation tests are added to emulate some cases I bumped into while
developing this feature, particularly the concurrent drop of a leaf
partition being reindexed.  However, this is rather limited, as LOCK would
cause REINDEX to block in the first transaction building the list of
partitions.

Because of its multi-transaction nature, this new flavor cannot run in a
transaction block, similarly to REINDEX SCHEMA, SYSTEM and DATABASE.

Author: Justin Pryzby, Michael Paquier
Reviewed-by: Anastasia Lubennikova
Discussion: https://postgr.es/m/db12e897-73ff-467e-94cb-4af03705435f.adger.lj@alibaba-inc.com
2020-09-08 10:09:22 +09:00
Michael Paquier 1d65416661 Improve handling of dropped relations for REINDEX DATABASE/SCHEMA/SYSTEM
When multiple relations are reindexed, a scan of pg_class is done first
to build the list of relations to work on.  However, the REINDEX logic
never checked whether a listed relation still exists when beginning work
on it, causing, for example, sudden cache lookup failures.

This commit adds safeguards against dropped relations for REINDEX,
similarly to VACUUM or CLUSTER where we try to open the relation,
ignoring it if it is missing.  A new option is added to the REINDEX
routines to control whether a missing relation may be ignored.

An isolation test, based on REINDEX SCHEMA, is added for the concurrent
and non-concurrent cases.

Author: Michael Paquier
Reviewed-by: Anastasia Lubennikova
Discussion: https://postgr.es/m/20200813043805.GE11663@paquier.xyz
2020-09-02 09:08:12 +09:00
Peter Geoghegan 5940ffb221 Avoid update conflict out serialization anomalies.
SSI's HeapCheckForSerializableConflictOut() test failed to correctly
handle conditions involving a concurrently inserted tuple which is later
concurrently updated by a separate transaction.  A SELECT statement
that called HeapCheckForSerializableConflictOut() could end up using the
same XID (updater's XID) for both the original tuple, and the successor
tuple, missing the XID of the xact that created the original tuple
entirely.  This only happened when neither tuple from the chain was
visible to the transaction's MVCC snapshot.

The observable symptoms of this bug were subtle.  A pair of transactions
could commit, with the later transaction failing to observe the effects
of the earlier transaction (because of the confusion created by the
update to the non-visible row).  This bug dates all the way back to
commit dafaa3ef, which added SSI.

To fix, make sure that we check the xmin of concurrently inserted tuples
that happen to also have been updated concurrently.

Author: Peter Geoghegan
Reported-By: Kyle Kingsbury
Reviewed-By: Thomas Munro
Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443@jepsen.io
Backpatch: All supported versions
2020-06-11 10:09:47 -07:00
Tom Lane 474e7da648 Change locktype "speculative token" to "spectoken".
It's just weird that this name wasn't chosen to look like an
identifier.  The suspicion that it wasn't thought about too
hard is reinforced by the fact that it wasn't documented in
the pg_locks view (until I did so, a day or two back).

Update, and add a comment reminding future adjusters of this
array to fix the docs too.
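
For example, the renamed value now reads like any other identifier when
queried from the view:

SELECT locktype, granted FROM pg_locks WHERE locktype = 'spectoken';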

Do some desultory wordsmithing on various entries in the wait
events tables.

Discussion: https://postgr.es/m/24595.1589326879@sss.pgh.pa.us
2020-05-15 21:47:34 -04:00
Tomas Vondra d2d8a229bc Implement Incremental Sort
Incremental Sort is an optimized variant of multikey sort for cases when
the input is already sorted by a prefix of the requested sort keys. For
example when the relation is already sorted by (key1, key2) and we need
to sort it by (key1, key2, key3) we can simply split the input rows into
groups having equal values in (key1, key2), and only sort/compare the
remaining column key3.

This has a number of benefits:

- Reduced memory consumption, because only a single group (determined by
  values in the sorted prefix) needs to be kept in memory. This may also
  eliminate the need to spill to disk.

- Lower startup cost, because Incremental Sort produces results after each
  prefix group, which is beneficial for plans where startup cost matters
  (for example, queries with a LIMIT clause).

We consider both Sort and Incremental Sort, and decide based on costing.
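
A hedged sketch of a plan shape that can now appear (hypothetical table
and index; whether it is chosen depends on costing):

CREATE INDEX ON sales (key1, key2);
EXPLAIN (COSTS OFF)
SELECT * FROM sales ORDER BY key1, key2, key3;
--  Incremental Sort
--    Sort Key: key1, key2, key3
--    Presorted Key: key1, key2
--    ->  Index Scan using sales_key1_key2_idx on sales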

The implemented algorithm operates in two different modes:

- Fetching a minimum number of tuples without checking equality on the
  prefix keys, and sorting on all columns when safe.

- Fetching all tuples for a single prefix group and then sorting by
  comparing only the remaining (non-prefix) keys.

We always start in the first mode, and employ a heuristic to switch into
the second mode if we believe it's beneficial - the goal is to minimize
the number of unnecessary comparisons while keeping memory consumption
below work_mem.

This is a very old patch series. The idea was originally proposed by
Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the
patch was taken over by James Coleman, who wrote and rewrote most of the
current code.

There were many reviewers/contributors since 2013 - I've done my best to
pick the most active ones, and listed them in this commit message.

Author: James Coleman, Alexander Korotkov
Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov
Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-06 21:35:10 +02:00
Andres Freund 997563dfcb Try to harden insert-conflict-specconflict against autovacuum.
Looks like guaibasaurus had an autovacuum running during the
controller_print_speculative_locks step (just added in 43e0841970),
which does indeed seem quite possible.

Avoid the problem by only looking for the backends participating in
the test.
2020-02-11 21:18:14 -08:00
Andres Freund 43e0841970 Test additional speculative conflict scenarios.
Previously, the speculative insert tests did not cover the case in which
a tuple t is inserted into a table with a unique index on a column but,
before t can be inserted into the index, a concurrent transaction inserts
a conflicting value into the index, so the insertion of tuple t must be
aborted.

The basic permutation is one session successfully inserts into the table
and an associated unique index while a concurrent session successfully
inserts into the table but discovers a conflict before inserting into
the index and must abort the insertion.
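
A hedged sketch of that basic permutation, in comment form (the real test
sequences the sessions through the isolation framework):

-- s1: INSERT INTO t VALUES (1) ON CONFLICT DO NOTHING;
--     inserts the heap tuple, then pauses before inserting into the index
-- s2: INSERT INTO t VALUES (1) ON CONFLICT DO NOTHING;
--     wins the race and inserts into the index
-- s1: detects the conflict and must abort (super-delete) its speculative
--     insertion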

Several variants on this include:
- swap which session is successful
- first session insert transaction does not commit, so second session
must wait on a transaction lock
- first session insert does not "complete", so second session must wait
on a speculative insertion lock

Also, refactor the existing TOAST table upsert test to be in the same
spec and reuse the steps.

Author: Melanie Plageman, Ashwin Agrawal, Andres Freund
Reviewed-by: Andres Freund, Taylor Vesely
Discussion: https://postgr.es/m/CAAKRu_ZRmxy_OEryfY3G8Zp01ouhgw59_-_Cm8n7LzRH5BAvng@mail.gmail.com
2020-02-11 16:32:11 -08:00
Tom Lane 01d9676a53 Fix dangling pointer in EvalPlanQual machinery.
EvalPlanQualStart() supposed that it could re-use the relsubs_rowmark
and relsubs_done arrays from a prior instantiation.  But since they are
allocated in the es_query_cxt of the recheckestate, that's just wrong;
EvalPlanQualEnd() will blow away that storage.  Therefore we were using
storage that could have been reallocated to something else, causing all
sorts of havoc.

I think this was modeled on the old code's handling of es_epqTupleSlot,
but since the code was anyway clearing the arrays at re-use, there's
clearly no expectation of importing any outside state.  So it's just
a dubious savings of a couple of pallocs, which is negligible compared
to setting up a new planstate tree.  Therefore, just allocate the
arrays always.  (I moved the allocations slightly for readability.)

In principle this bug could cause a problem whenever EPQ rechecks are
needed in more than one target table of a ModifyTable plan node.
In practice it seems not quite so easy to trigger as that; I couldn't
readily duplicate a crash with a partitioned target table, for instance.
That's probably down to incidental choices about when to free or
reallocate stuff.  The added isolation test case does seem to reliably
show an assertion failure, though.

Per report from Oleksii Kliukin.  Back-patch to v12 where the bug was
introduced (evidently by commit 3fb307bc4).

Discussion: https://postgr.es/m/EEF05F66-2871-4786-992B-5F45C92FEE2E@hintbits.com
2020-01-28 17:26:37 -05:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Tom Lane 99351a8b5a Improve isolationtester's timeout management.
isolationtester.c had a hard-wired limit of 3 minutes per test step.
It now emerges that this isn't quite enough for some of the slowest
buildfarm animals.  This isn't the first time we've had to raise
this limit (cf. 1db439ad4), so let's make it configurable.  This
patch raises the default to 5 minutes, and introduces an environment
variable PGISOLATIONTIMEOUT that can be set if more time is needed,
following the precedent of PGCTLTIMEOUT.

Also, modify isolationtester so that when the timeout is hit,
it explicitly reports having sent a cancel.  This makes the regression
failure log considerably more intelligible.  (In the worst case, a
timed-out test might actually be reported as "passing" without this
extra output, so arguably this is a bug fix in itself.)

In passing, update the README file, which had apparently not gotten
touched when we added "make check" support here.

Back-patch to 9.6; older versions don't have comparable timeout logic.

Discussion: https://postgr.es/m/22964.1575842935@sss.pgh.pa.us
2019-12-09 14:31:57 -05:00
Tom Lane 6b802cfc7f Avoid assertion failure with LISTEN in a serializable transaction.
If LISTEN is the only action in a serializable-mode transaction,
and the session was not previously listening, and the notify queue
is not empty, predicate.c reported an assertion failure.  That
happened because we'd acquire the transaction's initial snapshot
during PreCommit_Notify, which was called *after* predicate.c
expects any such snapshot to have been established.
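
A minimal trigger for the assertion (assuming a non-empty notify queue
and an assert-enabled build):

BEGIN ISOLATION LEVEL SERIALIZABLE;
LISTEN my_channel;  -- only action; session was not previously listening
COMMIT;             -- previously could trip the assertion in predicate.c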

To fix, just swap the order of the PreCommit_Notify and
PreCommit_CheckForSerializationFailure calls during CommitTransaction.
This will imply holding the notify-insertion lock slightly longer,
but the difference could only be meaningful in serializable mode,
which is an expensive option anyway.

It appears that this is just an assertion failure, with no
consequences in non-assert builds.  A snapshot used only to scan
the notify queue could not have been involved in any serialization
conflicts, so there would be nothing for
PreCommit_CheckForSerializationFailure to do except assign it a
prepareSeqNo and set the SXACT_FLAG_PREPARED flag.  And given no
conflicts, neither of those omissions affect the behavior of
ReleasePredicateLocks.  This admittedly once-over-lightly analysis
is backed up by the lack of field reports of trouble.

Per report from Mark Dilger.  The bug is old, so back-patch to all
supported branches; but the new test case only goes back to 9.6,
for lack of adequate isolationtester infrastructure before that.

Discussion: https://postgr.es/m/3ac7f397-4d5f-be8e-f354-440020675694@gmail.com
Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
2019-11-24 15:57:49 -05:00
Michael Paquier 3db0598d90 Improve stability of tests for VACUUM (SKIP_LOCKED)
Concurrent autovacuums running with the main regression test suite
could cause the tests with VACUUM (SKIP_LOCKED) to randomly generate
WARNING messages.  For these tests, set client_min_messages to ERROR to
get rid of those random failures, as disabling autovacuum for the
relations operated on would not completely close the failure window.

For isolation tests, disable autovacuum for the relations vacuumed with
SKIP_LOCKED.  The tests are designed so that LOCK commands are taken
in a first session before running a concurrent VACUUM (SKIP_LOCKED) in a
second to generate WARNING messages, but a concurrent autovacuum could
cause the tests to be slower.
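
The pattern under test, roughly (hypothetical names; warning text
abbreviated):

-- session 1
BEGIN;
LOCK TABLE t IN SHARE UPDATE EXCLUSIVE MODE;
-- session 2
VACUUM (SKIP_LOCKED) t;  -- WARNING: skipping vacuum of "t" --- lock not
                         -- available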

Reported-by: Tom Lane
Author: Michael Paquier
Reviewed-by: Andres Freund, Tom Lane
Discussion: https://postgr.es/m/25294.1573077278@sss.pgh.pa.us
Backpatch-through: 12
2019-11-16 15:23:12 +09:00
Andres Freund 01368e5d9d Split all OBJS style lines in makefiles into one-line-per-entry style.
When maintaining or merging patches, one of the most common sources
of conflicts is the list of objects in makefiles.  Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.

By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.

Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
2019-11-05 14:41:07 -08:00
Amit Kapila dddf4cdc33 Make the order of the header file includes consistent in non-backend modules.
Similar to commit 7e735035f2, this commit makes the order of header file
inclusion consistent for non-backend modules.

In passing, fix the case where we were using angle brackets (<>) for the
local module includes instead of quotes ("").

Author: Vignesh C
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-10-25 07:41:52 +05:30
Andres Freund 3a68105154 Disable one more set of tests from c884119950.
Discussion: https://postgr.es/m/20191004222437.45qmglpto43pd3jb@alap3.anarazel.de
Backpatch: 9.6-, just like c884119950 and 6e61d75f52
2019-10-05 08:05:11 -07:00
Andres Freund 6e61d75f52 Disable one set of tests from c884119950.
One of the upsert related tests is unstable (sometimes even hanging
until isolationtester's step timeout is reached). Based on preliminary
analysis that might be a problem outside of just that test, but not
really related to EPQ and triggers.  Disable for now, to get the
buildfarm greener again.

Discussion: https://postgr.es/m/20191004222437.45qmglpto43pd3jb@alap3.anarazel.de
Backpatch: 9.6-, just like c884119950.
2019-10-04 21:18:08 -07:00
Andres Freund c884119950 Add isolation tests for the combination of EPQ and triggers.
As evidenced by bug #16036 this area is woefully under-tested. Add
fairly extensive tests for the combination.

Backpatch back to 9.6 - before that isolationtester was not capable
enough. While we don't backpatch tests all the time, future fixes to
trigger.c would potentially look different enough in 12+ from the
earlier branches that introducing bugs during backpatching is more
likely than normal. Also, it's just a crucial and undertested area of
the code.

Author: Andres Freund
Discussion: https://postgr.es/m/16036-28184c90d952fb7f@postgresql.org
Backpatch: 9.6-, the earliest these tests work
2019-10-04 13:56:04 -07:00
Andres Freund 27cc7cd2bc Reorder EPQ work, to fix rowmark related bugs and improve efficiency.
In ad0bda5d24 I changed the EvalPlanQual machinery to store
substitution tuples in slots, instead of using plain HeapTuples.  The
main motivation for that was that using HeapTuples would be inefficient
for future tableams.  But it turns out that that conversion was buggy
for non-locking rowmarks - the wrong tuple descriptor was used to
create the slot.

As a secondary issue 5db6df0c0 changed ExecLockRows() to begin EPQ
earlier, to allow fetching the locked rows directly into the EPQ
slots, instead of having to copy tuples around. Unfortunately, as Tom
complained, that forces some expensive initialization to happen
earlier.

As a third issue, the test coverage for EPQ was clearly insufficient.

Fixing the first issue is unfortunately not trivial: Non-locked row
marks were fetched at the start of EPQ, and we don't have the type
information for the rowmarks available at that point. While we could
change that, it's not easy. It might be worthwhile to change that at
some point, but to fix this bug, it seems better to delay fetching
non-locking rowmarks when they're actually needed, rather than
eagerly. They're referenced at most once, and in cases where EPQ
fails, might never be referenced. Fetching them when needed also
increases locality a bit.

To be able to fetch rowmarks during execution, rather than
initialization, we need to be able to access the active EPQState, as
that contains necessary data. To do so move EPQ related data from
EState to EPQState, and, only for EStates created as part of EPQ,
reference the associated EPQState from EState.

To fix the second issue, change EPQ initialization to allow
EvalPlanQualSlot() to be used before EvalPlanQualBegin() (but
obviously still requiring EvalPlanQualInit() to have been done).

As these changes made struct EState harder to understand, e.g. by
adding multiple EStates, significantly reorder the members, and add a
lot more comments.

Also add a few more EPQ tests, including one that fails for the first
issue above. More is needed.

Reported-By: yi huang
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion:
    https://postgr.es/m/CAHU7rYZo_C4ULsAx_LAj8az9zqgrD8WDd4hTegDTMM1LMqrBsg@mail.gmail.com
    https://postgr.es/m/24530.1562686693@sss.pgh.pa.us
Backpatch: 12-, where the EPQ changes were introduced
2019-09-09 05:14:11 -07:00
Tom Lane b1907d6882 Set application_name per-test in isolation and ecpg tests.
Commit a4327296d taught pg_regress proper to do this, but
missed the opportunity to do likewise in the isolationtester
and ecpg variants of pg_regress.  Seems like this might be
helpful for tracking down issues exposed by those tests.
2019-08-27 19:49:09 -04:00
Michael Paquier 989d23b04b Detect unused steps in isolation specs and do some cleanup
This is useful for developers to find out whether an isolation spec is
over-engineered or needs more work: a warning at the end of a test run
flags any step that is not used, generating a failure with extra diffs.

While on it, clean up all the specs which include steps not used in any
permutations to simplify them.

Author: Michael Paquier
Reviewed-by: Asim Praveen, Melanie Plageman
Discussion: https://postgr.es/m/20190819080820.GG18166@paquier.xyz
2019-08-24 11:45:05 +09:00
Michael Paquier 9903338b5e Remove dry-run mode from isolationtester
The original purpose of the dry-run mode was to be able to print all the
possible permutations from a spec file, but it has become less useful
since isolation tests have improved regarding deadlock detection: a step
not wanted by the author could now block indefinitely (originally the
blocked step would have been detected rather quickly).  Per discussion,
let's remove it.

Author: Michael Paquier
Reviewed-by: Asim Praveen, Melanie Plageman
Discussion: https://postgr.es/m/20190819080820.GG18166@paquier.xyz
2019-08-24 11:35:43 +09:00
Michael Paquier c96581abe4 Fix inconsistencies and typos in the tree, take 11
This fixes various typos in docs and comments, and removes some orphaned
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
2019-08-19 16:21:39 +09:00
Tom Lane 9be4ce4fa3 Make deadlock-parallel isolation test more robust.
This test failed fairly reproducibly on some CLOBBER_CACHE_ALWAYS
buildfarm animals.  The cause seems to be that if a parallel worker
is slow enough to reach its lock wait, it may not be released by
the first deadlock check run, and then later deadlock checks might
decide to unblock the d2 session instead of the d1 session, leaving
us in an undetected deadlock state (since the isolationtester client
is waiting for d1 to complete first).

Fix by introducing an additional lock wait at the end of the d2a1
step, ensuring that the deadlock checker will recognize that d1
has to be unblocked before d2a1 completes.

Also reduce max_parallel_workers_per_gather to 3 in this test.  With the
default max_worker_processes value, we were only getting one parallel
worker for the d2a1 step, which is not the case I hoped to test.  We
should get 3 for d1a2 and 2 for d2a1, as the code stands; and maybe 3
for d2a1 if somebody figures out why the last parallel worker slot isn't
free already.

Discussion: https://postgr.es/m/22195.1566077308@sss.pgh.pa.us
2019-08-17 18:15:38 -04:00
Tom Lane bb5ae8f6c4 Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event.  This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events.  We can improve matters by
using a hash table to probe for duplicates.  So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.

A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)

We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions.  Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.
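
The user-visible effect, sketched:

BEGIN;
NOTIFY chan, 'payload';
NOTIFY chan, 'payload';  -- duplicate, dropped as before
SAVEPOINT s;
NOTIFY chan, 'payload';  -- previously survived being merged up from the
RELEASE SAVEPOINT s;     -- subtransaction; now de-duplicated as well
COMMIT;                  -- listeners receive a single notification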

While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three.  This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.

Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.

Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 12:22:12 -04:00
Michael Paquier 66bde49d96 Fix inconsistencies and typos in the tree, take 10
This addresses some issues with unnecessary code comments, fixes various
typos in docs and comments, and removes some orphaned structures and
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com
2019-08-13 13:53:41 +09:00
Heikki Linnakangas 1169fcf129 Fix predicate-locking of HOT updated rows.
In serializable mode, heap_hot_search_buffer() incorrectly acquired a
predicate lock on the root tuple, not the returned tuple that satisfied
the visibility checks. As explained in README-SSI, the predicate lock does
not need to be copied or extended to other tuple versions, but for that to
work, the correct, visible, tuple version must be locked in the first
place.

The original SSI commit had this bug in it, but it was fixed back in 2013,
in commit 81fbbfe335. But unfortunately, it was reintroduced a few months
later in commit b89e151054. Wising up from that, add a regression test
to cover this, so that it doesn't get reintroduced again. Also, move the
code that sets 't_self', so that it happens at the same time that the
other HeapTuple fields are set, to make it clearer that all the code in
the loop operates on the "current" tuple in the chain, not the root tuple.

Bug spotted by Andres Freund, analysis and original fix by Thomas Munro,
test case and some additional changes to the fix by Heikki Linnakangas.
Backpatch to all supported versions (9.4).

Discussion: https://www.postgresql.org/message-id/20190731210630.nqhszuktygwftjty%40alap3.anarazel.de
2019-08-07 12:40:49 +03:00
Michael Paquier 8548ddc61b Fix inconsistencies and typos in the tree, take 9
This addresses more issues with code comments, variable names and
unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com
2019-08-05 12:14:58 +09:00
Tom Lane da9456d22a Add an isolation test to exercise parallel-worker deadlock resolution.
Commit a1c1af2a1 added logic in the deadlock checker to handle lock
grouping, but it was very poorly tested, as evidenced by the bug
fixed in 3420851a2.  Add a test case that exercises that a bit better
(and catches the bug --- if you revert 3420851a2, this will hang).

Since it's pretty hard to get parallel workers to take exclusive
regular locks that their parents don't already have, this test operates
by creating a deadlock among advisory locks taken in parallel workers.
To make that happen, we must override the parallel-safety labeling of
the advisory-lock functions, which we do by putting them in mislabeled,
non-inlinable wrapper functions.
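
A hedged sketch of such a wrapper (plpgsql functions are never inlined;
the PARALLEL SAFE label is the deliberate mislabeling):

CREATE FUNCTION take_lock(k bigint) RETURNS void
  LANGUAGE plpgsql PARALLEL SAFE
AS $$ BEGIN PERFORM pg_advisory_xact_lock(k); END; $$;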

We also have to remove the redundant PreventAdvisoryLocksInParallelMode
checks in lockfuncs.c.  That seems fine though; if some user accidentally
does what this test is intentionally doing, not much harm will ensue.
(If there are any remaining bugs that are reachable that way, they're
probably reachable in other ways too.)

Discussion: https://postgr.es/m/3243.1564437314@sss.pgh.pa.us
2019-08-01 11:50:00 -04:00
Tom Lane b10f40bf0e Improve test coverage for LISTEN/NOTIFY.
We had no actual end-to-end test of NOTIFY message delivery.  In the
core async.sql regression test, testing this is problematic because psql
traditionally prints the PID of the sending backend, making the output
unstable.  We also have an isolation test script, but it likewise
failed to prove that delivery worked, because isolationtester.c had
no provisions for detecting/reporting NOTIFY messages.

Hence, add such provisions to isolationtester.c, and extend
async-notify.spec to include direct tests of basic NOTIFY functionality.

I also added tests showing that NOTIFY de-duplicates messages normally,
but not across subtransaction boundaries.  (That's the historical
behavior since we introduced subtransactions, though perhaps we ought
to change it.)

Patch by me, with suggestions/review by Andres Freund.

Discussion: https://postgr.es/m/31304.1564246011@sss.pgh.pa.us
2019-07-28 12:02:27 -04:00
Tom Lane 30717637c1 Fix isolationtester race condition for notices sent before blocking.
If a test sends a notice just before blocking, it's possible on
slow machines for isolationtester to detect the blocked state before
it's consumed the notice.  (For this to happen, the notice would have
to arrive after isolationtester has waited for data for 10ms, so on
fast/lightly-loaded machines it's hard to reproduce the failure.)
But, if we have seen the backend as blocked, it's certainly already
sent any notices it's going to send.  Therefore, one more round of
PQconsumeInput and PQisBusy should be enough to collect and process
any such notices.

This appears to explain the instability noted in commit ebd499282, so undo
the hack therein to not print notices from insert-conflict-specconflict.

Patch by me, diagnosis by Andres Freund.

Discussion: https://postgr.es/m/14616.1564251339@sss.pgh.pa.us
2019-07-27 20:21:54 -04:00
Tom Lane ebd4992821 Don't drop NOTICE messages in isolation tests.
For its entire existence, isolationtester.c has forced client_min_messages
to WARNING, but that seems like a very poor choice of test design.  It
should be up to individual test scripts to manage whether they emit notices
and to ensure that the results are stable.  (There were no NOTICE messages
in the original set of isolation tests, so this was certainly dead code
when committed, but perhaps it was needed at some earlier point.)

It's possible that the original motivation was due to platform-dependent
variations in the timing of stdout vs. stderr output.  That should be
moot since commits 73bcb76b7/6eda3e9c2, but just in case, adjust
isotesterNoticeProcessor to print to stdout not stderr.  (stderr seems
like the wrong thing anyway: it should be for error printouts not expected
test output.)

Testing shows that the notices in insert-conflict-specconflict are indeed
a bit timing-unstable on very slow machines, so hide them; maybe we can
improve that later.  Also, make the notices in plpgsql-toast a bit less
verbose than the original code would've had them.

Discussion: https://postgr.es/m/14616.1564251339@sss.pgh.pa.us
2019-07-27 15:59:57 -04:00
Alvaro Herrera 8b21b416ed Avoid spurious deadlocks when upgrading a tuple lock
This puts back reverted commit de87a084c0, with some bug fixes.

When two (or more) transactions are waiting for transaction T1 to release a
tuple-level lock, and transaction T1 upgrades its lock to a higher level, a
spurious deadlock can be reported among the waiting transactions when T1
finishes.  The simplest example case seems to be:

T1: select id from job where name = 'a' for key share;
Y: select id from job where name = 'a' for update; -- starts waiting for T1
Z: select id from job where name = 'a' for key share;
T1: update job set name = 'b' where id = 1;
Z: update job set name = 'c' where id = 1; -- starts waiting for T1
T1: rollback;

At this point, transaction Y is rolled back on account of a deadlock: Y
holds the heavyweight tuple lock and is waiting for the Xmax to be released,
while Z holds part of the multixact and tries to acquire the heavyweight
lock (per protocol) and goes to sleep; once T1 releases its part of the
multixact, Z is awakened only to be put back to sleep on the heavyweight
lock that Y is holding while sleeping.  Kaboom.

This can be avoided by having Z skip the heavyweight lock acquisition.  As
far as I can see, the biggest downside is that if there are multiple Z
transactions, the order in which they resume after T1 finishes is not
guaranteed.

Backpatch to 9.6.  The patch applies cleanly on 9.5, but the new tests don't
work there (because isolationtester is not smart enough), so I'm not going
to risk it.

Author: Oleksii Kliukin
Discussion: https://postgr.es/m/B9C9D7CD-EB94-4635-91B6-E558ACEC0EC3@hintbits.com
Discussion: https://postgr.es/m/2815.1560521451@sss.pgh.pa.us
2019-06-18 18:23:16 -04:00
Michael Paquier 3412030205 Fix more typos and inconsistencies in the tree
Author: Alexander Lakhin
Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
2019-06-17 16:13:16 +09:00
Alvaro Herrera 9d20b0ec8f Revert "Avoid spurious deadlocks when upgrading a tuple lock"
This reverts commits 3da73d6839 and de87a084c0.

This code has some tricky corner cases that I'm not sure are correct and
not properly tested anyway, so I'm reverting the whole thing for next
week's releases (reintroducing the deadlock bug that we set to fix).
I'll try again afterwards.

Discussion: https://postgr.es/m/E1hbXKQ-0003g1-0C@gemulon.postgresql.org
2019-06-16 22:24:21 -04:00
Alvaro Herrera de87a084c0 Avoid spurious deadlocks when upgrading a tuple lock
When two (or more) transactions are waiting for transaction T1 to release a
tuple-level lock, and transaction T1 upgrades its lock to a higher level, a
spurious deadlock can be reported among the waiting transactions when T1
finishes.  The simplest example case seems to be:

T1: select id from job where name = 'a' for key share;
Y: select id from job where name = 'a' for update; -- starts waiting for T1
Z: select id from job where name = 'a' for key share;
T1: update job set name = 'b' where id = 1;
Z: update job set name = 'c' where id = 1; -- starts waiting for T1
T1: rollback;

At this point, transaction Y is rolled back on account of a deadlock: Y
holds the heavyweight tuple lock and is waiting for the Xmax to be released,
while Z holds part of the multixact and tries to acquire the heavyweight
lock (per protocol) and goes to sleep; once T1 releases its part of the
multixact, Z is awakened only to be put back to sleep on the heavyweight
lock that Y is holding while sleeping.  Kaboom.

This can be avoided by having Z skip the heavyweight lock acquisition.  As
far as I can see, the biggest downside is that if there are multiple Z
transactions, the order in which they resume after T1 finishes is not
guaranteed.

Backpatch to 9.6.  The patch applies cleanly on 9.5, but the new tests don't
work there (because isolationtester is not smart enough), so I'm not going
to risk it.

Author: Oleksii Kliukin
Discussion: https://postgr.es/m/B9C9D7CD-EB94-4635-91B6-E558ACEC0EC3@hintbits.com
2019-06-13 17:28:24 -04:00
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Andres Freund 08e2edc076 Add isolation test for INSERT ON CONFLICT speculative insertion failure.
This path previously was not reliably covered. There was some
heuristic coverage via insert-conflict-toast.spec, but that test is
not deterministic, and only tested for a somewhat specific bug.

Backpatch, as this is a complicated and otherwise untested code
path. Unfortunately 9.5 cannot handle two waiting sessions, and thus
cannot execute this test.

Triggered by a conversation with Melanie Plageman.

Author: Andres Freund
Discussion: https://postgr.es/m/CAAKRu_a7hbyrk=wveHYhr4LbcRnRCG=yPUVoQYB9YO1CdUBE9Q@mail.gmail.com
Backpatch: 9.5-
2019-05-14 11:51:29 -07:00
Tom Lane fc9a62af3f Move logging.h and logging.c from src/fe_utils/ to src/common/.
The original placement of this module in src/fe_utils/ is ill-considered,
because several src/common/ modules have dependencies on it, meaning that
libpgcommon and libpgfeutils now have mutual dependencies.  That makes it
pointless to have distinct libraries at all.  The intended design is that
libpgcommon is lower-level than libpgfeutils, so only dependencies from
the latter to the former are acceptable.

We already have the precedent that fe_memutils and a couple of other
modules in src/common/ are frontend-only, so it's not stretching anything
out of whack to treat logging.c as a frontend-only module in src/common/.
To the extent that such modules help provide a common frontend/backend
environment for the rest of common/ to use, it's a reasonable design.
(logging.c does not yet provide an ereport() emulation, but one can
dream.)

Hence, move these files over, and revert basically all of the build-system
changes made by commit cc8d41511.  There are no places that need to grow
new dependencies on libpgcommon, further reinforcing the idea that this
is the right solution.

Discussion: https://postgr.es/m/a912ffff-f6e4-778a-c86a-cf5c47a12933@2ndquadrant.com
2019-05-14 14:20:10 -04:00
Tom Lane a2418f9e23 Test some more cases with partitioned tables in EvalPlanQual.
We weren't testing anything involving EPQ on UPDATEs that move tuples
into different partitions.  Depending on the implementation,
it might be that these cases aren't actually very interesting ...
but given our thin coverage of EPQ in general, I think it's a good
idea to have a test case.

Amit Langote, minor tweak by me

Discussion: https://postgr.es/m/7889df35-ad1a-691a-00e3-4d4b18f364e3@lab.ntt.co.jp
2019-04-09 11:43:03 -04:00
Tom Lane a8cb8f1246 Fix EvalPlanQualStart to handle partitioned result rels correctly.
The es_root_result_relations array needs to be shallow-copied in the
same way as the main es_result_relations array, else EPQ rechecks on
partitioned result relations fail, as seen in bug #15677 from
Norbert Benkocs.

Amit Langote, isolation test case added by me

Discussion: https://postgr.es/m/15677-0bf089579b4cd02d@postgresql.org
Discussion: https://postgr.es/m/19321.1554567786@sss.pgh.pa.us
2019-04-08 12:20:22 -04:00
Andres Freund 41f5e04aec Fix a number of issues around modifying a previously updated row.
This commit fixes three, unfortunately related, issues:

1) Since 5db6df0c01, the introduction of DML via tableam, it was
   possible to trigger "ERROR: unexpected table_lock_tuple status: 1"
   when updating a row that was previously updated in the same
   transaction - but only when that row had first been updated in a
   concurrent transaction (and READ COMMITTED was used). The reason
   was that this case simply wasn't expected. Fixing that led to:

2) Even before the above commit, there were error checks (introduced
   in 6868ed7491) preventing a row being updated by different
   commands within the same statement (say in a function called by an
   UPDATE) - but that check wasn't performed when the row was first
   updated in a concurrent transaction - instead the second update was
   silently skipped in that case. After this change we throw the same
   error as we would without the concurrent transaction.

3) The error messages (introduced in 6868ed7491) preventing such
   updates emitted the same error message for both DELETE and
   UPDATE ("tuple to be updated was already modified by an operation
   triggered by the current command"). While that could be changed
   separately, it made it hard to write tests that verify the correct
   behavior of the code.

This commit changes heap's implementation of table_lock_tuple() to
return TM_SelfModified instead of TM_Invisible (previously loosely
modeled after EvalPlanQualFetch), and teaches nodeModifyTable.c to
handle that in response to table_lock_tuple() and not just in response
to table_(delete|update).

Additionally it fixes the wrong error message (see 3 above). The
comment for table_lock_tuple() is also adjusted to state that
TM_Deleted won't return information in TM_FailureData - it won't
always be available.

This also adds tests to ensure that DELETE/UPDATE correctly error out
when affecting a row that concurrently was modified by another
transaction.
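
For reference, a minimal sketch of the base case that 6868ed7491 already
rejected - and which is now rejected consistently even when another
transaction updated the row first (schema hypothetical):

CREATE TABLE t (id int PRIMARY KEY, n int);
INSERT INTO t VALUES (1, 0);
CREATE FUNCTION touch() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    UPDATE t SET n = n + 1 WHERE id = OLD.id;  -- modifies the same row
    RETURN OLD;
END $$;
CREATE TRIGGER t_touch BEFORE DELETE ON t
    FOR EACH ROW EXECUTE FUNCTION touch();
DELETE FROM t WHERE id = 1;
-- ERROR: tuple to be deleted was already modified by an operation
-- triggered by the current command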

Author: Andres Freund
Reported-By: Tom Lane, when investigating a bug fix to another bug
    by Amit Langote
Discussion: https://postgr.es/m/19321.1554567786@sss.pgh.pa.us
2019-04-07 22:14:47 -07:00
Alvaro Herrera f56f8f8da6 Support foreign keys that reference partitioned tables
Previously, while primary keys could be made on partitioned tables, it
was not possible to define foreign keys that reference those primary
keys.  Now it is possible to do that.
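
For illustration, a definition that is now accepted (names hypothetical):

CREATE TABLE accounts (id int PRIMARY KEY) PARTITION BY RANGE (id);
CREATE TABLE accounts_1 PARTITION OF accounts FOR VALUES FROM (1) TO (1000);
-- Previously rejected, because the referenced table is partitioned:
CREATE TABLE payments (
    id int PRIMARY KEY,
    account_id int REFERENCES accounts (id)
);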

Author: Álvaro Herrera
Reviewed-by: Amit Langote, Jesper Pedersen
Discussion: https://postgr.es/m/20181102234158.735b3fevta63msbj@alvherre.pgsql
2019-04-03 14:40:21 -03:00
Peter Eisentraut cc8d415117 Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.

Features:

- Program name is automatically prefixed.

- Message string does not end with newline.  This removes a common
  source of inconsistencies and omissions.

- Additionally, a final newline is automatically stripped, simplifying
  use of PQerrorMessage() etc., another common source of mistakes.

- I converted error message strings to use %m where possible.

- As a result of the above several points, more translatable message
  strings can be shared between different components and between
  frontends and backend, without gratuitous punctuation or whitespace
  differences.

- There is support for setting a "log level".  This is not meant to be
  user-facing, but can be used internally to implement debug or
  verbose modes.

- Lazy argument evaluation, so no significant overhead if logging at
  some level is disabled.

- Some color in the messages, similar to gcc and clang.  Set
  PG_COLOR=auto to try it out.  Some colors are predefined, but can be
  customized by setting PG_COLORS.

- Common files (common/, fe_utils/, etc.) can handle logging much more
  simply by just using one API without worrying too much about the
  context of the calling program, requiring callbacks, or having to
  pass "progname" around everywhere.

- Some programs called setvbuf() to make sure that stderr is
  unbuffered, even on Windows.  But not all programs did that.  This
  is now done centrally.

Soft goals:

- Reduces vertical space use and visual complexity of error reporting
  in the source code.

- Encourages more deliberate classification of messages.  For example,
  in some cases it wasn't clear without analyzing the surrounding code
  whether a message was meant as an error or just an info.

- Concepts and terms are vaguely aligned with popular logging
  frameworks such as log4j and Python logging.

This is all just about printing stuff out.  Nothing affects program
flow (e.g., fatal exits).  The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.

I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded.  One significant change is that
pg_rewind used to write all error messages to stdout.  That is now
changed to stderr.

Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 20:01:35 +02:00
Peter Eisentraut 5dc92b844e REINDEX CONCURRENTLY
This adds the CONCURRENTLY option to the REINDEX command.  A REINDEX
CONCURRENTLY on a specific index creates a new index (like CREATE
INDEX CONCURRENTLY), then renames the old index away and the new index
in place and adjusts the dependencies, and then drops the old
index (like DROP INDEX CONCURRENTLY).  The REINDEX command also has
the capability to run its other variants (TABLE, DATABASE) with the
CONCURRENTLY option (but not SYSTEM).
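
For illustration, the new syntax (object names hypothetical):

REINDEX INDEX CONCURRENTLY my_index;
REINDEX TABLE CONCURRENTLY my_table;
REINDEX DATABASE CONCURRENTLY my_database;  -- but not REINDEX SYSTEM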

The reindexdb command gets the --concurrently option.

Author: Michael Paquier, Andreas Karlsson, Peter Eisentraut
Reviewed-by: Andres Freund, Fujii Masao, Jim Nasby, Sergei Kornilov
Discussion: https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee
2019-03-29 08:26:33 +01:00
Andres Freund 5db6df0c01 tableam: Add tuple_{insert, delete, update, lock} and use.
This adds new, required, table AM callbacks for insert/delete/update
and lock_tuple. To be able to reasonably use those, the EvalPlanQual
mechanism had to be adapted, moving more logic into the AM.

Previously both delete/update/lock call-sites and the EPQ mechanism had
to have awareness of the specific tuple format to be able to fetch the
latest version of a tuple. Obviously that needs to be abstracted
away. To do so, move the logic that finds the latest row version into
the AM. lock_tuple has a new flag argument,
TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last
version, rather than the current one.  It'd have been possible to do
so via a separate callback as well, but finding the last version
usually also necessitates locking the newest version, making it
sensible to combine the two. This replaces the previous use of
EvalPlanQualFetch().  Additionally HeapTupleUpdated, which previously
signaled either a concurrent update or delete, is now split into two,
to avoid callers needing AM specific knowledge to differentiate.

The move of finding the latest row version into tuple_lock means that
encountering a row concurrently moved into another partition will now
raise an error about "tuple to be locked" rather than "tuple to be
updated/deleted" - which is accurate, as that always happens when
locking rows. While possibly slightly less helpful for users, it seems
like an acceptable trade-off.

As part of this commit HTSU_Result has been renamed to TM_Result, and
its members expanded to differentiate between updating and
deleting. HeapUpdateFailureData has been renamed to TM_FailureData.

The interface to speculative insertion is changed so nodeModifyTable.c
does not have to set the speculative token itself anymore. Instead
there's a version of tuple_insert, tuple_insert_speculative, that
performs the speculative insertion (without requiring a flag to signal
that fact), and the speculative insertion is either made permanent
with table_complete_speculative(succeeded = true) or aborted with
succeeded = false.

Note that multi_insert is not yet routed through tableam, nor is
COPY. Changing multi_insert requires changes to copy.c that are large
enough to better be done separately.

Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED
VIEW are also only going to be adjusted in a later commit.

Author: Andres Freund and Haribabu Kommi
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-23 19:55:57 -07:00
Andres Freund cdcffe2263 Expand EPQ tests for UPDATEs and DELETEs
Previously there was basically no coverage for UPDATEs encountering
deleted rows, and no coverage for DELETE having to perform EPQ. That's
problematic for an upcoming commit in which EPQ is taught to integrate
with tableams.  Also, there was no test for UPDATE to encounter a row
UPDATEd into another partition.

Author: Andres Freund
2019-03-22 19:55:23 -07:00
Thomas Munro bb16aba50c Enable parallel query with SERIALIZABLE isolation.
Previously, the SERIALIZABLE isolation level prevented parallel query
from being used.  Allow the two features to be used together by
sharing the leader's SERIALIZABLEXACT with parallel workers.
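
The two features now simply combine (table name hypothetical); previously
the serializable transaction would have forced a non-parallel plan:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET LOCAL max_parallel_workers_per_gather = 4;
SELECT count(*) FROM big_table;   -- may now use a parallel plan
COMMIT;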

An extra per-SERIALIZABLEXACT LWLock is introduced to make it safe to
share, and new logic is introduced to coordinate the early release
of the SERIALIZABLEXACT required for the SXACT_FLAG_RO_SAFE
optimization, as follows:

The first backend to observe the SXACT_FLAG_RO_SAFE flag (set by
some other transaction) will 'partially release' the SERIALIZABLEXACT,
meaning that the conflicts and locks it holds are released, but the
SERIALIZABLEXACT itself will remain active because other backends
might still have a pointer to it.

Whenever any backend notices the SXACT_FLAG_RO_SAFE flag, it clears
its own MySerializableXact variable and frees local resources so that
it can skip SSI checks for the rest of the transaction.  In the
special case of the leader process, it transfers the SERIALIZABLEXACT
to a new variable SavedSerializableXact, so that it can be completely
released at the end of the transaction after all workers have exited.

Remove the serializable_okay flag added to CreateParallelContext() by
commit 9da0cc35, because it's now redundant.

Author: Thomas Munro
Reviewed-by: Haribabu Kommi, Robert Haas, Masahiko Sawada, Kevin Grittner
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
2019-03-15 17:47:04 +13:00
Tom Lane 4be058fe9e In the planner, replace an empty FROM clause with a dummy RTE.
The fact that "SELECT expression" has no base relations has long been a
thorn in the side of the planner.  It makes it hard to flatten a sub-query
that looks like that, or is a trivial VALUES() item, because the planner
generally uses relid sets to identify sub-relations, and such a sub-query
would have an empty relid set if we flattened it.  prepjointree.c contains
some baroque logic that works around this in certain special cases --- but
there is a much better answer.  We can replace an empty FROM clause with a
dummy RTE that acts like a table of one row and no columns, and then there
are no such corner cases to worry about.  Instead we need some logic to
get rid of useless dummy RTEs, but that's simpler and covers more cases
than what was there before.
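
As a concrete illustration (plan costs indicative only):

EXPLAIN SELECT 2 + 2;
--                 QUERY PLAN
-- ------------------------------------------
--  Result  (cost=0.00..0.01 rows=1 width=4)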

For really trivial cases, where the query is just "SELECT expression" and
nothing else, there's a hazard that adding the extra RTE makes for a
noticeable slowdown; even though it's not much processing, there's not
that much for the planner to do overall.  However testing says that the
penalty is very small, close to the noise level.  In more complex queries,
this is able to find optimizations that we could not find before.

The new RTE type is called RTE_RESULT, since the "scan" plan type it
gives rise to is a Result node (the same plan we produced for a "SELECT
expression" query before).  To avoid confusion, rename the old ResultPath
path type to GroupResultPath, reflecting that it's only used in degenerate
grouping cases where we know the query produces just one grouped row.
(It wouldn't work to unify the two cases, because there are different
rules about where the associated quals live during query_planner.)

Note: although this touches readfuncs.c, I don't think a catversion
bump is required, because the added case can't occur in stored rules,
only plans.

Patch by me, reviewed by David Rowley and Mark Dilger

Discussion: https://postgr.es/m/15944.1521127664@sss.pgh.pa.us
2019-01-28 17:54:23 -05:00
Peter Eisentraut be2e329f2e isolationtester: Use atexit()
Replace exit_nicely() calls with standard exit() and register the
cleanup actions using atexit().

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/ec4135ba-84e9-28bf-b584-0e78d47448d5@2ndquadrant.com/
2019-01-07 16:25:16 +01:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
Alexander Korotkov 0c6f4f9212 Reduce length of GIN predicate locking isolation test suite
The isolation test suite for GIN predicate locking was criticized for being too
slow, especially under Valgrind.  This commit is intended to accelerate it.
The tests are simplified in the following ways.

  1) Amount of data is reduced.  We're now close to the minimal amount of data,
     which produces at least one posting tree and at least two pages of entry
     tree.
  2) Three isolation tests are merged into one.
  3) Only one tuple is queried from posting tree.  So, locking of index is the
     same, but tuple locks are not propagated to relation lock.  Also, it is
     faster.
  4) The test cases themselves are simplified.  Now each test case runs just
     one INSERT and one SELECT involving GIN, which either conflict or not.

Discussion: https://postgr.es/m/20181204000740.ok2q53nvkftwu43a%40alap3.anarazel.de
Reported-by: Andres Freund
Tested-by: Andrew Dunstan
Author: Alexander Korotkov
Backpatch-through: 11
2018-12-28 03:33:10 +03:00
Michael Paquier 1e504f01da Ignore inherited temp relations from other sessions when truncating
Inheritance trees can include temporary tables if the parent is
permanent, which makes possible the presence of multiple temporary
children from different sessions.  Trying to issue a TRUNCATE on the
parent in this scenario causes a failure, so, as with any other query,
just ignore such children, which makes TRUNCATE work transparently.
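
A sketch of the scenario now handled (table names hypothetical):

-- Session 1:
CREATE TABLE parent (a int);
CREATE TEMP TABLE child_s1 () INHERITS (parent);
-- Session 2:
CREATE TEMP TABLE child_s2 () INHERITS (parent);
TRUNCATE parent;  -- ignores session 1's temp child instead of failing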

This makes truncation behave like any other DML query working on the
parent table that also needs to work on the children.  A
set of isolation tests is added to cover basic cases.

Reported-by: Zhou Digoal
Author: Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/15565-ce67a48d0244436a@postgresql.org
Backpatch-through: 9.4
2018-12-27 10:16:19 +09:00
Noah Misch 1db439ad49 Raise some timeouts to 180s, in test code.
Slow runs of buildfarm members chipmunk, hornet and mandrill saw the
shorter timeouts expire.  The 180s timeout in poll_query_until has been
trouble-free since 2a0f89cd71 introduced
it two years ago, so use 180s more widely.  Back-patch to 9.6, where the
first of these timeouts was introduced.

Reviewed by Michael Paquier.

Discussion: https://postgr.es/m/20181209001601.GC2973271@rfd.leadboat.com
2018-12-10 20:15:42 -08:00
Michael Paquier ee2b37ae04 Add some missing schema qualifications
This does not improve the security and reliability of the touched areas,
but it makes the style more consistent.

Author: Michael Paquier
Reviewed-by: Noah Misch
Discussion: https://postgr.es/m/20180309075538.GD9376@paquier.xyz
2018-12-03 14:21:52 +09:00
Alvaro Herrera a28e10e82e Indicate session name in isolationtester notices
When a session under isolationtester produces printable notices (NOTICE,
WARNING) we were just printing them unadorned, which can be confusing
when debugging.  Prefix them with the session name, which makes things
clearer.

Author: Álvaro Herrera
Reviewed-by: Hari Babu Kommi
Discussion: https://postgr.es/m/20181024213451.75nh3f3dctmcdbfq@alvherre.pgsql
2018-11-09 13:08:00 -03:00
Tom Lane 9958b2b2a8 Fix minor bug in isolationtester.
If the lock wait query failed, isolationtester would report the
PQerrorMessage from some other connection, meaning there would be
no message or an unrelated one.  This seems like a pretty unlikely
occurrence, but if it did happen, this bug could make it really
difficult/confusing to figure out what happened.  That seems to
justify patching all the way back.

In passing, clean up another place where the "wrong" conn was used
for an error report.  That one's not actually buggy because it's
a different alias for the same connection, but it's still confusing
to the reader.
2018-10-17 15:06:57 -04:00
Michael Paquier 803b1301e8 Add option SKIP_LOCKED to VACUUM and ANALYZE
When specified, this option allows VACUUM to skip the work on a relation
if there is a conflicting lock on it when trying to open it at the
beginning of its processing.

Similarly to autovacuum, this comes with a couple of limitations while
the relation is processed, which can cause the process to still block:
- when opening the relation indexes.
- when acquiring row samples for table inheritance trees, partition trees
  or certain types of foreign tables, if a lock is taken on some leaves
  of such trees.
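
In SQL, the option is spelled as follows (table name hypothetical):

VACUUM (SKIP_LOCKED) my_table;
ANALYZE (SKIP_LOCKED, VERBOSE) my_table;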

Author: Nathan Bossart
Reviewed-by: Michael Paquier, Andres Freund, Masahiko Sawada
Discussion: https://postgr.es/m/9EF7EBE4-720D-4CF1-9D0E-4403D7E92990@amazon.com
Discussion: https://postgr.es/m/20171201160907.27110.74730@wrigleys.postgresql.org
2018-10-04 09:00:33 +09:00
Tom Lane 1f4a920b73 Fix failure with initplans used conditionally during EvalPlanQual rechecks.
The EvalPlanQual machinery assumes that any initplans (that is,
uncorrelated sub-selects) used during an EPQ recheck would have already
been evaluated during the main query; this is implicit in the fact that
execPlan pointers are not copied into the EPQ estate's es_param_exec_vals.
But it's possible for that assumption to fail, if the initplan is only
reached conditionally.  For example, a sub-select inside a CASE expression
could be reached during a recheck when it had not been previously, if the
CASE test depends on a column that was just updated.
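
For instance, a query shaped like this (names hypothetical) could hit the
failure during a READ COMMITTED recheck:

UPDATE accounts
   SET balance = CASE WHEN flagged
                      THEN (SELECT max(amount) FROM limits)  -- initplan
                      ELSE balance END
 WHERE id = 1;
-- If a concurrent update flips "flagged", the recheck may reach the
-- uncorrelated sub-select even though the main query never did.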

This bug is old, appearing to date back to my rewrite of EvalPlanQual in
commit 9f2ee8f28, but was not detected until Kyle Samson reported a case.

To fix, force all not-yet-evaluated initplans used within the EPQ plan
subtree to be evaluated at the start of the recheck, before entering the
EPQ environment.  This could be inefficient, if such an initplan is
expensive and goes unused again during the recheck --- but that's piling
one layer of improbability atop another.  It doesn't seem worth adding
more complexity to prevent that, at least not in the back branches.

It was convenient to use the new-in-v11 ExecEvalParamExecParams function
to implement this, but I didn't like either its name or the specifics of
its API, so revise that.

Back-patch all the way.  Rather than rewrite the patch to avoid depending
on bms_next_member() in the oldest branches, I chose to back-patch that
function into 9.4 and 9.3.  (This isn't the first time back-patches have
needed that, and it exhausted my patience.)  I also chose to back-patch
some test cases added by commits 71404af2a and 342a1ffa2 into 9.4 and 9.3,
so that the 9.x versions of eval-plan-qual.spec are all the same.

Andrew Gierth diagnosed the problem and contributed the added test cases,
though the actual code changes are by me.

Discussion: https://postgr.es/m/A033A40A-B234-4324-BE37-272279F7B627@tripadvisor.com
2018-09-15 13:42:33 -04:00
Michael Paquier a556549d7e Improve VACUUM and ANALYZE by avoiding early lock queue
A caller of VACUUM could previously queue early for a lock on a relation,
causing other sessions to block behind that request.  This opens the door
to denial-of-service attacks, as even a non-privileged user can attempt a
VACUUM FULL of a critical catalog table to block all incoming connection
attempts.

Contrary to TRUNCATE, a client could attempt a system-wide VACUUM after
building the list of relations to VACUUM, which can cause vacuum_rel()
or analyze_rel() to try to lock the relation but the operation would
just block.  When the client specifies a list of relations and the
relation needs to be skipped, ownership checks are done when building
the list of relations to work on, preventing a later lock attempt.

vacuum_rel() already had the sanity checks needed, except that those
were applied too late.  This commit refactors the code so that relation
skips are checked beforehand, making it safe to avoid early locks, for
manual VACUUM both with and without a list of relations specified.

An isolation test is added demonstrating that early locks no longer
happen, with a WARNING message now issued earlier if the user calling
VACUUM is not the relation owner.
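
For example (role name hypothetical):

SET ROLE alice;     -- alice owns neither pg_class nor the database
VACUUM pg_class;    -- WARNING and skip now come before any lock request,
                    -- so nobody queues up behind an unprivileged caller
RESET ROLE;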

When a partitioned table is listed in a manual VACUUM or ANALYZE
command, its full list of partitions is fetched, all partitions get
added to the list to work on, and then each one of them is processed one
by one, with ownership checks happening at the later phase of
vacuum_rel() or analyze_rel().  Trying to do early ownership checks for
each partition is proving to be tedious as this would result in deadlock
risks with lock upgrades, and skipping all partitions if the listed
partitioned table is not owned would result in a behavior change
compared to how Postgres 10 has implemented vacuum for partitioned
tables.  The original problem reported related to early lock queue for
critical relations is fixed anyway, so priority is given to avoiding a
backward-incompatible behavior.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20180812222142.GA6097@paquier.xyz
2018-08-27 09:11:12 +09:00
Tom Lane cc4f6b7786 Clean up assorted misuses of snprintf()'s result value.
Fix a small number of places that were testing the result of snprintf()
but doing so incorrectly.  The right test for buffer overrun, per C99,
is "result >= bufsize" not "result > bufsize".  Some places were also
checking for failure with "result == -1", but the standard only says
that a negative value is delivered on failure.

(Note that this only makes these places correct if snprintf() delivers
C99-compliant results.  But at least now these places are consistent
with all the other places where we assume that.)

Also, make psql_start_test() and isolation_start_test() check for
buffer overrun while constructing their shell commands.  There seems
like a higher risk of overrun, with more severe consequences, here
than there is for the individual file paths that are made elsewhere
in the same functions, so this seemed like a worthwhile change.

Also fix guc.c's do_serialize() to initialize errno = 0 before
calling vsnprintf.  In principle, this should be unnecessary because
vsnprintf should have set errno if it returns a failure indication ...
but the other two places this coding pattern is cribbed from don't
assume that, so let's be consistent.

These errors are all very old, so back-patch as appropriate.  I think
that only the shell command overrun cases are even theoretically
reachable in practice, but there's not much point in erroneous error
checks.

Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us
2018-08-15 16:29:31 -04:00
Michael Paquier f841ceb26d Improve TRUNCATE by avoiding early lock queue
A caller of TRUNCATE could previously queue for an access exclusive lock
on a relation it may not have permission to truncate, potentially
interfering with users authorized to work on it.  This can be very
intrusive depending on the lock attempted to be taken.  For example,
pg_authid could be blocked, preventing any authentication attempt to
happen on a PostgreSQL instance.

This commit fixes the case of TRUNCATE so that RangeVarGetRelidExtended
is used with a callback doing the necessary ACL checks at an earlier
stage, avoiding lock queuing issues: an immediate failure now happens
for unprivileged users instead of waiting on a lock that would never be
taken.

This is rather similar to the type of work done in cbe24a6 for CLUSTER,
and the code of TRUNCATE is this time refactored so that there are no
user-facing changes.  As with the commit for CLUSTER, no back-patch is done.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20180806165816.GA19883@paquier.xyz
2018-08-10 18:26:59 +02:00
Amit Kapila 40ca70ebcc Allow using the updated tuple while moving it to a different partition.
An update that causes the tuple to be moved to a different partition was
missing out on re-constructing the to-be-updated tuple based on the latest
tuple in the update chain.  Instead, it simply deleted the latest tuple
and inserted a new tuple in the new partition based on the old tuple.
Commit 2f17844104 didn't consider this case, so some of the updates were
getting lost.
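
A sketch of the kind of interleaving that was affected (schema hypothetical):

CREATE TABLE t (a int, b int) PARTITION BY LIST (a);
CREATE TABLE t1 PARTITION OF t FOR VALUES IN (1);
CREATE TABLE t2 PARTITION OF t FOR VALUES IN (2);
INSERT INTO t VALUES (1, 0);
-- Session 1:
BEGIN; UPDATE t SET b = b + 1 WHERE a = 1;
-- Session 2 (READ COMMITTED) blocks behind session 1:
UPDATE t SET a = 2 WHERE a = 1;
-- Session 1: COMMIT;
-- Session 2's row movement must be based on session 1's updated tuple
-- (b = 1); previously the update chain was ignored and b = 0 survived.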

In passing, change the argument order for output parameter in ExecDelete
and add some commentary about it.

Reported-by: Pavan Deolasee
Author: Amit Khandekar, with minor changes by me
Reviewed-by: Dilip Kumar, Amit Kapila and Alvaro Herrera
Backpatch-through: 11
Discussion: https://postgr.es/m/CAJ3gD9fRbEzDqdeDq1jxqZUb47kJn+tQ7=Bcgjc8quqKsDViKQ@mail.gmail.com
2018-07-12 12:51:39 +05:30
Peter Eisentraut 6b30d1386f Fix whitespace 2018-05-17 23:04:41 -04:00
Tom Lane 2efc924180 Detoast plpgsql variables if they might live across a transaction boundary.
Up to now, it's been safe for plpgsql to store TOAST pointers in its
variables because the ActiveSnapshot for whatever query called the plpgsql
function will surely protect such TOAST values from being vacuumed away,
even if the owning table rows are committed dead.  With the introduction of
procedures, that assumption is no longer good in "non atomic" executions
of plpgsql code.  We adopt the slightly brute-force solution of detoasting
all TOAST pointers at the time they are stored into variables, if we're in
a non-atomic context, just in case the owning row goes away.
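
A sketch of the hazard (procedure and table hypothetical):

CREATE PROCEDURE p() LANGUAGE plpgsql AS $$
DECLARE
    v text;
BEGIN
    SELECT bigcol INTO v FROM t WHERE id = 1;  -- may be a TOAST pointer
    DELETE FROM t WHERE id = 1;
    COMMIT;  -- non-atomic context: old snapshots are gone, VACUUM may
             -- reclaim the TOAST data at any time
    RAISE NOTICE '%', length(v);  -- safe only if v was detoasted above
END;
$$;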

Some care is needed to avoid long-term memory leaks, since plpgsql tends
to run with CurrentMemoryContext pointing to its call-lifespan context,
but we shouldn't assume that no memory is leaked by heap_tuple_fetch_attr.
In plpgsql proper, we can do the detoasting work in the "eval_mcontext".

Most of the code thrashing here is due to the need to add this capability
to expandedrecord.c as well as plpgsql proper.  In expandedrecord.c,
we can't assume that the caller's context is short-lived, so make use of
the short-term sub-context that was already invented for checking domain
constraints.  In view of this repurposing, it seems good to rename that
variable and associated code from "domain_check_cxt" to "short_term_cxt".

Peter Eisentraut and Tom Lane

Discussion: https://postgr.es/m/5AC06865.9050005@anastigmatix.net
2018-05-16 14:56:52 -04:00
Teodor Sigaev 0bef1c0678 Re-think predicate locking on GIN indexes.
The principle behind the locking was not very well thought-out, and not
documented. Add a section in the README to explain how it's supposed to
work, and change the code so that it actually works that way.

This fixes two bugs:

1. If fast update was turned on concurrently, subsequent inserts to the
   pending list would not conflict with predicate locks that were acquired
   earlier, on entry pages. The included 'predicate-gin-fastupdate' test
   demonstrates that. To fix, make all scans acquire a predicate lock on
   the metapage. That lock represents a scan of the pending list, whether
   or not there is a pending list at the moment. Forget about the
   optimization to skip locking/checking for locks, when fastupdate=off.
2. If a scan finds no match, it still needs to lock the entry page. The
   point of predicate locks is to lock the gaps between values, whether
   or not there is a match. The included 'predicate-gin-nomatch' test
   tests that case.

In addition to those two bug fixes, this removes some unnecessary locking,
following the principle laid out in the README. Because all items in
a posting tree have the same key value, a lock on the posting tree root is
enough to cover all the items. (With a very large posting tree, it would
possibly be better to lock the posting tree leaf pages instead, so that in
a "skip scan" with a query like "A & B" you could avoid an unnecessary
conflict if a new tuple is inserted with A but !B. But let's keep this simple.)

Also, some spelling fixes.

Author: Heikki Linnakangas with some editorization by me
Review: Andrey Borodin, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/0b3ad2c2-2692-62a9-3a04-5724f2af9114@iki.fi
2018-05-04 11:27:50 +03:00
Tom Lane b39fd897e0 Improve regression test coverage of expand_tuple().
I was dissatisfied with the code coverage report for expand_tuple() in the
wake of commit 7c44c46de: while better than no coverage at all, it was
still not exercising the core function of inserting out-of-line default
values, nor was the HeapTuple-output path covered.  So far as I can find,
the only code path that reaches the latter at present is EvalPlanQual
fetches for non-locked tables.  Hence, extend eval-plan-qual.spec to
test cases where out-of-line defaults must be inserted into a tuple
fetched from a non-locked table.

Discussion: https://postgr.es/m/87woxi24uw.fsf@ansel.ydns.eu
2018-04-14 19:02:30 -04:00
Simon Riggs 08ea7a2291 Revert MERGE patch
This reverts commits d204ef6377,
83454e3c2b and a few more commits thereafter
(complete list at the end) related to MERGE feature.

While the feature was fully functional, with sufficient test coverage and
necessary documentation, it was felt that some parts of the executor and
parse-analyzer can use a different design and it wasn't possible to do that in
the available time. So it was decided to revert the patch for PG11 and retry
again in the future.

Thanks again to all reviewers and bug reporters.

List of commits reverted, in reverse chronological order:

 f1464c5380 Improve parse representation for MERGE
 ddb4158579 MERGE syntax diagram correction
 530e69e59b Allow cpluspluscheck to pass by renaming variable
 01b88b4df5 MERGE minor errata
 3af7b2b0d4 MERGE fix variable warning in non-assert builds
 a5d86181ec MERGE INSERT allows only one VALUES clause
 4b2d44031f MERGE post-commit review
 4923550c20 Tab completion for MERGE
 aa3faa3c7a WITH support in MERGE
 83454e3c2b New files for MERGE
 d204ef6377 MERGE SQL Command following SQL:2016

Author: Pavan Deolasee
Reviewed-by: Michael Paquier
2018-04-12 11:22:56 +01:00
Magnus Hagander a228cc13ae Revert "Allow on-line enabling and disabling of data checksums"
This reverts the backend sides of commit 1fde38beaa.
I have, at least for now, left the pg_verify_checksums tool in place, as
this tool can be very valuable without the rest of the patch as well,
and since it's a read-only tool that only runs when the cluster is down
it should be a lot safer.
2018-04-09 19:03:42 +02:00
Andres Freund f16241bef7 Raise error when affecting tuple moved into different partition.
When an update moves a row between partitions (supported since
2f17844104), our normal logic for following update chains in READ
COMMITTED mode doesn't work anymore. Cross partition updates are
modeled as an delete from the old and insert into the new
partition. No ctid chain exists across partitions, and there's no
convenient space to introduce that link.

Not throwing an error in a partitioned context when one would have
been thrown without partitioning is obviously problematic. This commit
introduces infrastructure to detect when a tuple has been moved, not
just plainly deleted. That allows to throw an error when encountering
a deletion that's actually a move, while attempting to following a
ctid chain.

The row deleted as part of a cross partition update is marked by
pointing its t_ctid to an invalid block, instead of to itself as a normal
update would.  That was deemed to be the least invasive and most
future proof way to represent the knowledge, given how few infomask
bits are there to be recycled (there's also some locking issues with
using infomask bits).

External code following ctid chains should be updated to check for
moved tuples. The most likely consequence of not doing so is a missed
error.

Author: Amul Sul, editorialized by me
Reviewed-By: Amit Kapila, Pavan Deolasee, Andres Freund, Robert Haas
Discussion: http://postgr.es/m/CAAJ_b95PkwojoYfz0bzXU8OokcTVGzN6vYGCNVUukeUDrnF3dw@mail.gmail.com
2018-04-07 13:24:27 -07:00
Teodor Sigaev 8224de4f42 Indexes with INCLUDE columns and their support in B-tree
This patch introduces INCLUDE clause to index definition.  This clause
specifies a list of columns which will be included as a non-key part in
the index.  The INCLUDE columns exist solely to allow more queries to
benefit from index-only scans.  Also, such columns don't need to have
appropriate operator classes.  Expressions are not supported as INCLUDE
columns since they cannot be used in index-only scans.
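
The new clause, in SQL (names hypothetical):

CREATE TABLE orders (id int, customer int, total numeric);
CREATE UNIQUE INDEX orders_id_idx ON orders (id) INCLUDE (customer, total);
-- Can now be answered by an index-only scan, with customer and total
-- fetched from the non-key part of the index:
SELECT customer, total FROM orders WHERE id = 42;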

Index access methods supporting INCLUDE are indicated by amcaninclude flag
in IndexAmRoutine.  For now, only B-tree indexes support INCLUDE clause.

In B-tree indexes INCLUDE columns are truncated from pivot index tuples
(tuples located in non-leaf pages and high keys).  Therefore, B-tree indexes
now might have variable number of attributes.  This patch also provides
generic facility to support that: pivot tuples contain number of their
attributes in t_tid.ip_posid.  Free 13th bit of t_info is used for indicating
that.  This facility will simplify further support of index suffix truncation.
The changes of above are backward-compatible, pg_upgrade doesn't need special
handling of B-tree indexes for that.

Bump catalog version

Author: Anastasia Lubennikova with contribution by Alexander Korotkov and me
Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes,
			 David Rowley, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru
2018-04-07 23:00:39 +03:00
Andres Freund bf75fe47e4 Deactivate flapping checksum isolation tests.
They've been broken for days, and prevent other tests from being
run. The plan is to revert their addition later.

Discussion: https://postgr.es/m/20180407162252.wfo5aorjrjw2n5ws@alap3.anarazel.de
2018-04-07 09:23:12 -07:00
Teodor Sigaev b508a56f2f Predicate locking in hash indexes.
Hash index searches acquire predicate locks on the primary
page of a bucket. It acquires a lock on both the old and new buckets
for scans that happen concurrently with page splits. During a bucket
split, a predicate lock is copied from the primary page of an old
bucket to the primary page of a new bucket.

Author: Shubham Barai, Amit Kapila
Reviewed by: Amit Kapila, Alexander Korotkov, Thomas Munro
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPvNsM2GTiXdRgaaZ1Pjd1bs+sxfFsf7Ytr+iq+5JJoYXA@mail.gmail.com
2018-04-07 16:59:14 +03:00
Magnus Hagander 1fde38beaa Allow on-line enabling and disabling of data checksums
This makes it possible to turn checksums on in a live cluster, without
the previous need for dump/reload or logical replication (and to turn it
off).

Enabling checksums starts a background process in the form of a
launcher/worker combination that goes through the entire database and
recalculates checksums on each and every page. Only when all pages have
been checksummed are they fully enabled in the cluster. Any failure of
the process will revert to checksums off and the process has to be
restarted.

This adds a new WAL record that indicates the state of checksums, so
the process works across replicated clusters.

Authors: Magnus Hagander and Daniel Gustafsson
Review: Tomas Vondra, Michael Banck, Heikki Linnakangas, Andrey Borodin
2018-04-05 22:04:48 +02:00
Simon Riggs 83454e3c2b New files for MERGE 2018-04-03 10:22:21 +01:00
Simon Riggs d204ef6377 MERGE SQL Command following SQL:2016
MERGE performs actions that modify rows in the target table
using a source table or query. MERGE provides a single SQL
statement that can conditionally INSERT/UPDATE/DELETE rows,
a task that would otherwise require multiple PL statements.
e.g.

MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
  UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
  DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
  INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
  DO NOTHING;

MERGE works with regular and partitioned tables, including
column and row security enforcement, as well as support for
row, statement and transition triggers.

MERGE is optimized for OLTP and is parameterizable, though
also useful for large scale ETL/ELT. MERGE is not intended
to be used in preference to existing single SQL commands
for INSERT, UPDATE or DELETE since there is some overhead.
MERGE can be used statically from PL/pgSQL.

MERGE does not yet support inheritance, write rules,
RETURNING clauses, updatable views or foreign tables.
MERGE follows SQL Standard per the most recent SQL:2016.

Includes full tests and documentation, including full
isolation tests to demonstrate the concurrent behavior.

This version written from scratch in 2017 by Simon Riggs,
using docs and tests originally written in 2009. Later work
from Pavan Deolasee has been both complex and deep, leaving
the lead author credit now in his hands.
Extensive discussion of concurrency from Peter Geoghegan,
with thanks for the time and effort contributed.

Various issues reported via sqlsmith by Andreas Seltenreich

Authors: Pavan Deolasee, Simon Riggs
Reviewer: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs

Discussion:
https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
2018-04-03 09:28:16 +01:00
Simon Riggs aa5877bb26 Revert "MERGE SQL Command following SQL:2016"
This reverts commit e6597dc353.
2018-04-02 21:36:38 +01:00
Simon Riggs 7cf8a5c302 Revert "Modified files for MERGE"
This reverts commit 354f13855e.
2018-04-02 21:34:15 +01:00
Simon Riggs 354f13855e Modified files for MERGE 2018-04-02 21:12:47 +01:00
Simon Riggs e6597dc353 MERGE SQL Command following SQL:2016
MERGE performs actions that modify rows in the target table
using a source table or query. MERGE provides a single SQL
statement that can conditionally INSERT/UPDATE/DELETE rows,
a task that would otherwise require multiple PL statements.
e.g.

MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
  UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
  DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
  INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
  DO NOTHING;

MERGE works with regular and partitioned tables, including
column and row security enforcement, as well as support for
row, statement and transition triggers.

MERGE is optimized for OLTP and is parameterizable, though
also useful for large scale ETL/ELT. MERGE is not intended
to be used in preference to existing single SQL commands
for INSERT, UPDATE or DELETE since there is some overhead.
MERGE can be used statically from PL/pgSQL.

MERGE does not yet support inheritance, write rules,
RETURNING clauses, updatable views or foreign tables.
MERGE follows SQL Standard per the most recent SQL:2016.

Includes full tests and documentation, including full
isolation tests to demonstrate the concurrent behavior.

This version written from scratch in 2017 by Simon Riggs,
using docs and tests originally written in 2009. Later work
from Pavan Deolasee has been both complex and deep, leaving
the lead author credit now in his hands.
Extensive discussion of concurrency from Peter Geoghegan,
with thanks for the time and effort contributed.

Various issues reported via sqlsmith by Andreas Seltenreich

Authors: Pavan Deolasee, Simon Riggs
Reviewers: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs

Discussion:
https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
2018-04-02 21:04:35 +01:00
Teodor Sigaev 43d1ed60fd Predicate locking in GIN index
Predicate locks are used on a per-page basis only if fastupdate = off;
otherwise, a predicate lock on the pending list would effectively lock the
whole index, so to reduce locking overhead we just lock the relation
instead. Entry and posting trees are essentially B-trees, so locks are
acquired on leaf pages only.
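
The fastupdate storage parameter thus determines predicate-lock
granularity (names hypothetical):

CREATE TABLE docs (body tsvector);
CREATE INDEX docs_gin ON docs USING gin (body) WITH (fastupdate = off);
-- per-page predicate locks on entry/posting-tree leaf pages
ALTER INDEX docs_gin SET (fastupdate = on);
-- serializable scans now take a relation-level predicate lock instead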

Author: Shubham Barai with some editorization by me and Dmitry Ivanov
Review by: Alexander Korotkov, Dmitry Ivanov, Fedor Sigaev
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPt5sWW+EwTaKUGFL5_XFcZ0MuGBcyJ70oqbWqr42YKR8Q@mail.gmail.com
2018-03-30 14:23:17 +03:00
Teodor Sigaev 3ad55863e9 Add predicate locking for GiST
Add page-level predicate locking.  Due to GiST's code organization, the patch
is close to trivial: add a check before changing a page, and acquire a
predicate lock before scanning a page.  Choosing the right place for the check
is not simple, though: it must not be called during index build, it must
support insertion of new downlinks, and so on.

Author: Shubham Barai with editorization by me and Alexander Korotkov
Reviewed by: Alexander Korotkov, Andrey Borodin, me
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPtdcANpw5ePU3LvnTP8HCENFw6wygupQAyNBgD-sG3h0g@mail.gmail.com
2018-03-27 15:43:19 +03:00
Tom Lane 4b538727e2 Fix make rules that generate multiple output files.
For years, our makefiles have correctly observed that "there is no correct
way to write a rule that generates two files".  However, what we did is to
provide empty rules that "generate" the secondary output files from the
primary one, and that's not right either.  Depending on the details of
the creating process, the primary file might end up timestamped later than
one or more secondary files, causing subsequent make runs to consider the
secondary file(s) out of date.  That's harmless in a plain build, since
make will just re-execute the empty rule and nothing happens.  But it's
fatal in a VPATH build, since make will expect the secondary file to be
rebuilt in the build directory.  This would manifest as "file not found"
failures during VPATH builds from tarballs, if we were ever unlucky enough
to ship a tarball with apparently out-of-date secondary files.  (It's not
clear whether that has ever actually happened, but it definitely could.)

To ensure that secondary output files have timestamps >= their primary's,
change our makefile convention to be that we provide a "touch $@" action
not an empty rule.  Also, make sure that this rule actually gets invoked
during a distprep run, else the hazard remains.

It's been like this a long time, so back-patch to all supported branches.

In HEAD, I skipped the changes in src/backend/catalog/Makefile, because
those rules are due to get replaced soon in the bootstrap data format
patch, and there seems no need to create a merge issue for that patch.
If for some reason we fail to land that patch in v11, we'll need to
back-fill the changes in that one makefile from v10.

Discussion: https://postgr.es/m/18556.1521668179@sss.pgh.pa.us
2018-03-23 13:46:00 -04:00