If a tuple is locked, and this lock is later upgraded either to an
update or to a stronger lock, and in the meantime some other process
tries to lock, update or delete the same tuple, it (the tuple) could end
up being updated twice, or having conflicting locks held.
The reason for this is that the second updater checks for a change in
Xmax value, or in the HEAP_XMAX_IS_MULTI infomask bit, after noticing
the first lock; and if there's a change, it restarts and re-evaluates
its ability to update the tuple. But it neglected to check for changes
in lock strength or in lock-vs-update status when those two properties
stayed the same. This would lead it to take the wrong decision and
continue with its own update, when in reality it shouldn't do so but
instead restart from the top.
This could lead to either an assertion failure much later (when a
multixact containing multiple updates is detected), or duplicate copies
of tuples.
To fix, make sure to compare the other relevant infomask bits alongside
the Xmax value and HEAP_XMAX_IS_MULTI bit, and restart from the top if
necessary.
Also, in the belt-and-suspenders spirit, add a check to
MultiXactCreateFromMembers that a multixact being created does not have
two or more members that are claimed to be updates. This should protect
against other bugs that might cause similar bogus situations.
Backpatch to 9.3, where the possibility of multixacts containing updates
was introduced. (In prior versions it was possible to have the tuple
lock upgraded from shared to exclusive, and an update would not restart
from the top; yet we're protected against a bug there because there's
always a sleep to wait for the locking transaction to complete before
continuing to do anything. Really, the fact that tuple locks always
conflicted with concurrent updates is what protected against bugs here.)
Per report from Andrew Dunstan and Josh Berkus in thread at
http://www.postgresql.org/message-id/534C8B33.9050807@pgexperts.com
Bug analysis by Andres Freund.
Once we've completed a PREPARE, our session is not running a transaction,
so its entry in pg_stat_activity should show xact_start as null, rather
than leaving the value as the start time of the now-prepared transaction.
I think possibly this oversight was triggered by faulty extrapolation
from the adjacent comment that says PrepareTransaction should not call
AtEOXact_PgStat, so tweak the wording of that comment.
Noted by Andres Freund while considering bug #10123 from Maxim Boguk,
although this error doesn't seem to explain that report.
Back-patch to all active branches.
pg_sequence_parameters() and pg_identify_object() have had incorrect
proallargtypes entries since 9.1 and 9.3 respectively. This was mostly
masked by the correct information in proargtypes, but a few operations
such as pg_get_function_arguments() (and thus psql's \df display) would
show the wrong data types for these functions' input parameters.
In HEAD, fix the wrong info, bump catversion, and add an opr_sanity
regression test to catch future mistakes of this sort.
In the back branches, just fix the wrong info so that installations
initdb'd with future minor releases will have the right data. We
can't force an initdb, and it doesn't seem like a good idea to add
a regression test that will fail on existing installations.
Andres Freund
Before 9.4, such an aggregate couldn't be declared, because its final
function would have to have polymorphic result type but no polymorphic
argument, which CREATE FUNCTION would quite properly reject. The
ordered-set-aggregate patch found a workaround: allow the final function
to be declared as accepting additional dummy arguments that have types
matching the aggregate's regular input arguments. However, we failed
to notice that this problem applies just as much to regular aggregates,
despite the fact that we had a built-in regular aggregate array_agg()
that was known to be undeclarable in SQL because its final function
had an illegal signature. So what we should have done, and what this
patch does, is to decouple the extra-dummy-arguments behavior from
ordered-set aggregates and make it generally available for all aggregate
declarations. We have to put this into 9.4 rather than waiting till
later because it slightly alters the rules for declaring ordered-set
aggregates.
The patch turned out a bit bigger than I'd hoped because it proved
necessary to record the extra-arguments option in a new pg_aggregate
column. I'd thought we could just look at the final function's pronargs
at runtime, but that didn't work well for variadic final functions.
It's probably just as well though, because it simplifies life for pg_dump
to record the option explicitly.
While at it, fix array_agg() to have a valid final-function signature,
and add an opr_sanity test to notice future deviations from polymorphic
consistency. I also marked the percentile_cont() aggregates as not
needing extra arguments, since they don't.
When marking a branch as half-dead, a pointer to the top of the branch is
stored in the leaf block's hi-key. During normal operation, the high key
was left in place, and the block number was just stored in the ctid field
of the high key tuple, but in WAL replay, the high key was recreated as a
truncated tuple with zero columns. For the sake of easier debugging, also
truncate the tuple in normal operation, so that the page is identical
after WAL replay. Also, rename the 'downlink' field in the WAL record to
'topparent', as that seems like a more descriptive name. And make sure
it's set to invalid when unlinking the leaf page.
Some ancient comments claimed that fn_nargs could be -1 to indicate a
variable number of input arguments; but this was never implemented, and
is at variance with what we ultimately did with "variadic" functions.
Update the comments.
Forgot to update LSN of left sibling's page, when creating a new root.
I fixed this for regular insertions and page splits earlier, but missed
new root creation.
The README incorrectly claimed that GIN posting tree pages contain an array
of uncompressed items in addition to compressed posting lists. Earlier
versions of the GIN posting list compression patch worked that way, but not
the one that was committed.
Don't use simple_heap_insert to insert the tuple to a sequence relation.
simple_heap_insert creates a heap insertion WAL record, and replaying that
will create a regular heap page without the special area containing the
sequence magic constant, which is wrong for a sequence. That was not a bug
because we always created a sequence WAL record after that, and replaying
that overwrote the bogus heap page, and the transient state could never be
seen by another backend because it was only done when creating a new
sequence relation. But it's simpler and cleaner to avoid that in the first
place.
pgss_post_parse_analyze() neglected to pass the call on to any earlier
occupant of the post_parse_analyze_hook. There are no other users of that
hook in contrib/, and most likely none in the wild either, so this is
probably just a latent bug. But it's a bug nonetheless, so back-patch
to 9.2 where this code was introduced.
Mention impossibility of moving tablespaces, backing them up
independently, or the inadvisability of placing them on temporary
file systems.
Patch by Craig Ringer, adjustments by Ian Lawrence Warwick and me
Previously, these functions treated "" optin values as defaults in some
ways, but not in others, like when comparing to .pgpass. Also, add
documentation to clarify that now "" and NULL use defaults, like
PQsetdbLogin() has always done.
BACKWARD INCOMPATIBILITY
Patch by Adrian Vondendriesch, docs by me
Report by Jeff Janes
Because of gcc -Wmissing-prototypes, all functions in dynamically
loadable modules must have a separate prototype declaration. This is
meant to detect global functions that are not declared in header files,
but in cases where the function is called via dfmgr, this is redundant.
Besides filling up space with boilerplate, this is a frequent source of
compiler warnings in extension modules.
We can fix that by creating the function prototype as part of the
PG_FUNCTION_INFO_V1 macro, which such modules have to use anyway. That
makes the code of modules cleaner, because there is one less place where
the entry points have to be listed, and creates an additional check that
functions have the right prototype.
Remove now redundant prototypes from contrib and other modules.
Introduced in 585bca39: msgid is not used in the Windows code path.
Also adjust comments a tad (mostly to keep pgindent from messing it up).
David Rowley
Specifically, on-stack memset() might be removed, so:
* Replace memset() with px_memset()
* Add px_memset to copy_crlf()
* Add px_memset to pgp-s2k.c
Patch by Marko Kreen
Report by PVS-Studio
Backpatch through 8.4.
Non-existent tablespace directory references can occur if user
tablespaces are created inside data directories and the data directory
is renamed in preparation for running pg_upgrade, and the symbolic links
are not updated.
Backpatch to 9.3.
Document problems when disconnection causes loss of hot_standby_feedback
and suggest adjusting max_standby_archive_delay and
max_standby_streaming_delay.
Initial patch by Marko Tiikkaja, adjustments by me
If we set the all-visible flag after writing WAL record, and XLogInsert
takes a full-page image of the page, the image would not include the flag.
We will then proceed to set the VM bit, which would then be set without the
corresponding all-visible flag on the heap page.
Found by comparing page images on master and standby, after writing/replaying
each WAL record. (There is still a discrepancy: the all-visible flag won't
be set after replaying the HEAP_CLEAN record, even though it is set in the
master. However, it will be set when replaying the HEAP2_VISIBLE record and
setting the VM bit, so the all-visible flag and VM bit are always consistent
on the standby, even though they are momentarily out-of-sync with master)
Backpatch to 9.3 where this code was introduced.
Now that EXPLAIN also outputs a "planning time" measurement, the use of
"total" here seems rather confusing: it sounds like it might include the
planning time which of course it doesn't. Majority opinion was that
"execution time" is a better label, so we'll call it that.
This should be noted as a backwards incompatibility for tools that examine
EXPLAIN ANALYZE output.
In passing, I failed to resist the temptation to do a little editing on the
materialized-view example affected by this change.
Previously, 'int' was used for socket values in libpq, but socket values
are unsigned on Windows. This is a style correction.
Initial patch and previous PGINVALID_SOCKET initial patch by Joel
Jacobson, modified by me
Report from PVS-Studio