Commit Graph

244 Commits

Author SHA1 Message Date
Alvaro Herrera de38489b92 Fix typo in comment
Commit fd31cd2651 renamed the variable to skipping_blocks, but forgot
to update this comment.

Noticed while inspecting code.
2017-07-21 20:08:53 -04:00
Alvaro Herrera 46e9151942 Fix typo
Noticed while reviewing code.
2017-07-07 17:10:36 -04:00
Robert Haas 2a6db5eba6 Update out-of-date comment in vacuumlazy.c
Commit 15c121b3ed seems to have
overlooked the need to trim this part of the comment.

Pavan Deolasee

Discussion: http://postgr.es/m/CABOikdPq_9+cWRNZ0RLKTwuZyj=uL85X=Usifa-CbPee1ZCM5A@mail.gmail.com
2017-06-22 13:38:53 -04:00
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Tom Lane e3860ffa4d Initial pgindent run with pg_bsd_indent version 2.0.
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:

* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
  sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
  well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
  with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
  than the expected column 33.

On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list.  This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.

There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses.  I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:39:04 -04:00
Tom Lane 651902deb1 Re-run pgindent.
This is just to have a clean base state for testing of Piotr Stefaniak's
latest version of FreeBSD indent.  I fixed up a couple of places where
pgindent would have changed format not-nicely.  perltidy not included.

Discussion: https://postgr.es/m/VI1PR03MB119959F4B65F000CA7CD9F6BF2CC0@VI1PR03MB1199.eurprd03.prod.outlook.com
2017-06-13 13:05:59 -04:00
Alvaro Herrera 55a70a023c Assorted translatable string fixes
Mark our rusage reportage string translatable; remove quotes from type
names; unify formatting of very similar messages.
2017-06-04 11:41:16 -04:00
Bruce Momjian a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Fujii Masao 70adf2fbe1 Make VACUUM VERBOSE report the number of skipped frozen pages.
Previously manual VACUUM did not report the number of skipped frozen
pages even when VERBOSE option is specified. But this information is
helpful to monitor the VACUUM activity, and also autovacuum reports that
number in the log file when the condition of log_autovacuum_min_duration
is met.

This commit changes VACUUM VERBOSE so that it reports the number
of frozen pages that it skips.

Author: Masahiko Sawada
Reviewed-by: Yugo Nagata and Jim Nasby

Discussion: http://postgr.es/m/CAD21AoDZQKCxo0L39Mrq08cONNkXQKXuh=2DP1Q8ebmt35SoaA@mail.gmail.com
2017-03-25 02:39:44 +09:00
Andrew Gierth 1914c5ea7d Avoid having vacuum set reltuples to 0 on non-empty relations in the
presence of page pins, which leads to serious estimation errors in the
planner.  This particularly affects small heavily-accessed tables,
especially where locking (e.g. from FK constraints) forces frequent
vacuums for mxid cleanup.

Fix by keeping separate track of pages whose live tuples were actually
counted vs. pages that were only scanned for freezing purposes.  Thus,
reltuples can only be set to 0 if all pages of the relation were
actually counted.

Backpatch to all supported versions.

Per bug #14057 from Nicolas Baccelli, analyzed by me.

Discussion: https://postgr.es/m/20160331103739.8956.94469@wrigleys.postgresql.org
2017-03-16 22:28:03 +00:00
Simon Riggs 9eb344faf5 Allow vacuums to report oldestxmin
Allow VACUUM and Autovacuum to report the oldestxmin value they
used while cleaning tables, helping to make better sense out of
the other statistics we report in various cases.
2017-03-03 19:18:25 +05:30
Alvaro Herrera 7e26e02eec Prefetch blocks during lazy vacuum's truncation scan
Vacuum truncation scan can be sped up on rotating media by prefetching
blocks in forward direction.  That makes the blocks already present in
memory by the time they are needed, while also letting OS read-ahead
kick in.

The truncate scan has been measured to be five times faster than without
this patch (that was on a slow disk, but it shouldn't hurt on fast
disks.)

Author: Álvaro Herrera, loosely based on a submission by Claudio Freire
Discussion: https://postgr.es/m/CAGTBQpa6NFGO_6g_y_7zQx8L9GcHDSQKYdo1tGuh791z6PYgEg@mail.gmail.com
2017-01-23 12:55:18 -03:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Simon Riggs dcb12ce8d8 Fix VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL
lazy_truncate_heap() was waiting for
VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL, but in microseconds
not milliseconds as originally intended.

Found by code inspection.

Simon Riggs
2016-09-06 15:35:47 +01:00
Kevin Grittner 1c15aac53f Add comment & docs about no vacuum truncation with sto.
Omission noted by Andres Freund.
2016-07-19 16:25:53 -05:00
Andres Freund eca0f1db14 Clear all-frozen visibilitymap status when locking tuples.
Since a892234 & fd31cd265 the visibilitymap's freeze bit is used to
avoid vacuuming the whole relation in anti-wraparound vacuums. Doing so
correctly relies on not adding xids to the heap without also unsetting
the visibilitymap flag.  Tuple locking related code has not done so.

To allow selectively resetting all-frozen - to avoid pessimizing
heap_lock_tuple - allow to selectively reset the all-frozen with
visibilitymap_clear(). To avoid having to use
visibilitymap_get_status (e.g. via VM_ALL_FROZEN) inside a critical
section, have visibilitymap_clear() return whether any bits have been
reset.

There's a remaining issue (denoted by XXX): After the PageIsAllVisible()
check in heap_lock_tuple() and heap_lock_updated_tuple_rec() the page
status could theoretically change. Practically that currently seems
impossible, because updaters will hold a page level pin already.  Due to
the next beta coming up, it seems better to get the required WAL magic
bump done before resolving this issue.

The added flags field fields to xl_heap_lock and xl_heap_lock_updated
require bumping the WAL magic. Since there's already been a catversion
bump since the last beta, that's not an issue.

Reviewed-By: Robert Haas, Amit Kapila and Andres Freund
Author: Masahiko Sawada, heavily revised by Andres Freund
Discussion: CAEepm=3fWAbWryVW9swHyLTY4sXVf0xbLvXqOwUoDiNCx9mBjQ@mail.gmail.com
Backpatch: -
2016-07-18 02:01:13 -07:00
Robert Haas ede62e56fb Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies.
If you really want to vacuum every single page in the relation,
regardless of apparent visibility status or anything else, you can use
this option.  In previous releases, this behavior could be achieved
using VACUUM (FREEZE), but because we can now recognize all-frozen
pages as not needing to be frozen again, that no longer works.  There
should be no need for routine use of this option, but maybe bugs or
disaster recovery will necessitate its use.

Patch by me, reviewed by Andres Freund.
2016-06-17 15:48:57 -04:00
Robert Haas 38e9f90a22 Fix lazy_scan_heap so that it won't mark pages all-frozen too soon.
Commit a892234f83 added a new bit per
page to the visibility map fork indicating whether the page is
all-frozen, but incorrectly assumed that if lazy_scan_heap chose to
freeze a tuple then that tuple would not need to later be frozen
again. This turns out to be false, because xmin and xmax (and
conceivably xvac, if dealing with tuples from very old releases) could
be frozen at separate times.

Thanks to Andres Freund for help in uncovering and tracking down this
issue.
2016-06-15 14:30:06 -04:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Robert Haas 6436a853f1 Fix comment to be more accurate.
Now that we skip vacuuming all-frozen pages, this comment needs
updating.

Masahiko Sawada
2016-06-03 11:56:57 -04:00
Robert Haas fdfaccfa79 Cosmetic improvements to freeze map code.
Per post-commit review comments from Andres Freund, improve variable
names, comments, and in one place, slightly improve the code structure.

Masahiko Sawada
2016-06-03 08:43:41 -04:00
Kevin Grittner a343e223a5 Revert no-op changes to BufferGetPage()
The reverted changes were intended to force a choice of whether any
newly-added BufferGetPage() calls needed to be accompanied by a
test of the snapshot age, to support the "snapshot too old"
feature.  Such an accompanying test is needed in about 7% of the
cases, where the page is being used as part of a scan rather than
positioning for other purposes (such as DML or vacuuming).  The
additional effort required for back-patching, and the doubt whether
the intended benefit would really be there, have indicated it is
best just to rely on developers to do the right thing based on
comments and existing usage, as we do with many other conventions.

This change should have little or no effect on generated executable
code.

Motivated by the back-patching pain of Tom Lane and Robert Haas
2016-04-20 08:31:19 -05:00
Kevin Grittner 848ef42bb8 Add the "snapshot too old" feature
This feature is controlled by a new old_snapshot_threshold GUC.  A
value of -1 disables the feature, and that is the default.  The
value of 0 is just intended for testing.  Above that it is the
number of minutes a snapshot can reach before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect.  The xmin associated with a transaction ID does
still protect dead tuples.  A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.

This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.
2016-04-08 14:36:30 -05:00
Kevin Grittner 8b65cf4c5e Modify BufferGetPage() to prepare for "snapshot too old" feature
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in.  It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point.  This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third.  The follow-on patch will change the places where the test
needs to be made.
2016-04-08 14:30:10 -05:00
Robert Haas bc55cc0b6a Fix problems in commit c16dc1aca5.
Vinayak Pokale provided a patch for a copy-and-paste error in a
comment.  I noticed that I'd use the word "automatically" nearby where
I meant to talk about things being "atomic".  Rahila Syed spotted a
misplaced counter update.  Fix all that stuff.
2016-03-16 13:54:04 -04:00
Robert Haas c16dc1aca5 Add simple VACUUM progress reporting.
There's a lot more that could be done here yet - in particular, this
reports only very coarse-grained information about the index vacuuming
phase - but even as it stands, the new pg_stat_progress_vacuum can
tell you quite a bit about what a long-running vacuum is actually
doing.

Amit Langote and Robert Haas, based on earlier work by Vinayak Pokale
and Rahila Syed.
2016-03-15 13:32:56 -04:00
Robert Haas fd31cd2651 Don't vacuum all-frozen pages.
Commit a892234f83 gave us enough
infrastructure to avoid vacuuming pages where every tuple on the
page is already frozen.  So, replace the notion of a scan_all or
whole-table vacuum with the less onerous notion of an "aggressive"
vacuum, which will pages that are all-visible, but still skip those
that are all-frozen.

This should greatly reduce the cost of anti-wraparound vacuuming
on large clusters where the majority of data is never touched
between one cycle and the next, because we'll no longer have to
read all of those pages only to find out that we don't need to
do anything with them.

Patch by me, reviewed by Masahiko Sawada.
2016-03-10 16:14:42 -05:00
Robert Haas be060cbcd4 Re-pgindent vacuumlazy.c. 2016-03-09 13:51:11 -05:00
Robert Haas b6fb6471f6 Add a generic command progress reporting facility.
Using this facility, any utility command can report the target relation
upon which it is operating, if there is one, and up to 10 64-bit
counters; the intent of this is that users should be able to figure out
what a utility command is doing without having to resort to ugly hacks
like attaching strace to a backend.

As a demonstration, this adds very crude reporting to lazy vacuum; we
just report the target relation and nothing else.  A forthcoming patch
will make VACUUM report a bunch of additional data that will make this
much more interesting.  But this gets the basic framework in place.

Vinayak Pokale, Rahila Syed, Amit Langote, Robert Haas, reviewed by
Kyotaro Horiguchi, Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao,
and Masanori Oyama.
2016-03-09 12:08:58 -05:00
Robert Haas 77a1d1e798 Department of second thoughts: remove PD_ALL_FROZEN.
Commit a892234f83 added a second bit per
page to the visibility map, which still seems like a good idea, but it
also added a second page-level bit alongside PD_ALL_VISIBLE to track
whether the visibility map bit was set.  That no longer seems like a
clever plan, because we don't really need that bit for anything.  We
always clear both bits when the page is modified anyway.

Patch by me, reviewed by Kyotaro Horiguchi and Masahiko Sawada.
2016-03-08 08:46:48 -05:00
Robert Haas a892234f83 Change the format of the VM fork to add a second bit per page.
The new bit indicates whether every tuple on the page is already frozen.
It is cleared only when the all-visible bit is cleared, and it can be
set only when we vacuum a page and find that every tuple on that page is
both visible to every transaction and in no need of any future
vacuuming.

A future commit will use this new bit to optimize away full-table scans
that would otherwise be triggered by XID wraparound considerations.  A
page which is merely all-visible must still be scanned in that case, but
a page which is all-frozen need not be.  This commit does not attempt
that optimization, although that optimization is the goal here.  It
seems better to get the basic infrastructure in place first.

Per discussion, it's very desirable for pg_upgrade to automatically
migrate existing VM forks from the old format to the new format.  That,
too, will be handled in a follow-on patch.

Masahiko Sawada, reviewed by Kyotaro Horiguchi, Fujii Masao, Amit
Kapila, Simon Riggs, Andres Freund, and others, and substantially
revised by me.
2016-03-01 21:49:41 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Tom Lane e5d06f2b12 Dept of second thoughts: the !scan_all exit mustn't increase scanned_pages.
In the extreme edge case where contended pages are the only ones that
escape being scanned, the previous commit would have allowed us to think
that relfrozenxid could be advanced, which is exactly wrong.
2015-12-30 17:32:23 -05:00
Tom Lane e842908233 Avoid useless truncation attempts during VACUUM.
VACUUM can skip heap pages altogether when there's a run of consecutive
pages that are all-visible according to the visibility map.  This causes it
to not update its nonempty_pages count, just as if those pages were empty,
which means that at the end we will think they are candidates for deletion.
Thus, we may take the table's AccessExclusive lock only to find that no
pages are really truncatable.  This usually causes no real problems on a
master server, thanks to the lock being acquired only conditionally; but on
hot-standby servers, the same lock must be acquired unconditionally which
can result in unnecessary query cancellations.

To improve matters, force examination of the table's last page whenever
we reach there with a nonempty_pages count that would allow a truncation
attempt.  If it's not empty, we'll advance nonempty_pages and thereby
prevent the truncation attempt.

If we are unable to acquire cleanup lock on that page, there's no need to
force it, unless we're doing an anti-wraparound vacuum.  We can just check
for tuples with a shared buffer lock and then give up.  (When we are doing
an anti-wraparound vacuum, and decide it's okay to skip the page because it
contains no freezable tuples, this patch still improves matters because
nonempty_pages is properly updated, which it was not before.)

Since only the last page is special-cased in this way, we might attempt a
truncation that will release many fewer pages than the normal heuristic
would suggest; at worst, only one page would be truncated.  But that seems
all right, because the situation won't repeat during the next vacuum.
The real problem with the old logic is that the useless truncation attempt
happens every time we vacuum, so long as the state of the last few dozen
pages doesn't change.

This is a longstanding deficiency, but since the consequences aren't very
severe in most scenarios, I'm not going to risk a back-patch.

Jeff Janes and Tom Lane
2015-12-30 17:13:15 -05:00
Peter Eisentraut a8d585c091 Message style improvements
Message style, plurals, quoting, spelling, consistency with similar
messages
2015-10-28 20:38:36 -04:00
Bruce Momjian 807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Heikki Linnakangas fa60fb63e5 Fix more typos in comments.
Patch by CharSyam, plus a few more I spotted with grep.
2015-05-20 19:45:43 +03:00
Alvaro Herrera 4ff695b17d Add log_min_autovacuum_duration per-table option
This is useful to control autovacuum log volume, for situations where
monitoring only a set of tables is necessary.

Author: Michael Paquier
Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa,
Taiki Kondo, Huong Dangminh), Fujii Masao.
2015-04-03 11:55:50 -03:00
Alvaro Herrera 0d83138974 Rationalize vacuuming options and parameters
We were involving the parser too much in setting up initial vacuuming
parameters.  This patch moves that responsibility elsewhere to simplify
code, and also to make future additions easier.  To do this, create a
new struct VacuumParams which is filled just prior to vacuum execution,
instead of at parse time; for user-invoked vacuuming this is set up in a
new function ExecVacuum, while autovacuum sets it up by itself.

While at it, add a new member VACOPT_SKIPTOAST to enum VacuumOption,
only set by autovacuum, which is used to disable vacuuming of the toast
table instead of the old do_toast parameter; this relieves the argument
list of vacuum() and some callees a bit.  This partially makes up for
having added more arguments in an effort to avoid having autovacuum from
constructing a VacuumStmt parse node.

Author: Michael Paquier. Some tweaks by Álvaro
Reviewed by: Robert Haas, Stephen Frost, Álvaro Herrera
2015-03-18 11:52:33 -03:00
Andres Freund 17eaae9897 Fix logging of pages skipped due to pins during vacuum.
The new logging introduced in 35192f06 made the incorrect assumption
that scan_all vacuums would always wait for buffer pins; but they only
do so if the page actually needs to be frozen.

Fix that inaccuracy by removing the difference in log output based on
scan_all and just always remove the same message.  I chose to keep the
split log message from the original commit for now, it seems likely
that it'll be of use in the future.

Also merge the line about buffer pins in autovacuum's log output into
the existing "pages: ..." line. It seems odd to have a separate line
about pins, without the "topic: " prefix others have.

Also rename the new 'pinned_pages' variable to 'pinskipped_pages'
because it actually tracks the number of pages that could *not* be
pinned.

Discussion: 20150104005324.GC9626@awork2.anarazel.de
2015-01-08 12:57:09 +01:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Alvaro Herrera cd6e66572b Use %u to print out BlockNumber variables
Per Tom Lane
2014-12-18 17:59:00 -03:00
Alvaro Herrera 35192f0626 Have VACUUM log number of skipped pages due to pins
Author: Jim Nasby, some kibitzing by Heikki Linnankangas.
Discussion leading to current behavior and precise wording fueled by
thoughts from Robert Haas and Andres Freund.
2014-12-18 17:18:33 -03:00
Andres Freund 0c5af0a537 Move BufferGetBlockNumber() out of heap_page_is_all_visible()'s inner loop.
In some workloads BufferGetBlockNumber() shows up in profiles due to
the sheer number of calls to it (and because it causes cache
misses). The compiler can't move it out of the loop because it's a
full extern function call...
2014-11-14 17:04:44 +01:00
Heikki Linnakangas 2076db2aea Move the backup-block logic from XLogInsert to a new file, xloginsert.c.
xlog.c is huge, this makes it a little bit smaller, which is nice. Functions
related to putting together the WAL record are in xloginsert.c, and the
lower level stuff for managing WAL buffers and such are in xlog.c.

Also move the definition of XLogRecord to a separate header file. This
causes churn in the #includes of all the files that write WAL records, and
redo routines, but it avoids pulling in xlog.h into most places.

Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
2014-11-06 13:55:36 +02:00
Tom Lane fd0f651a86 Test IsInTransactionChain, not IsTransactionBlock, in vac_update_relstats.
As noted by Noah Misch, my initial cut at fixing bug #11638 didn't cover
all cases where ANALYZE might be invoked in an unsafe context.  We need to
test the result of IsInTransactionChain not IsTransactionBlock; which is
notationally a pain because IsInTransactionChain requires an isTopLevel
flag, which would have to be passed down through several levels of callers.
I chose to pass in_outer_xact (ie, the result of IsInTransactionChain)
rather than isTopLevel per se, as that seemed marginally more apropos
for the intermediate functions to know about.
2014-10-30 13:04:06 -04:00
Andres Freund ecac0e2b9e Do all-visible handling in lazy_vacuum_page() outside its critical section.
Since fdf9e21196 lazy_vacuum_page() rechecks the all-visible status
of pages in the second pass over the heap. It does so inside a
critical section, but both visibilitymap_test() and
heap_page_is_all_visible() perform operations that should not happen
inside one. The former potentially performs IO and both potentially do
memory allocations.

To fix, simply move all the all-visible handling outside the critical
section. Doing so means that the PD_ALL_VISIBLE on the page won't be
included in the full page image of the HEAP2_CLEAN record anymore. But
that's fine, the flag will be set by the HEAP2_VISIBLE logged later.

Backpatch to 9.3 where the problem was introduced. The bug only came
to light due to the assertion added in 4a170ee9 and isn't likely to
cause problems in production scenarios. The worst outcome is a
avoidable PANIC restart.

This also gets rid of the difference in the order of operations
between master and standby mentioned in 2a8e1ac5.

Per reports from David Leverton and Keith Fiske in bug #10533.
2014-06-20 11:09:17 +02:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Heikki Linnakangas 2a8e1ac598 Set the all-visible flag on heap page before writing WAL record, not after.
If we set the all-visible flag after writing WAL record, and XLogInsert
takes a full-page image of the page, the image would not include the flag.
We will then proceed to set the VM bit, which would then be set without the
corresponding all-visible flag on the heap page.

Found by comparing page images on master and standby, after writing/replaying
each WAL record. (There is still a discrepancy: the all-visible flag won't
be set after replaying the HEAP_CLEAN record, even though it is set in the
master. However, it will be set when replaying the HEAP2_VISIBLE record and
setting the VM bit, so the all-visible flag and VM bit are always consistent
on the standby, even though they are momentarily out-of-sync with master)

Backpatch to 9.3 where this code was introduced.
2014-04-17 17:47:50 +03:00