Commit Graph

725 Commits

Author SHA1 Message Date
Andres Freund 604f7956b9 Improve code around the recently added rm_identify rmgr callback.
There are four weaknesses in728f152e07f998d2cb4fe5f24ec8da2c3bda98f2:

* append_init() in heapdesc.c was ugly and required that rm_identify
  return values are only valid till the next call. Instead just add a
  couple more switch() cases for the INIT_PAGE cases. Now the returned
  value will always be valid.
* a couple rm_identify() callbacks missed masking xl_info with
  ~XLR_INFO_MASK.
* pg_xlogdump didn't map a NULL rm_identify to UNKNOWN or a similar
  string.
* append_init() was called when id=NULL - which should never actually
  happen. But it's better to be careful.
2014-09-22 17:49:34 +02:00
Andres Freund 728f152e07 Add rmgr callback to name xlog record types for display purposes.
This is primarily useful for the upcoming pg_xlogdump --stats feature,
but also allows to remove some duplicated code in the rmgr_desc
routines.

Due to the separation and harmonization, the output of dipsplayed
records changes somewhat. But since this isn't enduser oriented
content that's ok.

It's potentially desirable to further change pg_xlogdump's display of
records. It previously wasn't possible to show the record type
separately from the description forcing it to be in the last
column. But that's better done in a separate commit.

Author: Abhijit Menon-Sen, slightly editorialized by me
Reviewed-By: Álvaro Herrera, Andres Freund, and Heikki Linnakangas
Discussion: 20140604104716.GA3989@toroid.org
2014-09-19 16:20:29 +02:00
Heikki Linnakangas 54685338e3 Move log_newpage and log_newpage_buffer to xlog.c.
log_newpage is used by many indexams, in addition to heap, but for
historical reasons it's always been part of the heapam rmgr. Starting with
9.3, we have another WAL record type for logging an image of a page,
XLOG_FPI. Simplify things by moving log_newpage and log_newpage_buffer to
xlog.c, and switch to using the XLOG_FPI record type.

Bump the WAL version number because the code to replay the old HEAP_NEWPAGE
records is removed.
2014-07-31 16:48:55 +03:00
Heikki Linnakangas 60d931827b Oops, fix recoveryStopsBefore functions for regular commits.
Pointed out by Tom Lane. Backpatch to 9.4, the code was structured
differently in earlier branches and didn't have this mistake.
2014-07-29 17:19:43 +03:00
Heikki Linnakangas e74e0906fa Treat 2PC commit/abort the same as regular xacts in recovery.
There were several oversights in recovery code where COMMIT/ABORT PREPARED
records were ignored:

* pg_last_xact_replay_timestamp() (wasn't updated for 2PC commits)
* recovery_min_apply_delay (2PC commits were applied immediately)
* recovery_target_xid (recovery would not stop if the XID used 2PC)

The first of those was reported by Sergiy Zuban in bug #11032, analyzed by
Tom Lane and Andres Freund. The bug was always there, but was masked before
commit d19bd29f07, because COMMIT PREPARED
always created an extra regular transaction that was WAL-logged.

Backpatch to all supported versions (older versions didn't have all the
features and therefore didn't have all of the above bugs).
2014-07-29 11:59:22 +03:00
Robert Haas 250c26ba9c Fix checkpointer crash in EXEC_BACKEND builds.
Nothing in the checkpointer calls InitXLOGAccess(), so WALInsertLocks
never got initialized there.  Without EXEC_BACKEND, it works anyway
because the correct value is inherited from the postmaster, but
with EXEC_BACKEND we've got a problem.  The problem appears to have
been introduced by commit 68a2e52bba.

To fix, move the relevant initialization steps from InitXLOGAccess()
to XLOGShmemInit(), making this more parallel to what we do
elsewhere.

Amit Kapila
2014-07-24 09:12:38 -04:00
Peter Eisentraut d38228fe40 Add missing serial commas
Also update one place where the wal_level "logical" was not added to an
error message.
2014-07-15 08:31:50 -04:00
Heikki Linnakangas 1c6821be31 Fix and enhance the assertion of no palloc's in a critical section.
The assertion failed if WAL_DEBUG or LWLOCK_STATS was enabled; fix that by
using separate memory contexts for the allocations made within those code
blocks.

This patch introduces a mechanism for marking any memory context as allowed
in a critical section. Previously ErrorContext was exempt as a special case.

Instead of a blanket exception of the checkpointer process, only exempt the
memory context used for the pending ops hash table.
2014-06-30 10:26:00 +03:00
Alvaro Herrera f741300c90 Have multixact be truncated by checkpoint, not vacuum
Instead of truncating pg_multixact at vacuum time, do it only at
checkpoint time.  The reason for doing it this way is twofold: first, we
want it to delete only segments that we're certain will not be required
if there's a crash immediately after the removal; and second, we want to
do it relatively often so that older files are not left behind if
there's an untimely crash.

Per my proposal in
http://www.postgresql.org/message-id/20140626044519.GJ7340@eldon.alvh.no-ip.org
we now execute the truncation in the checkpointer process rather than as
part of vacuum.  Vacuum is in only charge of maintaining in shared
memory the value to which it's possible to truncate the files; that
value is stored as part of checkpoints also, and so upon recovery we can
reuse the same value to re-execute truncate and reset the
oldest-value-still-safe-to-use to one known to remain after truncation.

Per bug reported by Jeff Janes in the course of his tests involving
bug #8673.

While at it, update some comments that hadn't been updated since
multixacts were changed.

Backpatch to 9.3, where persistency of pg_multixact files was
introduced by commit 0ac5ad5134.
2014-06-27 14:43:53 -04:00
Heikki Linnakangas 85ba0748ed Fix bug in WAL_DEBUG.
The record header was not copied correctly to the buffer that was passed
to the rm_desc function. Broken by my rm_desc signature refactoring patch.
2014-06-23 12:22:36 +03:00
Heikki Linnakangas 0ef0b6784c Change the signature of rm_desc so that it's passed a XLogRecord.
Just feels more natural, and is more consistent with rm_redo.
2014-06-14 10:46:48 +03:00
Andres Freund e04a9ccd2c Consistency improvements for slot and decoding code.
Change the order of checks in similar functions to be the same; remove
a parameter that's not needed anymore; rename a memory context and
expand a couple of comments.

Per review comments from Amit Kapila
2014-06-12 13:33:27 +02:00
Tom Lane 5f93c37805 Add defenses against running with a wrong selection of LOBLKSIZE.
It's critical that the backend's idea of LOBLKSIZE match the way data has
actually been divided up in pg_largeobject.  While we don't provide any
direct way to adjust that value, doing so is a one-line source code change
and various people have expressed interest recently in changing it.  So,
just as with TOAST_MAX_CHUNK_SIZE, it seems prudent to record the value in
pg_control and cross-check that the backend's compiled-in setting matches
the on-disk data.

Also tweak the code in inv_api.c so that fetches from pg_largeobject
explicitly verify that the length of the data field is not more than
LOBLKSIZE.  Formerly we just had Asserts() for that, which is no protection
at all in production builds.  In some of the call sites an overlength data
value would translate directly to a security-relevant stack clobber, so it
seems worth one extra runtime comparison to be sure.

In the back branches, we can't change the contents of pg_control; but we
can still make the extra checks in inv_api.c, which will offer some amount
of protection against running with the wrong value of LOBLKSIZE.
2014-06-05 11:31:06 -04:00
Andres Freund f0c108560b Consistently spell a replication slot's name as slot_name.
Previously there's been a mix between 'slotname' and 'slot_name'. It's
not nice to be unneccessarily inconsistent in a new feature. As a post
beta1 initdb now is required in the wake of eeca4cd35e, fix the
inconsistencies.
Most the changes won't affect usage of replication slots because the
majority of changes is around function parameter names. The prominent
exception to that is that the recovery.conf parameter
'primary_slotname' is now named 'primary_slot_name'.
2014-06-05 16:29:20 +02:00
Tom Lane c1907f0cc4 Fix a bunch of functions that were declared static then defined not-static.
Per testing with a compiler that whines about this.
2014-05-17 17:57:53 -04:00
Tom Lane 0d0b2bf175 Rename min_recovery_apply_delay to recovery_min_apply_delay.
Per discussion, this seems like a more consistent choice of name.

Fabrízio de Royes Mello, after a suggestion by Peter Eisentraut;
some additional documentation wordsmithing by me
2014-05-10 19:46:19 -04:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Tom Lane 5035701e07 Improve generation algorithm for database system identifier.
As noted some time ago, the original coding had a typo ("|" for "^")
that made the result less unique than intended.  Even the intended
behavior is obsolete since it was based on wanting to produce a
usable value even if we didn't have int64 arithmetic --- a limitation
we stopped supporting years ago.  Instead, let's redefine the system
identifier as tv_sec in the upper 32 bits (same as before), tv_usec
in the next 20 bits, and the low 12 bits of getpid() in the remaining
bits.  This is still hardly guaranteed-universally-unique, but it's
noticeably better than before.  Per my proposal at
<29019.1374535940@sss.pgh.pa.us>
2014-04-26 15:11:10 -04:00
Bruce Momjian 83defef8c7 report stat() error in trigger file check
Permissions might prevent the existence of the trigger file from being
checked.

Per report from Andres Freund
2014-04-17 11:55:57 -04:00
Heikki Linnakangas 848b9f05ab Use correctly-sized buffer when zero-filling a WAL file.
I mixed up BLCKSZ and XLOG_BLCKSZ when I changed the way the buffer is
allocated a couple of weeks ago. With the default settings, they are both
8k, but they can be changed at compile-time.
2014-04-16 10:26:36 +03:00
Heikki Linnakangas 787064cd00 Fix typo in comment.
Tomonari Katsumata
2014-04-10 13:11:49 +03:00
Robert Haas 59202fae04 Fix some compiler warnings that clang emits with -pedantic.
Andres Freund
2014-04-04 11:29:50 -04:00
Heikki Linnakangas d9e7873bbb In checkpoint, move the check for in-progress xacts out of critical section.
GetVirtualXIDsDelayingChkpt calls palloc, which isn't safe in a critical
section. I thought I covered this case with the exemption for the
checkpointer, but CreateCheckPoint is also called from the startup process.
2014-04-04 17:31:22 +03:00
Heikki Linnakangas 877b088785 Avoid allocations in critical sections.
If a palloc in a critical section fails, it becomes a PANIC.
2014-04-04 13:35:44 +03:00
Heikki Linnakangas c2a6724823 Pass more than the first XLogRecData entry to rm_desc, with WAL_DEBUG.
If you compile with WAL_DEBUG and enable it with wal_debug=on, we used to
only pass the first XLogRecData entry to the rm_desc routine. I think the
original assumprion was that the first XLogRecData entry contains all the
necessary information for the rm_desc routine, but that's a pretty shaky
assumption. At least standby_redo didn't get the memo.

To fix, piece together all the data in a temporary buffer, and pass that to
the rm_desc routine.

It's been like this forever, but the patch didn't apply cleanly to
back-branches. Probably wouldn't be hard to fix the conflicts, but it's
not worth the trouble.
2014-03-26 18:17:53 +02:00
Fujii Masao 49638868f8 Don't forget to flush XLOG_PARAMETER_CHANGE record.
Backpatch to 9.0 where XLOG_PARAMETER_CHANGE record was instroduced.
2014-03-26 02:12:39 +09:00
Heikki Linnakangas 3ed249b741 Fix "the the" typos.
Erik Rijkers
2014-03-24 08:42:13 +02:00
Heikki Linnakangas 68a2e52bba Replace the XLogInsert slots with regular LWLocks.
The special feature the XLogInsert slots had over regular LWLocks is the
insertingAt value that was updated atomically with releasing backends
waiting on it. Add new functions to the LWLock API to do that, and replace
the slots with LWLocks. This reduces the amount of duplicated code.
(There's still some duplication, but at least it's all in lwlock.c now.)

Reviewed by Andres Freund.
2014-03-21 15:10:48 +01:00
Heikki Linnakangas 59a5ab3f42 Remove rm_safe_restartpoint machinery.
It is no longer used, none of the resource managers have multi-record
actions that would make it unsafe to perform a restartpoint.

Also don't allow rm_cleanup to write WAL records, it's also no longer
required. Move the call to rm_cleanup routines to make it more symmetric
with rm_startup.
2014-03-18 22:10:35 +02:00
Bruce Momjian 886c0be3f6 C comments: remove odd blank lines after #ifdef WIN32 lines 2014-03-13 01:34:42 -04:00
Heikki Linnakangas a3115f0d9e Only WAL-log the modified portion in an UPDATE, if possible.
When a row is updated, and the new tuple version is put on the same page as
the old one, only WAL-log the part of the new tuple that's not identical to
the old. This saves significantly on the amount of WAL that needs to be
written, in the common case that most fields are not modified.

Amit Kapila, with a lot of back and forth with me, Robert Haas, and others.
2014-03-12 23:28:36 +02:00
Heikki Linnakangas 956685f82b Do wal_level and hot standby checks when doing crash-then-archive recovery.
CheckRequiredParameterValues() should perform the checks if archive recovery
was requested, even if we are going to perform crash recovery first.

Reported by Kyotaro HORIGUCHI. Backpatch to 9.2, like the crash-then-archive
recovery mode.
2014-03-05 14:48:14 +02:00
Heikki Linnakangas af246c37c0 Fix lastReplayedEndRecPtr calculation when starting from shutdown checkpoint.
When entering crash recovery followed by archive recovery, and the latest
checkpoint is a shutdown checkpoint, and there are no more WAL records to
replay before transitioning from crash to archive recovery, we would not
immediately allow read-only connections in hot standby mode even if we
could. That's because when starting from a shutdown checkpoint, we set
lastReplayedEndRecPtr incorrectly to the record before the checkpoint
record, instead of the checkpoint record itself. We don't run the redo
routine of the shutdown checkpoint record, but starting recovery from it
goes through the same motions, so it should be considered as replayed.

Reported by Kyotaro HORIGUCHI. All versions with hot standby are affected,
so backpatch to 9.0.
2014-03-05 13:51:19 +02:00
Robert Haas b89e151054 Introduce logical decoding.
This feature, building on previous commits, allows the write-ahead log
stream to be decoded into a series of logical changes; that is,
inserts, updates, and deletes and the transactions which contain them.
It is capable of handling decoding even across changes to the schema
of the effected tables.  The output format is controlled by a
so-called "output plugin"; an example is included.  To make use of
this in a real replication system, the output plugin will need to be
modified to produce output in the format appropriate to that system,
and to perform filtering.

Currently, information can be extracted from the logical decoding
system only via SQL; future commits will add the ability to stream
changes via walsender.

Andres Freund, with review and other contributions from many other
people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan,
Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit
Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve
Singer.
2014-03-03 16:32:18 -05:00
Heikki Linnakangas d8a42b150f Remove bogus while-loop.
Commit abf5c5c9a4 added a bogus while-
statement after the for(;;)-loop. It went unnoticed in testing, because
it was dead code.

Report by KONDO Mitsumasa. Backpatch to 9.3. The commit that introduced
this was also applied to 9.2, but not the bogus while-loop part, because
the code in 9.2 looks quite different.
2014-02-28 13:33:41 +02:00
Heikki Linnakangas 8f09ca436d Improve comment on setting data_checksum GUC.
There was an extra space there, and "fixed" wasn't very descriptive.
2014-02-20 10:58:30 +02:00
Heikki Linnakangas 057152b37c Fix comment; checkpointer, not bgwriter, performs checkpoints since 9.2.
Amit Langote
2014-02-18 09:48:18 +02:00
Tom Lane 01824385ae Prevent potential overruns of fixed-size buffers.
Coverity identified a number of places in which it couldn't prove that a
string being copied into a fixed-size buffer would fit.  We believe that
most, perhaps all of these are in fact safe, or are copying data that is
coming from a trusted source so that any overrun is not really a security
issue.  Nonetheless it seems prudent to forestall any risk by using
strlcpy() and similar functions.

Fixes by Peter Eisentraut and Jozef Mlich based on Coverity reports.

In addition, fix a potential null-pointer-dereference crash in
contrib/chkpass.  The crypt(3) function is defined to return NULL on
failure, but chkpass.c didn't check for that before using the result.
The main practical case in which this could be an issue is if libc is
configured to refuse to execute unapproved hashing algorithms (e.g.,
"FIPS mode").  This ideally should've been a separate commit, but
since it touches code adjacent to one of the buffer overrun changes,
I included it in this commit to avoid last-minute merge issues.
This issue was reported by Honza Horak.

Security: CVE-2014-0065 for buffer overruns, CVE-2014-0066 for crypt()
2014-02-17 11:20:21 -05:00
Heikki Linnakangas 4d894b41cd Change the order that pg_xlog and WAL archive are polled for WAL segments.
If there is a WAL segment with same ID but different TLI present in both
the WAL archive and pg_xlog, prefer the one with higher TLI. Before this
patch, the archive was polled first, for all expected TLIs, and only if no
file was found was pg_xlog scanned. This was a change in behavior from 9.3,
which first scanned archive and pg_xlog for the highest TLI, then archive
and pg_xlog for the next highest TLI and so forth. This patch reverts the
behavior back to what it was in 9.2.

The reason for this is that if for example you try to do archive recovery
to timeline 2, which branched off timeline 1, but the WAL for timeline 2 is
not archived yet, we would replay past the timeline switch point on
timeline 1 using the archived files, before even looking timeline 2's files
in pg_xlog

Report and patch by Kyotaro Horiguchi. Backpatch to 9.3 where the behavior
was changed.
2014-02-14 15:15:09 +02:00
Heikki Linnakangas d699ba4134 Fix WakeupWaiters() to not wake up an exclusive locker unnecessarily.
WakeupWaiters() is supposed to wake up all LW_WAIT_UNTIL_FREE waiters of
the slot, but the loop incorrectly also woke up the first LW_EXCLUSIVE
waiter, if there was no LW_WAIT_UNTIL_FREE waiters in the queue.

Noted by Andres Freund. This code is new in 9.4, so no backpatching.
2014-02-10 15:18:18 +02:00
Robert Haas 858ec11858 Introduce replication slots.
Replication slots are a crash-safe data structure which can be created
on either a master or a standby to prevent premature removal of
write-ahead log segments needed by a standby, as well as (with
hot_standby_feedback=on) pruning of tuples whose removal would cause
replication conflicts.  Slots have some advantages over existing
techniques, as explained in the documentation.

In a few places, we refer to the type of replication slots introduced
by this patch as "physical" slots, because forthcoming patches for
logical decoding will also have slots, but with somewhat different
properties.

Andres Freund and Robert Haas
2014-01-31 22:45:36 -05:00
Heikki Linnakangas 71c6a8e375 Add recovery_target='immediate' option.
This allows ending recovery as a consistent state has been reached. Without
this, there was no easy way to e.g restore an online backup, without
replaying any extra WAL after the backup ended.

MauMau and me.
2014-01-25 17:34:04 +02:00
Tom Lane ac4ef637ad Allow use of "z" flag in our printf calls, and use it where appropriate.
Since C99, it's been standard for printf and friends to accept a "z" size
modifier, meaning "whatever size size_t has".  Up to now we've generally
dealt with printing size_t values by explicitly casting them to unsigned
long and using the "l" modifier; but this is really the wrong thing on
platforms where pointers are wider than longs (such as Win64).  So let's
start using "z" instead.  To ensure we can do that on all platforms, teach
src/port/snprintf.c to understand "z", and add a configure test to force
use of that implementation when the platform's version doesn't handle "z".

Having done that, modify a bunch of places that were using the
unsigned-long hack to use "z" instead.  This patch doesn't pretend to have
gotten everyplace that could benefit, but it catches many of them.  I made
an effort in particular to ensure that all uses of the same error message
text were updated together, so as not to increase the number of
translatable strings.

It's possible that this change will result in format-string warnings from
pre-C99 compilers.  We might have to reconsider if there are any popular
compilers that will warn about this; but let's start by seeing what the
buildfarm thinks.

Andres Freund, with a little additional work by me
2014-01-23 17:18:33 -05:00
Tom Lane 061b079f89 Fix multiple bugs in index page locking during hot-standby WAL replay.
In ordinary operation, VACUUM must be careful to take a cleanup lock on
each leaf page of a btree index; this ensures that no indexscans could
still be "in flight" to heap tuples due to be deleted.  (Because of
possible index-tuple motion due to concurrent page splits, it's not enough
to lock only the pages we're deleting index tuples from.)  In Hot Standby,
the WAL replay process must likewise lock every leaf page.  There were
several bugs in the code for that:

* The replay scan might come across unused, all-zero pages in the index.
While btree_xlog_vacuum itself did the right thing (ie, nothing) with
such pages, xlogutils.c supposed that such pages must be corrupt and
would throw an error.  This accounts for various reports of replication
failures with "PANIC: WAL contains references to invalid pages".  To
fix, add a ReadBufferMode value that instructs XLogReadBufferExtended
not to complain when we're doing this.

* btree_xlog_vacuum performed the extra locking if standbyState ==
STANDBY_SNAPSHOT_READY, but that's not the correct test: we won't open up
for hot standby queries until the database has reached consistency, and
we don't want to do the extra locking till then either, for fear of reading
corrupted pages (which bufmgr.c would complain about).  Fix by exporting a
new function from xlog.c that will report whether we're actually in hot
standby replay mode.

* To ensure full coverage of the index in the replay scan, btvacuumscan
would emit a dummy WAL record for the last page of the index, if no
vacuuming work had been done on that page.  However, if the last page
of the index is all-zero, that would result in corruption of said page,
since the functions called on it weren't prepared to handle that case.
There's no need to lock any such pages, so change the logic to target
the last normal leaf page instead.

The first two of these bugs were diagnosed by Andres Freund, the other one
by me.  Fixes based on ideas from Heikki Linnakangas and myself.

This has been wrong since Hot Standby was introduced, so back-patch to 9.0.
2014-01-14 17:35:21 -05:00
Heikki Linnakangas c945af80cf Refactor checking whether we've reached the recovery target.
Makes the replay loop slightly more readable, by separating the concerns of
whether to stop and whether to delay, and how to extract the timestamp from
a record.

This has the user-visible change that the timestamp of the last applied
record is now updated after actually applying it. Before, it was updated
just before applying it. That meant that pg_last_xact_replay_timestamp()
could return the timestamp of a commit record that is in process of being
replayed, but not yet applied. Normally the difference is small, but if
min_recovery_apply_delay is set, there could be a significant delay between
reading a record and applying it.

Another behavioral change is that if you recover to a restore point, we stop
after the restore point record, not before it. It makes no difference as far
as running queries on the server is concerned, as applying a restore point
record changes nothing, but if examine the timeline history you will see
that the new timeline branched off just after the restore point record, not
before it. One practical consequence is that if you do PITR to the new
timeline, and set recovery target to the same named restore point again, it
will find and stop recovery at the same restore point. Conceptually, I think
it makes more sense to consider the restore point as part of the new
timeline's history than not.

In principle, setting the last-replayed timestamp before actually applying
the record was a bug all along, but it doesn't seem worth the risk to
backpatch, since min_recovery_apply_delay was only added in 9.4.
2014-01-09 14:00:39 +02:00
Heikki Linnakangas 3739e5ab93 Fix pause_at_recovery_target + recovery_target_inclusive combination.
If pause_at_recovery_target is set, recovery pauses *before* applying the
target record, even if recovery_target_inclusive is set. If you then
continue with pg_xlog_replay_resume(), it will apply the target record
before ending recovery. In other words, if you log in while it's paused
and verify that the database looks OK, ending recovery changes its state
again, possibly destroying data that you were tring to salvage with PITR.

Backpatch to 9.1, this has been broken since pause_at_recovery_target was
added.
2014-01-08 23:28:52 +02:00
Heikki Linnakangas 815d71deed If multiple recovery_targets are specified, use the latest one.
The docs say that only one of recovery_target_xid, recovery_target_time, or
recovery_target_name can be specified. But the code actually did something
different, so that a name overrode time, and xid overrode both time and name.
Now the target specified last takes effect, whether it's an xid, time or
name.

With this patch, we still accept multiple recovery_target settings, even
though docs say that only one can be specified. It's a general property of
the recovery.conf file parser that you if you specify the same option twice,
the last one takes effect, like with postgresql.conf.
2014-01-08 22:26:39 +02:00
Heikki Linnakangas d59ff6c110 Fix bug in determining when recovery has reached consistency.
When starting WAL replay from an online checkpoint, the last replayed WAL
record variable was initialized using the checkpoint record's location, even
though the records between the REDO location and the checkpoint record had
not been replayed yet. That was noted as "slightly confusing" but harmless
in the comment, but in some cases, it fooled CheckRecoveryConsistency to
incorrectly conclude that we had already reached a consistent state
immediately at the beginning of WAL replay. That caused the system to accept
read-only connections in hot standby mode too early, and also PANICs with
message "WAL contains references to invalid pages".

Fix by initializing the variables to the REDO location instead.

In 9.2 and above, change CheckRecoveryConsistency() to use
lastReplayedEndRecPtr variable when checking if backup end location has
been reached. It was inconsistently using EndRecPtr for that check, but
lastReplayedEndRecPtr when checking min recovery point. It made no
difference before this patch, because in all the places where
CheckRecoveryConsistency was called the two variables were the same, but
it was always an accident waiting to happen, and would have been wrong
after this patch anyway.

Report and analysis by Tomonari Katsumata, bug #8686. Backpatch to 9.0,
where hot standby was introduced.
2014-01-08 15:03:09 +02:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Magnus Hagander 9544cc0d65 Move permissions check from do_pg_start_backup to pg_start_backup
And the same for do_pg_stop_backup. The code in do_pg_* is not allowed
to access the catalogs. For manual base backups, the permissions
check can be handled in the calling function, and for streaming
base backups only users with the required permissions can get past
the authentication step in the first place.

Reported by Antonin Houska, diagnosed by Andres Freund
2014-01-07 17:50:56 +01:00