Commit Graph

228 Commits

Author SHA1 Message Date
Tom Lane 46825d4978 Clean up some sloppy coding in repl_gram.y.
Remove unused copy-and-pasted macro definitions, and improve formatting
of recently-added productions.

I got interested in this because buildfarm member protosciurus has been
crashing in "bison repl_gram.y" since commit 858ec11.  It's a long shot
that this will fix that, though maybe the missing trailing semicolon
has something to do with it?  In any case, there's no need to approve
of dead code, nor of code whose formatting isn't even self-consistent
let alone consistent with what's around it.
2014-02-02 12:51:14 -05:00
Fujii Masao 63be3b78f6 Fix typos in docs and comments.
Thom Brown
2014-02-02 10:28:18 +09:00
Tom Lane 214c7a4f0b Fix some more bugs in signal handlers and process shutdown logic.
WalSndKill was doing things exactly backwards: it should first clear
MyWalSnd (to stop signal handlers from touching MyWalSnd->latch),
then disown the latch, and only then mark the WalSnd struct unused by
clearing its pid field.

Also, WalRcvSigUsr1Handler and worker_spi_sighup failed to preserve
errno, which is surely a requirement for any signal handler.

Per discussion of recent buildfarm failures.  Back-patch as far
as the relevant code exists.
2014-02-01 16:21:23 -05:00
Robert Haas 858ec11858 Introduce replication slots.
Replication slots are a crash-safe data structure which can be created
on either a master or a standby to prevent premature removal of
write-ahead log segments needed by a standby, as well as (with
hot_standby_feedback=on) pruning of tuples whose removal would cause
replication conflicts.  Slots have some advantages over existing
techniques, as explained in the documentation.

In a few places, we refer to the type of replication slots introduced
by this patch as "physical" slots, because forthcoming patches for
logical decoding will also have slots, but with somewhat different
properties.

Andres Freund and Robert Haas
2014-01-31 22:45:36 -05:00
Fujii Masao dd515d4082 Change the suffix of auto conf temporary file from "temp" to "tmp".
Michael Paquier
2014-01-27 12:39:11 +09:00
Heikki Linnakangas a472ae1e4e Fix Hot Standby feedback sending when streaming busily.
Commit 6f60fdd701 accidentally removed a
call to XLogWalRcvSendHSFeedback() after flushing received WAL to disk.
The consequence is that when walsender is busy streaming WAL, it doesn't
send HS feedback messages. One is sent if nothing is received from the
master for 100ms, but if there's a steady stream of WAL, it never happens.

Backpatch to 9.3.

Andres Freund and Amit Kapila
2014-01-16 23:15:41 +02:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Magnus Hagander b168c5ef27 Avoid including tablespaces inside PGDATA twice in base backups
If a tablespace was crated inside PGDATA it was backed up both as part
of the PGDATA backup and as the backup of the tablespace. Avoid this
by skipping any directory inside PGDATA that contains one of the active
tablespaces.

Dimitri Fontaine and Magnus Hagander
2014-01-07 17:11:32 +01:00
Tatsuo Ishii 65d6e4cb5c Add ALTER SYSTEM command to edit the server configuration file.
Patch contributed by Amit Kapila. Reviewed by Hari Babu, Masao Fujii,
Boszormenyi Zoltan, Andres Freund, Greg Smith and others.
2013-12-18 23:42:44 +09:00
Heikki Linnakangas dde6282500 Fix more instances of "the the" in comments.
Plus one instance of "to to" in the docs.
2013-12-13 20:02:01 +02:00
Heikki Linnakangas a93bdfc711 Fix typo in comment.
Also line-wrap an over-wide line in a comment that's ignored by pgindent.
2013-09-03 13:17:09 +03:00
Magnus Hagander db4ef73760 Don't crash when pg_xlog is empty and pg_basebackup -x is used
The backup will not work (without a logarchive, and that's the whole
point of -x) in this case, this patch just changes it to throw an
error instead of crashing when this happens.

Noticed and diagnosed by TAKATSUKA Haruka
2013-08-24 17:13:49 +02:00
Peter Eisentraut 229fb58d4f Treat timeline IDs as unsigned in replication parser
Timeline IDs are unsigned ints everywhere, except the replication parser
treated them as signed ints.
2013-08-14 23:18:49 -04:00
Peter Eisentraut 626092a2e1 Message style improvements 2013-07-28 07:01:13 -04:00
Fujii Masao 985bd7d497 Support clean switchover.
In replication, when we shutdown the master, walsender tries to send
all the outstanding WAL records to the standby, and then to exit. This
basically means that all the WAL records are fully synced between
two servers after the clean shutdown of the master. So, after
promoting the standby to new master, we can restart the stopped
master as new standby without the need for a fresh backup from
new master.

But there was one problem so far: though walsender tries to send all
the outstanding WAL records, it doesn't wait for them to be replicated
to the standby. Then, before receiving all the WAL records,
walreceiver can detect the closure of connection and exit. We cannot
guarantee that there is no missing WAL in the standby after clean
shutdown of the master. In this case, backup from new master is
required when restarting the stopped master as new standby.

This patch fixes this problem. It just changes walsender so that it
waits for all the outstanding WAL records to be replicated to the
standby before closing the replication connection.

Per discussion, this is a fix that needs to get backpatched rather than
new feature. So, back-patch to 9.1 where enough infrastructure for
this exists.

Patch by me, reviewed by Andres Freund.
2013-06-26 02:14:37 +09:00
Peter Eisentraut d7eb6f46de Minor spell checking 2013-05-30 20:56:58 -04:00
Bruce Momjian 9af4159fce pgindent run for release 9.3
This is the first run of the Perl-based pgindent script.  Also update
pgindent instructions.
2013-05-29 16:58:43 -04:00
Heikki Linnakangas 2ffa66f497 Fix walsender failure at promotion.
If a standby server has a cascading standby server connected to it, it's
possible that WAL has already been sent up to the next WAL page boundary,
splitting a WAL record in the middle, when the first standby server is
promoted. Don't throw an assertion failure or error in walsender if that
happens.

Also, fix a variant of the same bug in pg_receivexlog: if it had already
received WAL on previous timeline up to a segment boundary, when the
upstream standby server is promoted so that the timeline switch record falls
on the previous segment, pg_receivexlog would miss the segment containing
the timeline switch. To fix that, have walsender send the position of the
timeline switch at end-of-streaming, in addition to the next timeline's ID.
It was previously assumed that the switch happened exactly where the
streaming stopped.

Note: this is an incompatible change in the streaming protocol. You might
get an error if you try to stream over timeline switches, if the client is
running 9.3beta1 and the server is more recent. It should be fine after a
reconnect, however.

Reported by Fujii Masao.
2013-05-08 20:30:17 +03:00
Heikki Linnakangas 28ba260906 In base backup, only include our own tablespace version directory.
If you have clusters of different versions pointing to the same tablespace
location, we would incorrectly include all the data belonging to the other
versions, too.

Fixes bug #7986, reported by Sergey Burladyan.
2013-03-25 20:19:22 +02:00
Tom Lane da5aeccf64 Move pqsignal() to libpgport.
We had two copies of this function in the backend and libpq, which was
already pretty bogus, but it turns out that we need it in some other
programs that don't use libpq (such as pg_test_fsync).  So put it where
it probably should have been all along.  The signal-mask-initialization
support in src/backend/libpq/pqsignal.c stays where it is, though, since
we only need that in the backend.
2013-03-17 12:06:42 -04:00
Heikki Linnakangas 3a9e64aa0d Cannot use WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE.
In copy-out mode, the frontend should not send any messages until the
backend has finished streaming, by sending a CopyDone message. I'm not sure
if it would be legal for the client to send a new query before receiving the
CopyDone message from the backend, but trying to support that would require
bigger changes to the backend code structure.

Fixes an assertion failure reported by Fujii Masao.
2013-02-27 19:28:51 +02:00
Peter Eisentraut 4f36292669 Add quotes to messages 2013-02-22 23:33:07 -05:00
Simon Riggs c2f79ba269 Force archive_status of .done for xlogs created by dearchival/replication.
This is a forward-patch of commit 6f4b8a4f4f,
applied to 9.2 back in August. The plan was to do something else in master,
but it looks like it's not going to happen, so let's just apply the 9.2
solution to master as well.

Fujii Masao
2013-02-15 19:28:06 +02:00
Peter Eisentraut 0cb1fac3b1 Add noreturn attributes to some error reporting functions 2013-02-12 07:13:22 -05:00
Simon Riggs bd56e74127 Reset master xmin when hot_standby_feedback disabled.
If walsender has xmin of standby then ensure we
reset the value to 0 when we change from hot_standby_feedback=on
to hot_standby_feedback=off.
2013-02-04 10:29:22 +00:00
Heikki Linnakangas 990fe3c4ed Fix more issues with cascading replication and timeline switches.
When a standby server follows the master using WAL archive, and it chooses
a new timeline (recovery_target_timeline='latest'), it only fetches the
timeline history file for the chosen target timeline, not any other history
files that might be missing from pg_xlog. For example, if the current
timeline is 2, and we choose 4 as the new recovery target timeline, the
history file for timeline 3 is not fetched, even if it's part of this
server's history. That's enough for the standby itself - the history file
for timeline 4 includes timeline 3 as well - but if a cascading standby
server wants to recover to timeline 3, it needs the history file. To fix,
when a new recovery target timeline is chosen, try to copy any missing
history files from the archive to pg_xlog between the old and new target
timeline.

A second similar issue was with the WAL files. When a standby recovers from
archive, and it reaches a segment that contains a switch to a new timeline,
recovery fetches only the WAL file labelled with the new timeline's ID. The
file from the new timeline contains a copy of the WAL from the old timeline
up to the point where the switch happened, and recovery recovers it from the
new file. But in streaming replication, walsender only tries to read it
from the old timeline's file. To fix, change walsender to read it from the
new file, so that it behaves the same as recovery in that sense, and doesn't
try to open the possibly nonexistent file with the old timeline's ID.
2013-01-23 10:19:20 +02:00
Heikki Linnakangas 6f7cddc7ae Now that START_REPLICATION returns the next timeline's ID after reaching end
of timeline, take advantage of that in walreceiver.

Startup process is still in control of choosign the target timeline, by
scanning the timeline history files present in pg_xlog, but walreceiver now
uses the next timeline's ID to fetch its history file immediately after it
has finished streaming the old timeline. Before, the standby would first try
to restart streaming on the old timeline, which fetches the missing timeline
history file as a side-effect, and only then restart from the new timeline.
This patch eliminates the extra iteration, which speeds up the timeline
switch and reduces the noise in the log caused by the extra restart on the
old timeline.
2013-01-18 11:59:34 +02:00
Heikki Linnakangas 3684a534ef I added a result set to START_STREAMING command, but neglected walreceiver.
The patch to allow pg_receivexlog to switch timeline added a result set
after copy has ended in START_STREAMING command, to return the next
timeline's ID to the client. But walreceived didn't get the memo, and threw
an error on the unexpected result set. Fix.
2013-01-17 23:45:45 +02:00
Heikki Linnakangas 0b6329130e Make pg_receivexlog and pg_basebackup -X stream work across timeline switches.
This mirrors the changes done earlier to the server in standby mode. When
receivelog reaches the end of a timeline, as reported by the server, it
fetches the timeline history file of the next timeline, and restarts
streaming from the new timeline by issuing a new START_STREAMING command.

When pg_receivexlog crosses a timeline, it leaves the .partial suffix on the
last segment on the old timeline. This helps you to tell apart a partial
segment left in the directory because of a timeline switch, and a completed
segment. If you just follow a single server, it won't make a difference, but
it can be significant in more complicated scenarios where new WAL is still
generated on the old timeline.

This includes two small changes to the streaming replication protocol:
First, when you reach the end of timeline while streaming, the server now
sends the TLI of the next timeline in the server's history to the client.
pg_receivexlog uses that as the next timeline, so that it doesn't need to
parse the timeline history file like a standby server does. Second, when
BASE_BACKUP command sends the begin and end WAL positions, it now also sends
the timeline IDs corresponding the positions.
2013-01-17 20:23:00 +02:00
Heikki Linnakangas 3f4b1749a8 Return value of lseek() can be negative on failure.
Because the return value of lseek() was assigned to an unsigned size_t
variable, we'd fail to notice an error return code -1. Compiler gave a
warning about this.

Andres Freund
2013-01-15 00:42:37 +02:00
Tom Lane b853eb9718 Improve handling of ereport(ERROR) and elog(ERROR).
In commit 71450d7fd6, we added code to inform
suitably-intelligent compilers that ereport() doesn't return if the elevel
is ERROR or higher.  This patch extends that to elog(), and also fixes a
double-evaluation hazard that the previous commit created in ereport(),
as well as reducing the emitted code size.

The elog() improvement requires the compiler to support __VA_ARGS__, which
should be available in just about anything nowadays since it's required by
C99.  But our minimum language baseline is still C89, so add a configure
test for that.

The previous commit assumed that ereport's elevel could be evaluated twice,
which isn't terribly safe --- there are already counterexamples in xlog.c.
On compilers that have __builtin_constant_p, we can use that to protect the
second test, since there's no possible optimization gain if the compiler
doesn't know the value of elevel.  Otherwise, use a local variable inside
the macros to prevent double evaluation.  The local-variable solution is
inferior because (a) it leads to useless code being emitted when elevel
isn't constant, and (b) it increases the optimization level needed for the
compiler to recognize that subsequent code is unreachable.  But it seems
better than not teaching non-gcc compilers about unreachability at all.

Lastly, if the compiler has __builtin_unreachable(), we can use that
instead of abort(), resulting in a noticeable code savings since no
function call is actually emitted.  However, it seems wise to do this only
in non-assert builds.  In an assert build, continue to use abort(), so that
the behavior will be predictable and debuggable if the "impossible"
happens.

These changes involve making the ereport and elog macros emit do-while
statement blocks not just expressions, which forces small changes in
a few call sites.

Andres Freund, Tom Lane, Heikki Linnakangas
2013-01-13 18:40:09 -05:00
Heikki Linnakangas b0daba57bb Tolerate timeline switches while "pg_basebackup -X fetch" is running.
If you take a base backup from a standby server with "pg_basebackup -X
fetch", and the timeline switches while the backup is being taken, the
backup used to fail with an error "requested WAL segment %s has already
been removed". This is because the server-side code that sends over the
required WAL files would not construct the WAL filename with the correct
timeline after a switch.

Fix that by using readdir() to scan pg_xlog for all the WAL segments in the
range, regardless of timeline.

Also, include all timeline history files in the backup, if taken with
"-X fetch". That fixes another related bug: If a timeline switch happened
just before the backup was initiated in a standby, the WAL segment
containing the initial checkpoint record contains WAL from the older
timeline too. Recovery will not accept that without a timeline history file
that lists the older timeline.

Backpatch to 9.2. Versions prior to that were not affected as you could not
take a base backup from a standby before 9.2.
2013-01-03 19:51:00 +02:00
Heikki Linnakangas ee994272ca Delay reading timeline history file until it's fetched from master.
Streaming replication can fetch any missing timeline history files from the
master, but recovery would read the timeline history file for the target
timeline before reading the checkpoint record, and before walreceiver has
had a chance to fetch it from the master. Delay reading it, and the sanity
checks involving timeline history, until after reading the checkpoint
record.

There is at least one scenario where this makes a difference: if you take
a base backup from a standby server right after a timeline switch, the
WAL segment containing the initial checkpoint record will begin with an
older timeline ID. Without the timeline history file, recovering that file
will fail as the older timeline ID is not recognized to be an ancestor of
the target timeline. If you try to recover from such a backup, using only
streaming replication to fetch the WAL, this patch is required for that to
work.
2013-01-03 10:41:58 +02:00
Magnus Hagander 794397ae1d Move tar function headers to pgtar.h
This makes it possible to include them only where they are used, so
we can avoid the conflict of the uid_t and gid_t datatypes that happened
in plperl (since plperl doesn't need the tar functions)
2013-01-02 20:34:08 +01:00
Bruce Momjian bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Magnus Hagander f5d4bdd3a5 Unify some tar functionality across different parts
Move some of the tar functionality that existed mostly duplicated
in both pg_dump and the walsender basebackup functionality into
port/tar.c instead, so it can be used from both. It will also be
used by pg_basebackup in the future, which would've caused a third
copy of it around.

Zoltan Boszormenyi and Magnus Hagander
2013-01-01 18:15:57 +01:00
Alvaro Herrera 5ab3af46dd Remove obsolete XLogRecPtr macros
This gets rid of XLByteLT, XLByteLE, XLByteEQ and XLByteAdvance.
These were useful for brevity when XLogRecPtrs were split in
xlogid/xrecoff; but now that they are simple uint64's, they are just
clutter.  The only downside to making this change would be ease of
backporting patches, but that has been negated by other substantive
changes to the involved code anyway.  The clarity of simpler expressions
makes the change worthwhile.

Most of the changes are mechanical, but in a couple of places, the patch
author chose to invert the operator sense, making the code flow more
logical (and more in line with preceding comments).

Author: Andres Freund
Eyeballed by Dimitri Fontaine and Alvaro Herrera
2012-12-28 13:06:15 -03:00
Alvaro Herrera 24eca7977e Assign InvalidXLogRecPtr instead of MemSet(0)
For consistency.

Author: Andres Freund
2012-12-27 18:33:03 -03:00
Heikki Linnakangas 1ff92eea14 Fix sloppiness in the timeline switch over streaming replication patch.
Here's another attempt at fixing the logic that decides how far the WAL can
be streamed, which was still broken if the timeline changed while streaming.
You would get an assertion failure. The way the logic is now written is more
readable, too.

Thom Brown reported the assertion failure.
2012-12-21 20:08:12 +02:00
Heikki Linnakangas 36e4456d78 Fix race condition if a file is removed while pg_basebackup is running.
If a relation file was removed when the server-side counterpart of
pg_basebackup was just about to open it to send it to the client, you'd
get a "could not open file" error. Fix that.

Backpatch to 9.1, this goes back to when pg_basebackup was introduced.
2012-12-21 15:34:15 +02:00
Heikki Linnakangas af275a12df Follow TLI of last replayed record, not recovery target TLI, in walsenders.
Most of the time, the last replayed record comes from the recovery target
timeline, but there is a corner case where it makes a difference. When
the startup process scans for a new timeline, and decides to change recovery
target timeline, there is a window where the recovery target TLI has already
been bumped, but there are no WAL segments from the new timeline in pg_xlog
yet. For example, if we have just replayed up to point 0/30002D8, on
timeline 1, there is a WAL file called 000000010000000000000003 in pg_xlog
that contains the WAL up to that point. When recovery switches recovery
target timeline to 2, a walsender can immediately try to read WAL from
0/30002D8, from timeline 2, so it will try to open WAL file
000000020000000000000003. However, that doesn't exist yet - the startup
process hasn't copied that file from the archive yet nor has the walreceiver
streamed it yet, so walsender fails with error "requested WAL segment
000000020000000000000003 has already been removed". That's harmless, in that
the standby will try to reconnect later and by that time the segment is
already created, but error messages that should be ignored are not good.

To fix that, have walsender track the TLI of the last replayed record,
instead of the recovery target timeline. That way walsender will not try to
read anything from timeline 2, until the WAL segment has been created and at
least one record has been replayed from it. The recovery target timeline is
now xlog.c's internal affair, it doesn't need to be exposed in shared memory
anymore.

This fixes the error reported by Thom Brown. depesz the same error message,
but I'm not sure if this fixes his scenario.
2012-12-20 14:39:04 +02:00
Heikki Linnakangas abfd192b1b Allow a streaming replication standby to follow a timeline switch.
Before this patch, streaming replication would refuse to start replicating
if the timeline in the primary doesn't exactly match the standby. The
situation where it doesn't match is when you have a master, and two
standbys, and you promote one of the standbys to become new master.
Promoting bumps up the timeline ID, and after that bump, the other standby
would refuse to continue.

There's significantly more timeline related logic in streaming replication
now. First of all, when a standby connects to primary, it will ask the
primary for any timeline history files that are missing from the standby.
The missing files are sent using a new replication command TIMELINE_HISTORY,
and stored in standby's pg_xlog directory. Using the timeline history files,
the standby can follow the latest timeline present in the primary
(recovery_target_timeline='latest'), just as it can follow new timelines
appearing in an archive directory.

START_REPLICATION now takes a TIMELINE parameter, to specify exactly which
timeline to stream WAL from. This allows the standby to request the primary
to send over WAL that precedes the promotion. The replication protocol is
changed slightly (in a backwards-compatible way although there's little hope
of streaming replication working across major versions anyway), to allow
replication to stop when the end of timeline reached, putting the walsender
back into accepting a replication command.

Many thanks to Amit Kapila for testing and reviewing various versions of
this patch.
2012-12-13 19:17:32 +02:00
Heikki Linnakangas add6c3179a Make the streaming replication protocol messages architecture-independent.
We used to send structs wrapped in CopyData messages, which works as long as
the client and server agree on things like endianess, timestamp format and
alignment. That's good enough for running a standby server, which has to run
on the same platform anyway, but it's useful for tools like pg_receivexlog
to work across platforms.

This breaks protocol compatibility of streaming replication, but we never
promised that to be compatible across versions, anyway.
2012-11-07 19:09:13 +02:00
Heikki Linnakangas 7d3ed5ae78 Fix typo in comment.
Fujii Masao
2012-10-15 13:01:31 +03:00
Heikki Linnakangas 6f60fdd701 Improve replication connection timeouts.
Rename replication_timeout to wal_sender_timeout, and add a new setting
called wal_receiver_timeout that does the same at the walreceiver side.
There was previously no timeout in walreceiver, so if the network went down,
for example, the walreceiver could take a long time to notice that the
connection was lost. Now with the two settings, both sides of a replication
connection will detect a broken connection similarly.

It is no longer necessary to manually set wal_receiver_status_interval to
a value smaller than the timeout. Both wal sender and receiver now
automatically send a "ping" message if more than 1/2 of the configured
timeout has elapsed, and it hasn't received any messages from the other end.

Amit Kapila, heavily edited by me.
2012-10-11 17:48:08 +03:00
Peter Eisentraut 8521d13194 Refactor flex and bison make rules
Numerous flex and bison make rules have appeared in the source tree
over time, and they are all virtually identical, so we can replace
them by pattern rules with some variables for customization.

Users of pgxs will also be able to benefit from this.
2012-10-11 06:57:04 -04:00
Heikki Linnakangas 0b77aebabf Remove stray newline in comment. 2012-10-09 13:06:48 +03:00
Peter Eisentraut b6d4522296 Remove generation of repl_gram.h
It was apparently never necessary.
2012-10-08 20:36:46 -04:00
Heikki Linnakangas 9c0e2b9182 Fix walsender handling of postmaster shutdown, to not go into endless loop.
This bug was introduced by my patch to use the regular die/quickdie signal
handlers in walsender processes. I tried to make walsender exit at next
CHECK_FOR_INTERRUPTS() by setting ProcDiePending, but that's not enough, you
need to set InterruptPending too. On second thoght, it was not a very good
way to make walsender exit anyway, so use proc_exit(0) instead.

Also, send a CommandComplete message before exiting; that's what we did
before, and you get a nicer error message in the standby that way.

Reported by Thom Brown.
2012-10-08 13:32:14 +03:00
Heikki Linnakangas fd5942c18f Use the regular main processing loop also in walsenders.
The regular backend's main loop handles signal handling and error recovery
better than the current WAL sender command loop does. For example, if the
client hangs and a SIGTERM is received before starting streaming, the
walsender will now terminate immediately, rather than hang until the
connection times out.
2012-10-05 17:21:12 +03:00