Commit Graph

218 Commits

Author SHA1 Message Date
Heikki Linnakangas ec786c6c81 I neglected many comments in the log+seg -> 64-bit segno patch. Fix.
Reported by Amit Kapila.
2012-06-27 17:53:53 +03:00
Peter Eisentraut eeece9e609 Unify calling conventions for postgres/postmaster sub-main functions
There was a wild mix of calling conventions: Some were declared to
return void and didn't return, some returned an int exit code, some
claimed to return an exit code, which the callers checked, but
actually never returned, and so on.

Now all of these functions are declared to return void and decorated
with attribute noreturn and don't return.  That's easiest, and most
code already worked that way.
2012-06-25 21:30:12 +03:00
Robert Haas c7d47abd04 Fix typo in DEBUG message, introduced by recent WAL refactoring.
Fujii Masao
2012-06-25 14:00:35 -04:00
Heikki Linnakangas 0ab9d1c4b3 Replace XLogRecPtr struct with a 64-bit integer.
This simplifies code that needs to do arithmetic on XLogRecPtrs.

To avoid changing on-disk format of data pages, the LSN on data pages is
still stored in the old format. That should keep pg_upgrade happy. However,
we have XLogRecPtrs embedded in the control file, and in the structs that
are sent over the replication protocol, so this changes breaks compatibility
of pg_basebackup and server. I didn't do anything about this in this patch,
per discussion on -hackers, the right thing to do would to be to change the
replication protocol to be architecture-independent, so that you could use
a newer version of pg_receivexlog, for example, against an older server
version.
2012-06-24 19:19:45 +03:00
Heikki Linnakangas dfda6ebaec Don't waste the last segment of each 4GB logical log file.
The comments claimed that wasting the last segment made it easier to do
calculations with XLogRecPtrs, because you don't have problems representing
last-byte-position-plus-1 that way. In my experience, however, it only made
things more complicated, because the there was two ways to represent the
boundary at the beginning of a logical log file: logid = n+1 and xrecoff = 0,
or as xlogid = n and xrecoff = 4GB - XLOG_SEG_SIZE. Some functions were
picky about which representation was used.

Also, use a 64-bit segment number instead of the log/seg combination, to
point to a certain WAL segment. We assume that all platforms have a working
64-bit integer type nowadays.

This is an incompatible change in WAL format, so bumping WAL version number.
2012-06-24 18:35:29 +03:00
Magnus Hagander 3595a71e9c Prevent non-streaming replication connections from being selected sync slave
This prevents a pg_basebackup backup session that just does a base
backup (no xlog involved at all) from becoming the synchronous slave
and thus blocking all access while it runs.

Also fixes the problem when a higher priority slave shows up it would
become the sync standby before it has reached the STREAMING state, by
making sure we can only switch to a walsender that's actually STREAMING.

Fujii Masao
2012-06-11 15:17:38 +02:00
Bruce Momjian 927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Tom Lane ed61127be4 Cast some printf arguments to avoid possibly-nonportable behavior.
Per compiler warnings on buildfarm member black_firefly.
2012-03-23 20:18:04 -04:00
Simon Riggs ba1868ba31 Minor bug fix and cleanup from self-review of sync rep queues patch. 2012-01-30 14:36:17 +00:00
Simon Riggs 73f617f13f Various minor comments changes from bgwriter to checkpointer. 2012-01-30 14:34:25 +00:00
Simon Riggs 8366c7803e Allow pg_basebackup from standby node with safety checking.
Base backup follows recommended procedure, plus goes to great
lengths to ensure that partial page writes are avoided.

Jun Ishizuka and Fujii Masao, with minor modifications
2012-01-25 18:02:04 +00:00
Simon Riggs 443b4821f1 Add new replication mode synchronous_commit = 'write'.
Replication occurs only to memory on standby, not to disk,
so provides additional performance if user wishes to
reduce durability level slightly. Adds concept of multiple
independent sync rep queues.

Fujii Masao and Simon Riggs
2012-01-24 20:22:37 +00:00
Robert Haas 4d0b11a0ca Typo fix. 2012-01-13 08:21:45 -05:00
Simon Riggs 3f1787c253 Minor but necessary improvements to WAL keepalives
Fujii Masao
2012-01-13 12:59:08 +00:00
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Simon Riggs 64233902d2 Send new protocol keepalive messages to standby servers.
Allows streaming replication users to calculate transfer latency
and apply delay via internal functions. No external functions yet.
2011-12-31 13:30:26 +00:00
Tom Lane 0d0ec527af Fix corner cases in readlink() usage.
Make sure all calls are protected by HAVE_READLINK, and get the buffer
overflow tests right.  Be a bit more paranoid about string length in
_tarWriteHeader(), too.
2011-12-07 13:34:13 -05:00
Magnus Hagander 1f422db663 Avoid using readlink() on platforms that don't support it
We don't have any such platforms now, but might in the future.

Also, detect cases when a tablespace symlink points to a path that
is longer than we can handle, and give a warning.
2011-12-07 12:09:05 +01:00
Robert Haas ed0b409d22 Move "hot" members of PGPROC into a separate PGXACT array.
This speeds up snapshot-taking and reduces ProcArrayLock contention.
Also, the PGPROC (and PGXACT) structures used by two-phase commit are
now allocated as part of the main array, rather than in a separate
array, and we keep ProcArray sorted in pointer order.  These changes
are intended to minimize the number of cache lines that must be pulled
in to take a snapshot, and testing shows a substantial increase in
performance on both read and write workloads at high concurrencies.

Pavan Deolasee, Heikki Linnakangas, Robert Haas
2011-11-25 08:02:10 -05:00
Simon Riggs 9aceb6ab3c Refactor xlog.c to create src/backend/postmaster/startup.c
Startup process now has its own dedicated file, just like all other
special/background processes. Reduces role and size of xlog.c
2011-11-02 14:25:01 +00:00
Peter Eisentraut 654e1f96b0 Clean up whitespace and indentation in parser and scanner files
These are not touched by pgindent, so clean them up a bit manually.
2011-11-01 21:51:30 +02:00
Simon Riggs f3ebaad45b Comment changes to show bgwriter no longer performs checkpoints. 2011-11-01 18:48:47 +00:00
Heikki Linnakangas b436c72f61 Fix overly-complicated usage of errcode_for_file_access().
No need to do  "errcode(errcode_for_file_access())", just
"errcode_for_file_access()" is enough. The extra errcode() call is useless
but harmless, so there's no user-visible bug here. Nevertheless, backpatch
to 9.1 where this code were added.
2011-10-22 20:19:50 +03:00
Tom Lane b4a0223d00 Simplify and improve ProcessStandbyHSFeedbackMessage logic.
There's no need to clamp the standby's xmin to be greater than
GetOldestXmin's result; if there were any such need this logic would be
hopelessly inadequate anyway, because it fails to account for
within-database versus cluster-wide values of GetOldestXmin.  So get rid of
that, and just rely on sanity-checking that the xmin is not wrapped around
relative to the nextXid counter.  Also, don't reset the walsender's xmin if
the current feedback xmin is indeed out of range; that just creates more
problems than we already had.  Lastly, don't bother to take the
ProcArrayLock; there's no need to do that to set xmin.

Also improve the comments about this in GetOldestXmin itself.
2011-10-20 19:43:31 -04:00
Magnus Hagander d1e25b78f9 Exclude postmaster.opts from base backups
Noted by Fujii Masao
2011-10-18 15:58:37 +02:00
Magnus Hagander 7aeff9f4a4 Ensure walsenders can be SIGTERMed while in non-walsender code
In oder to exit on SIGTERM when in non-walsender code,
such as do_pg_stop_backup(), we need to set the interrupt
variables that are used there, and not just the walsender
local ones.
2011-10-06 21:43:14 +02:00
Alvaro Herrera 86822df9b5 Split walsender.h in public/private headers
This dramatically cuts short the number of headers the public one brings
into whatever includes it.
2011-09-13 21:42:49 -03:00
Tom Lane a7801b62f2 Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h.
As per my recent proposal, this refactors things so that these typedefs and
macros are available in a header that can be included in frontend-ish code.
I also changed various headers that were undesirably including
utils/timestamp.h to include datatype/timestamp.h instead.  Unsurprisingly,
this showed that half the system was getting utils/timestamp.h by way of
xlog.h.

No actual code changes here, just header refactoring.
2011-09-09 13:23:41 -04:00
Alvaro Herrera 295e7dc929 Tweak string for uniformity 2011-09-08 16:39:58 -03:00
Simon Riggs dde70cc313 Emit cascaded standby message on shutdown only when appropriate.
Adds additional test for active walsenders and closes a race
condition for when we failover when a new walsender was connecting.

Reported and fixed bu Fujii Masao. Review by Heikki Linnakangas
2011-09-07 09:09:47 +01:00
Tom Lane 1609797c25 Clean up the #include mess a little.
walsender.h should depend on xlog.h, not vice versa.  (Actually, the
inclusion was circular until a couple hours ago, which was even sillier;
but Bruce broke it in the expedient rather than logically correct
direction.)  Because of that poor decision, plus blind application of
pgrminclude, we had a situation where half the system was depending on
xlog.h to include such unrelated stuff as array.h and guc.h.  Clean up
the header inclusion, and manually revert a lot of what pgrminclude had
done so things build again.

This episode reinforces my feeling that pgrminclude should not be run
without adult supervision.  Inclusion changes in header files in particular
need to be reviewed with great care.  More generally, it'd be good if we
had a clearer notion of module layering to dictate which headers can sanely
include which others ... but that's a big task for another day.
2011-09-04 01:13:16 -04:00
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Bruce Momjian 4bd7333b14 Allow more include files to be compiled in their own by adding missing
include dependencies.

Modify pgcompinclude to skip a common fcinfo error.
2011-08-27 11:05:33 -04:00
Tom Lane cff75130b5 Remove wal_sender_delay GUC, because it's no longer useful.
The latch infrastructure is now capable of detecting all cases where the
walsender loop needs to wake up, so there is no reason to have an arbitrary
timeout.

Also, modify the walsender loop logic to follow the standard pattern of
ResetLatch, test for work to do, WaitLatch.  The previous coding was both
hard to follow and buggy: it would sometimes busy-loop despite having
nothing available to do, eg between receipt of a signal and the next time
it was caught up with new WAL, and it also had interesting choices like
deciding to update to WALSNDSTATE_STREAMING on the strength of information
known to be obsolete.
2011-08-10 18:50:28 -04:00
Tom Lane 4dab3d5ae1 Change the autovacuum launcher to use WaitLatch instead of a poll loop.
In pursuit of this (and with the expectation that WaitLatch will be needed
in more places), convert the latch field that was already added to PGPROC
for sync rep into a generic latch that is activated for all PGPROC-owning
processes, and change many of the standard backend signal handlers to set
that latch when a signal happens.  This will allow WaitLatch callers to be
wakened properly by these signals.

In passing, fix a whole bunch of signal handlers that had been hacked to do
things that might change errno, without adding the necessary save/restore
logic for errno.  Also make some minor fixes in unix_latch.c, and clean
up bizarre and unsafe scheme for disowning the process's latch.  Much of
this has to be back-patched into 9.1.

Peter Geoghegan, with additional work by Tom
2011-08-10 12:22:21 -04:00
Tom Lane 9f17ffd866 Measure WaitLatch's timeout parameter in milliseconds, not microseconds.
The original definition had the problem that timeouts exceeding about 2100
seconds couldn't be specified on 32-bit machines.  Milliseconds seem like
sufficient resolution, and finer grain than that would be fantasy anyway
on many platforms.

Back-patch to 9.1 so that this aspect of the latch API won't change between
9.1 and later releases.

Peter Geoghegan
2011-08-09 18:52:29 -04:00
Tom Lane 4e15a4db5e Documentation improvement and minor code cleanups for the latch facility.
Improve the documentation around weak-memory-ordering risks, and do a pass
of general editorialization on the comments in the latch code.  Make the
Windows latch code more like the Unix latch code where feasible; in
particular provide the same Assert checks in both implementations.
Fix poorly-placed WaitLatch call in syncrep.c.

This patch resolves, for the moment, concerns around weak-memory-ordering
bugs in latch-related code: we have documented the restrictions and checked
that existing calls meet them.  In 9.2 I hope that we will install suitable
memory barrier instructions in SetLatch/ResetLatch, so that their callers
don't need to be quite so careful.
2011-08-09 15:30:45 -04:00
Tom Lane 05e8396892 Clean up ill-advised attempt to invent a private set of Node tags.
Somebody thought it'd be cute to invent a set of Node tag numbers that were
defined independently of, and indeed conflicting with, the main tag-number
list.  While this accidentally failed to fail so far, it would certainly
lead to trouble as soon as anyone wanted to, say, apply copyObject to these
node types.  Clang was already complaining about the use of makeNode on
these tags, and I think quite rightly so.  Fix by pushing these node
definitions into the mainstream, including putting replnodes.h where it
belongs.
2011-08-06 14:53:49 -04:00
Simon Riggs 5286105800 Cascading replication feature for streaming log-based replication.
Standby servers can now have WALSender processes, which can work with
either WALReceiver or archive_commands to pass data. Fully updated
docs, including new conceptual terms of sending server, upstream and
downstream servers. WALSenders terminated when promote to master.

Fujii Masao, review, rework and doc rewrite by Simon Riggs
2011-07-19 03:40:03 +01:00
Heikki Linnakangas 89fd72cbf2 Introduce a pipe between postmaster and each backend, which can be used to
detect postmaster death. Postmaster keeps the write-end of the pipe open,
so when it dies, children get EOF in the read-end. That can conveniently
be waited for in select(), which allows eliminating some of the polling
loops that check for postmaster death. This patch doesn't yet change all
the loops to use the new mechanism, expect a follow-on patch to do that.

This changes the interface to WaitLatch, so that it takes as argument a
bitmask of events that it waits for. Possible events are latch set, timeout,
postmaster death, and socket becoming readable or writeable.

The pipe method behaves slightly differently from the kill() method
previously used in PostmasterIsAlive() in the case that postmaster has died,
but its parent has not yet read its exit code with waitpid(). The pipe
returns EOF as soon as the process dies, but kill() continues to return
true until waitpid() has been called (IOW while the process is a zombie).
Because of that, change PostmasterIsAlive() to use the pipe too, otherwise
WaitLatch() would return immediately with WL_POSTMASTER_DEATH, while
PostmasterIsAlive() would claim it's still alive. That could easily lead to
busy-waiting while postmaster is in zombie state.

Peter Geoghegan with further changes by me, reviewed by Fujii Masao and
Florian Pflug.
2011-07-08 18:44:07 +03:00
Peter Eisentraut f05c65090a Message style improvements 2011-07-08 07:37:04 +03:00
Tom Lane 9cc2c182fc Add missing -I switch for VPATH builds.
Per bug #6073 from Hartmut Raschick.
2011-06-22 13:20:03 -04:00
Peter Eisentraut e2a0cb1a80 Message style and spelling improvements 2011-06-22 00:45:34 +03:00
Bruce Momjian 6560407c7d Pgindent run before 9.1 beta2. 2011-06-09 14:32:50 -04:00
Bruce Momjian 5a71b64130 Lowercase status labels in pg_stat_replication view. 2011-04-29 22:20:43 -04:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane 2594cf0e8c Revise the API for GUC variable assign hooks.
The previous functions of assign hooks are now split between check hooks
and assign hooks, where the former can fail but the latter shouldn't.
Aside from being conceptually clearer, this approach exposes the
"canonicalized" form of the variable value to guc.c without having to do
an actual assignment.  And that lets us fix the problem recently noted by
Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus
log messages about "parameter "wal_buffers" cannot be changed without
restarting the server".  There may be some speed advantage too, because
this design lets hook functions avoid re-parsing variable values when
restoring a previous state after a rollback (they can store a pre-parsed
representation of the value instead).  This patch also resolves a
longstanding annoyance about custom error messages from variable assign
hooks: they should modify, not appear separately from, guc.c's own message
about "invalid parameter value".
2011-04-07 00:12:02 -04:00
Robert Haas 240067b3b0 Merge synchronous_replication setting into synchronous_commit.
This means one less thing to configure when setting up synchronous
replication, and also avoids some ambiguity around what the behavior
should be when the settings of these variables conflict.

Fujii Masao, with additional hacking by me.
2011-04-04 16:25:52 -04:00
Robert Haas 38b27792ea Avoid possible hang during smart shutdown.
If a smart shutdown occurs just as a child is starting up, and the
child subsequently becomes a walsender, there is a race condition:
the postmaster might count the exstant backends, determine that there
is one normal backend, and wait for it to die off.  Had the walsender
transition already occurred before the postmaster counted, it would
have proceeded with the shutdown.

To fix this, have each child that transforms into a walsender kick
the postmaster just after doing so, so that the state machine is
certain to advance.

Fujii Masao
2011-04-03 19:42:00 -04:00
Robert Haas 7fcc75dd26 Fix compiler warning. 2011-04-01 10:36:38 -04:00
Heikki Linnakangas 754baa21f7 Automatically terminate replication connections that are idle for more
than replication_timeout (a new GUC) milliseconds. The TCP timeout is often
too long, you want the master to notice a dead connection much sooner.
People complained about that in 9.0 too, but with synchronous replication
it's even more important to notice dead connections promptly.

Fujii Masao and Heikki Linnakangas
2011-03-30 10:20:37 +03:00
Heikki Linnakangas bc03c5937d Adjust error message, now that we expect other message types than connection
close at this point. Fix PQsetnonblocking() comment.

Fujii Masao
2011-03-30 08:54:28 +03:00
Simon Riggs 92f4786fa9 Additional test for each commit in sync rep path to plug minute
possibility of race condition that would effect performance only.
Requested by Robert Haas. Re-arrange related comments.
2011-03-26 10:09:37 +00:00
Robert Haas 30f6136f28 Make walreceiver send a reply after receiving data but before flushing it.
It originally worked this way, but was changed by commit
a8a8a3e096, since which time it's been impossible
for walreceiver to ever send a reply with write_location and flush_location
set to different values.
2011-03-25 11:28:16 -04:00
Robert Haas 727589995a Move synchronous_standbys_defined updates from WAL writer to BG writer.
This is advantageous because the BG writer is alive until much later in
the shutdown sequence than WAL writer; we want to make sure that it's
possible to shut off synchronous replication during a smart shutdown,
else it might not be possible to complete the shutdown at all.

Per very reasonable gripes from Fujii Masao and Simon Riggs.
2011-03-18 21:43:45 -04:00
Robert Haas 7a37900443 Make synchronous replication query cancel/die messages more consistent.
Per a gripe from Thom Brown about my previous commit in this area,
commit 9a56dc3389.
2011-03-18 10:20:22 -04:00
Robert Haas 02b1f84e7d Remove bogus comment. 2011-03-17 14:43:57 -04:00
Robert Haas 9a56dc3389 Fix various possible problems with synchronous replication.
1. Don't ignore query cancel interrupts.  Instead, if the user asks to
cancel the query after we've already committed it, but before it's on
the standby, just emit a warning and let the COMMIT finish.

2. Don't ignore die interrupts (pg_terminate_backend or fast shutdown).
Instead, emit a warning message and close the connection without
acknowledging the commit.  Other backends will still see the effect of
the commit, but there's no getting around that; it's too late to abort
at this point, and ignoring die interrupts altogether doesn't seem like
a good idea.

3. If synchronous_standby_names becomes empty, wake up all backends
waiting for synchronous replication to complete.  Without this, someone
attempting to shut synchronous replication off could easily wedge the
entire system instead.

4. Avoid depending on the assumption that if a walsender updates
MyProc->syncRepState, we'll see the change even if we read it without
holding the lock.  The window for this appears to be quite narrow (and
probably doesn't exist at all on machines with strong memory ordering)
but protecting against it is practically free, so do that.

5. Remove useless state SYNC_REP_MUST_DISCONNECT, which isn't needed and
doesn't actually do anything.

There's still some further work needed here to make the behavior of fast
shutdown plausible, but that looks complex, so I'm leaving it for a
separate commit.  Review by Fujii Masao.
2011-03-17 13:12:21 -04:00
Robert Haas 551c07d84a Make error handling of synchronous_standby_names consistent.
It's not a good idea to kill the postmaster just because someone muffs
this, and it's not consistent with what we do for other, similar GUCs.

Fujii Masao, with a bit more hacking by me
2011-03-10 16:24:52 -05:00
Robert Haas 2e019c8611 More synchronous replication typo fixes.
Fujii Masao
2011-03-10 15:56:18 -05:00
Robert Haas b8bb8dbf20 More synchronous replication tweaks.
SyncRepRequested() must check not only the value of the
synchronous_replication GUC but also whether max_wal_senders > 0.
Otherwise, we might end up waiting for sync rep even when there's no
possibility of a standby ever managing to connect.  There are some
existing cross-checks to prevent this, but they're not quite sufficient:
the user can start the server with max_wal_senders=0,
synchronous_standby_names='', and synchronous_replication=off and then
subsequent make synchronous_standby_names not empty using pg_ctl reload,
and then SET synchronous_standby=on, leading to an indefinite hang.

Along the way, rename the global variable for the synchronous_replication
GUC to match the name of the GUC itself, for clarity.

Report by Fujii Masao, though I didn't use his patch.
2011-03-10 15:43:37 -05:00
Robert Haas 6436098795 Minor sync rep corrections.
Fujii Masao, with a bit of additional wordsmithing by me.
2011-03-10 14:57:02 -05:00
Robert Haas fcb99609b6 Replication README updates.
Fujii Masao
2011-03-10 08:59:59 -05:00
Itagaki Takahiro 2d8de0a50b Cleanup copyright years and file names in the header comments of some files. 2011-03-10 15:05:33 +09:00
Bruce Momjian 76fdee31c4 Mention gcc version in C comment. 2011-03-09 23:41:13 -05:00
Heikki Linnakangas baabf05196 Silence compiler warning about undefined function when compiling without
assertions.
2011-03-07 09:56:53 +02:00
Simon Riggs cae4974e3d Dynamic array required within pg_stat_replication. 2011-03-07 00:26:30 +00:00
Simon Riggs 966fb05b58 Add new files for syncrep missed in previous commit 2011-03-06 23:39:14 +00:00
Simon Riggs a8a8a3e096 Efficient transaction-controlled synchronous replication.
If a standby is broadcasting reply messages and we have named
one or more standbys in synchronous_standby_names then allow
users who set synchronous_replication to wait for commit, which
then provides strict data integrity guarantees. Design avoids
sending and receiving transaction state information so minimises
bookkeeping overheads. We synchronize with the highest priority
standby that is connected and ready to synchronize. Other standbys
can be defined to takeover in case of standby failure.

This version has very strict behaviour; more relaxed options
may be added at a later date.

Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime
Casanova, Heikki Linnakangas and Robert Haas, plus the assistance
of many other design reviewers.
2011-03-06 22:49:16 +00:00
Heikki Linnakangas 6eba5a7c57 Change pg_last_xlog_receive_location() not to move backwards. That makes
it a lot more useful for determining which standby is most up-to-date,
for example. There was long discussions on whether overwriting existing
existing WAL makes sense to begin with, and whether we should do some more
extensive variable renaming, but this change nevertheless seems quite
uncontroversial.

Fujii Masao, reviewed by Jeff Janes, Robert Haas, Stephen Frost.
2011-03-01 20:54:35 +02:00
Robert Haas 59d6a75942 Avoid excessive Hot Standby feedback messages.
Without this patch, when wal_receiver_status_interval=0, indicating that no
status messages should be sent, Hot Standby feedback messages are instead sent
extremely frequently.

Fujii Masao, with documentation changes by me.
2011-03-01 11:34:25 -05:00
Heikki Linnakangas be6668d6ef Increase the default for wal_sender_delay from 200ms to 1s. Now that WAL
sender is immediately woken up by transaction commit, there's no need to
wake up so aggressively.
2011-02-26 23:38:25 +02:00
Simon Riggs bc76695c4c Make a hard state change from catchup to streaming mode.
More useful state change for monitoring purposes, plus a
required change for synchronous replication patch.
2011-02-18 15:07:26 +00:00
Simon Riggs 06828c5feb Separate messages for standby replies and hot standby feedback.
Allow messages to be sent at different times, and greatly reduce
the frequency of hot standby feedback. Refactor to allow additional
message types.
2011-02-18 11:31:49 +00:00
Tom Lane 93016983d1 Fix blatantly uninitialized variable in recent commit.
Doesn't anybody around here pay attention to compiler warnings?
2011-02-16 19:53:20 -05:00
Simon Riggs bca8b7f16a Hot Standby feedback for avoidance of cleanup conflicts on standby.
Standby optionally sends back information about oldestXmin of queries
which is then checked and applied to the WALSender's proc->xmin.
GetOldestXmin() is modified slightly to agree with GetSnapshotData(),
so that all backends on primary include WALSender within their snapshots.
Note this does nothing to change the snapshot xmin on either master or
standby. Feedback piggybacks on the standby reply message.
vacuum_defer_cleanup_age is no longer used on standby, though parameter
still exists on primary, since some use cases still exist.

Simon Riggs, review comments from Fujii Masao, Heikki Linnakangas, Robert Haas
2011-02-16 19:29:37 +00:00
Robert Haas 3a087369c0 WAL receiver shouldn't try to send a reply when dying.
Per report from, and discussion with, Fujii Masao.
2011-02-16 10:27:35 -05:00
Robert Haas 883a9659fa Assorted corrections to the patch to add WAL receiver replies.
Per reports from Fujii Masao.
2011-02-15 12:05:00 -05:00
Robert Haas d309acf201 Typo fixes. receivedUpto should be capitalized consistently. 2011-02-11 11:55:12 -05:00
Heikki Linnakangas b186523fd9 Send status updates back from standby server to master, indicating how far
the standby has written, flushed, and applied the WAL. At the moment, this
is for informational purposes only, the values are only shown in
pg_stat_replication system view, but in the future they will also be needed
for synchronous replication.

Extracted from Simon riggs' synchronous replication patch by Robert Haas, with
some tweaking by me.
2011-02-10 21:04:02 +02:00
Magnus Hagander 3144c33a2f Implement NOWAIT option for BASE_BACKUP command
Specifying this option makes the server not wait for the
xlog to be archived, or emit a warning that it can't,
instead leaving the responsibility with the client.

This is useful when the log is being streamed using
the streaming protocol in parallel with the backup,
without having log archiving enabled.
2011-02-09 10:59:53 +01:00
Magnus Hagander cedd6515ba IDENTIFY_SYSTEM now returns 3 fields, not 2 2011-02-06 07:46:14 +01:00
Bruce Momjian 51dbc87dff Add C comment about why older compilers complain about basebackup.c's
longjump.
2011-02-04 23:28:14 -05:00
Magnus Hagander 76129e7f14 Include more status information in walsender results
Add the current xlog insert location to the response of
IDENTIFY_SYSTEM, and adds result sets containing start
and stop location of backups to BASE_BACKUP responses.
2011-02-03 13:46:23 +01:00
Heikki Linnakangas 997b48ed96 Support multiple concurrent pg_basebackup backups.
With this patch, pg_basebackup doesn't write a backup_label file in the
data directory, so it doesn't interfere with a pg_start/stop_backup() based
backup anymore. backup_label is still included in the backup, but it is
injected directly into the tar stream.

Heikki Linnakangas, reviewed by Fujii Masao and Magnus Hagander.
2011-01-31 18:25:39 +02:00
Magnus Hagander 507069de6d Add option to include WAL in base backup
When included, this makes the base backup a complete working
"clone" of the initial database, ready to have a postmaster
started against it without the need to set up any log archiving
or similar.

Magnus Hagander, reviewed by Fujii Masao and Heikki Linnakangas
2011-01-30 21:30:09 +01:00
Magnus Hagander 966d4f52c2 Typo fix for MemSet size.
Fujii Masao
2011-01-25 10:50:04 +01:00
Magnus Hagander e5487f65fd Make walsender options order-independent
While doing this, also move base backup options into
a struct instead of increasing the number of parameters
to multiple functions for each new option.
2011-01-23 23:39:18 +01:00
Magnus Hagander f88a638199 Only show pg_stat_replication details to superusers 2011-01-23 17:28:19 +01:00
Magnus Hagander 048d148fe6 Add pg_basebackup tool for streaming base backups
This tool makes it possible to do the pg_start_backup/
copy files/pg_stop_backup step in a single command.

There are still some steps to be done before this is a
complete backup solution, such as the ability to stream
the required WAL logs, but it's still usable, and
could do with some buildfarm coverage.

In passing, make the checkpoint request optionally
fast instead of hardcoding it.

Magnus Hagander, reviewed by Fujii Masao and Dimitri Fontaine
2011-01-23 12:21:23 +01:00
Magnus Hagander 48075095ac Set fallback_application_name in walreceiver
Makes replication slaves identify themselves in the new
pg_stat_replication view.
2011-01-17 11:42:53 +01:00
Heikki Linnakangas 34ef02b4d4 Before exiting walreceiver, fsync() all the WAL received.
Otherwise WAL recovery will replay the un-flushed WAL after walreceiver has
exited, which can lead to a non-recoverable standby if the system crashes hard
at that point.
2011-01-17 12:27:35 +02:00
Tom Lane 36750dcef5 Add .gitignore to silence git complaints about parser/scanner output files. 2011-01-15 16:05:28 -05:00
Magnus Hagander 3866ff6149 Enumerate available tablespaces after starting the backup
This closes a race condition where if a tablespace was created
after the enumeration happened but before the do_pg_start_backup()
was called, the backup would be incomplete. Now that it's done
while we are in backup mode, WAL replay will recreate it during
restore.

Noted by Heikki.
2011-01-15 19:31:16 +01:00
Heikki Linnakangas 8f5d65e916 Treat a WAL sender process that hasn't started streaming yet as a regular
backend, as far as the postmaster shutdown logic is concerned. That means,
fast shutdown will wait for WAL sender processes to exit before signaling
bgwriter to finish. This avoids race conditions between a base backup stopping
or starting, and bgwriter writing the shutdown checkpoint WAL record. We don't
want e.g the end-of-backup WAL record to be written after the shutdown
checkpoint.
2011-01-15 16:38:21 +02:00
Magnus Hagander fcd810c69a Use a lexer and grammar for parsing walsender commands
Makes it easier to parse mainly the BASE_BACKUP command
with it's options, and avoids having to manually deal
with quoted identifiers in the label (previously broken),
and makes it easier to add new commands and options in
the future.

In passing, refactor the case statement in the walsender
to put each command in it's own function.
2011-01-14 16:30:33 +01:00
Magnus Hagander 688423d004 Exit from base backups when shutdown is requested
When the exit waits until the whole backup completes, it may take
a very long time.

In passing, add back an error check in the main loop so we detect
clients that disconnect much earlier if the backup is large.
2011-01-14 12:36:45 +01:00
Magnus Hagander 9eacd427e8 Make sure walsender state is only read while holding the spinlock
Noted by Robert Haas.
2011-01-13 18:51:13 +01:00
Heikki Linnakangas a5a02a7445 Fix the logic in libpqrcv_receive() to determine if there's any incoming data
that can be read without blocking. It used to conclude that there isn't, even
though there was data in the socket receive buffer. That lead walreceiver to
flush the WAL after every received chunk, potentially causing big performance
issues.

Backpatch to 9.0, because the performance impact can be very significant.
2011-01-13 18:26:39 +02:00
Magnus Hagander 4c8e20f815 Track walsender state in shared memory and expose in pg_stat_replication 2011-01-11 21:25:28 +01:00