Commit Graph

154 Commits

Author SHA1 Message Date
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Simon Riggs 64233902d2 Send new protocol keepalive messages to standby servers.
Allows streaming replication users to calculate transfer latency
and apply delay via internal functions. No external functions yet.
2011-12-31 13:30:26 +00:00
Tom Lane 0d0ec527af Fix corner cases in readlink() usage.
Make sure all calls are protected by HAVE_READLINK, and get the buffer
overflow tests right.  Be a bit more paranoid about string length in
_tarWriteHeader(), too.
2011-12-07 13:34:13 -05:00
Magnus Hagander 1f422db663 Avoid using readlink() on platforms that don't support it
We don't have any such platforms now, but might in the future.

Also, detect cases when a tablespace symlink points to a path that
is longer than we can handle, and give a warning.
2011-12-07 12:09:05 +01:00
Robert Haas ed0b409d22 Move "hot" members of PGPROC into a separate PGXACT array.
This speeds up snapshot-taking and reduces ProcArrayLock contention.
Also, the PGPROC (and PGXACT) structures used by two-phase commit are
now allocated as part of the main array, rather than in a separate
array, and we keep ProcArray sorted in pointer order.  These changes
are intended to minimize the number of cache lines that must be pulled
in to take a snapshot, and testing shows a substantial increase in
performance on both read and write workloads at high concurrencies.

Pavan Deolasee, Heikki Linnakangas, Robert Haas
2011-11-25 08:02:10 -05:00
Simon Riggs 9aceb6ab3c Refactor xlog.c to create src/backend/postmaster/startup.c
Startup process now has its own dedicated file, just like all other
special/background processes. Reduces role and size of xlog.c
2011-11-02 14:25:01 +00:00
Peter Eisentraut 654e1f96b0 Clean up whitespace and indentation in parser and scanner files
These are not touched by pgindent, so clean them up a bit manually.
2011-11-01 21:51:30 +02:00
Simon Riggs f3ebaad45b Comment changes to show bgwriter no longer performs checkpoints. 2011-11-01 18:48:47 +00:00
Heikki Linnakangas b436c72f61 Fix overly-complicated usage of errcode_for_file_access().
No need to do  "errcode(errcode_for_file_access())", just
"errcode_for_file_access()" is enough. The extra errcode() call is useless
but harmless, so there's no user-visible bug here. Nevertheless, backpatch
to 9.1 where this code were added.
2011-10-22 20:19:50 +03:00
Tom Lane b4a0223d00 Simplify and improve ProcessStandbyHSFeedbackMessage logic.
There's no need to clamp the standby's xmin to be greater than
GetOldestXmin's result; if there were any such need this logic would be
hopelessly inadequate anyway, because it fails to account for
within-database versus cluster-wide values of GetOldestXmin.  So get rid of
that, and just rely on sanity-checking that the xmin is not wrapped around
relative to the nextXid counter.  Also, don't reset the walsender's xmin if
the current feedback xmin is indeed out of range; that just creates more
problems than we already had.  Lastly, don't bother to take the
ProcArrayLock; there's no need to do that to set xmin.

Also improve the comments about this in GetOldestXmin itself.
2011-10-20 19:43:31 -04:00
Magnus Hagander d1e25b78f9 Exclude postmaster.opts from base backups
Noted by Fujii Masao
2011-10-18 15:58:37 +02:00
Magnus Hagander 7aeff9f4a4 Ensure walsenders can be SIGTERMed while in non-walsender code
In oder to exit on SIGTERM when in non-walsender code,
such as do_pg_stop_backup(), we need to set the interrupt
variables that are used there, and not just the walsender
local ones.
2011-10-06 21:43:14 +02:00
Alvaro Herrera 86822df9b5 Split walsender.h in public/private headers
This dramatically cuts short the number of headers the public one brings
into whatever includes it.
2011-09-13 21:42:49 -03:00
Tom Lane a7801b62f2 Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h.
As per my recent proposal, this refactors things so that these typedefs and
macros are available in a header that can be included in frontend-ish code.
I also changed various headers that were undesirably including
utils/timestamp.h to include datatype/timestamp.h instead.  Unsurprisingly,
this showed that half the system was getting utils/timestamp.h by way of
xlog.h.

No actual code changes here, just header refactoring.
2011-09-09 13:23:41 -04:00
Alvaro Herrera 295e7dc929 Tweak string for uniformity 2011-09-08 16:39:58 -03:00
Simon Riggs dde70cc313 Emit cascaded standby message on shutdown only when appropriate.
Adds additional test for active walsenders and closes a race
condition for when we failover when a new walsender was connecting.

Reported and fixed bu Fujii Masao. Review by Heikki Linnakangas
2011-09-07 09:09:47 +01:00
Tom Lane 1609797c25 Clean up the #include mess a little.
walsender.h should depend on xlog.h, not vice versa.  (Actually, the
inclusion was circular until a couple hours ago, which was even sillier;
but Bruce broke it in the expedient rather than logically correct
direction.)  Because of that poor decision, plus blind application of
pgrminclude, we had a situation where half the system was depending on
xlog.h to include such unrelated stuff as array.h and guc.h.  Clean up
the header inclusion, and manually revert a lot of what pgrminclude had
done so things build again.

This episode reinforces my feeling that pgrminclude should not be run
without adult supervision.  Inclusion changes in header files in particular
need to be reviewed with great care.  More generally, it'd be good if we
had a clearer notion of module layering to dictate which headers can sanely
include which others ... but that's a big task for another day.
2011-09-04 01:13:16 -04:00
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Bruce Momjian 4bd7333b14 Allow more include files to be compiled in their own by adding missing
include dependencies.

Modify pgcompinclude to skip a common fcinfo error.
2011-08-27 11:05:33 -04:00
Tom Lane cff75130b5 Remove wal_sender_delay GUC, because it's no longer useful.
The latch infrastructure is now capable of detecting all cases where the
walsender loop needs to wake up, so there is no reason to have an arbitrary
timeout.

Also, modify the walsender loop logic to follow the standard pattern of
ResetLatch, test for work to do, WaitLatch.  The previous coding was both
hard to follow and buggy: it would sometimes busy-loop despite having
nothing available to do, eg between receipt of a signal and the next time
it was caught up with new WAL, and it also had interesting choices like
deciding to update to WALSNDSTATE_STREAMING on the strength of information
known to be obsolete.
2011-08-10 18:50:28 -04:00
Tom Lane 4dab3d5ae1 Change the autovacuum launcher to use WaitLatch instead of a poll loop.
In pursuit of this (and with the expectation that WaitLatch will be needed
in more places), convert the latch field that was already added to PGPROC
for sync rep into a generic latch that is activated for all PGPROC-owning
processes, and change many of the standard backend signal handlers to set
that latch when a signal happens.  This will allow WaitLatch callers to be
wakened properly by these signals.

In passing, fix a whole bunch of signal handlers that had been hacked to do
things that might change errno, without adding the necessary save/restore
logic for errno.  Also make some minor fixes in unix_latch.c, and clean
up bizarre and unsafe scheme for disowning the process's latch.  Much of
this has to be back-patched into 9.1.

Peter Geoghegan, with additional work by Tom
2011-08-10 12:22:21 -04:00
Tom Lane 9f17ffd866 Measure WaitLatch's timeout parameter in milliseconds, not microseconds.
The original definition had the problem that timeouts exceeding about 2100
seconds couldn't be specified on 32-bit machines.  Milliseconds seem like
sufficient resolution, and finer grain than that would be fantasy anyway
on many platforms.

Back-patch to 9.1 so that this aspect of the latch API won't change between
9.1 and later releases.

Peter Geoghegan
2011-08-09 18:52:29 -04:00
Tom Lane 4e15a4db5e Documentation improvement and minor code cleanups for the latch facility.
Improve the documentation around weak-memory-ordering risks, and do a pass
of general editorialization on the comments in the latch code.  Make the
Windows latch code more like the Unix latch code where feasible; in
particular provide the same Assert checks in both implementations.
Fix poorly-placed WaitLatch call in syncrep.c.

This patch resolves, for the moment, concerns around weak-memory-ordering
bugs in latch-related code: we have documented the restrictions and checked
that existing calls meet them.  In 9.2 I hope that we will install suitable
memory barrier instructions in SetLatch/ResetLatch, so that their callers
don't need to be quite so careful.
2011-08-09 15:30:45 -04:00
Tom Lane 05e8396892 Clean up ill-advised attempt to invent a private set of Node tags.
Somebody thought it'd be cute to invent a set of Node tag numbers that were
defined independently of, and indeed conflicting with, the main tag-number
list.  While this accidentally failed to fail so far, it would certainly
lead to trouble as soon as anyone wanted to, say, apply copyObject to these
node types.  Clang was already complaining about the use of makeNode on
these tags, and I think quite rightly so.  Fix by pushing these node
definitions into the mainstream, including putting replnodes.h where it
belongs.
2011-08-06 14:53:49 -04:00
Simon Riggs 5286105800 Cascading replication feature for streaming log-based replication.
Standby servers can now have WALSender processes, which can work with
either WALReceiver or archive_commands to pass data. Fully updated
docs, including new conceptual terms of sending server, upstream and
downstream servers. WALSenders terminated when promote to master.

Fujii Masao, review, rework and doc rewrite by Simon Riggs
2011-07-19 03:40:03 +01:00
Heikki Linnakangas 89fd72cbf2 Introduce a pipe between postmaster and each backend, which can be used to
detect postmaster death. Postmaster keeps the write-end of the pipe open,
so when it dies, children get EOF in the read-end. That can conveniently
be waited for in select(), which allows eliminating some of the polling
loops that check for postmaster death. This patch doesn't yet change all
the loops to use the new mechanism, expect a follow-on patch to do that.

This changes the interface to WaitLatch, so that it takes as argument a
bitmask of events that it waits for. Possible events are latch set, timeout,
postmaster death, and socket becoming readable or writeable.

The pipe method behaves slightly differently from the kill() method
previously used in PostmasterIsAlive() in the case that postmaster has died,
but its parent has not yet read its exit code with waitpid(). The pipe
returns EOF as soon as the process dies, but kill() continues to return
true until waitpid() has been called (IOW while the process is a zombie).
Because of that, change PostmasterIsAlive() to use the pipe too, otherwise
WaitLatch() would return immediately with WL_POSTMASTER_DEATH, while
PostmasterIsAlive() would claim it's still alive. That could easily lead to
busy-waiting while postmaster is in zombie state.

Peter Geoghegan with further changes by me, reviewed by Fujii Masao and
Florian Pflug.
2011-07-08 18:44:07 +03:00
Peter Eisentraut f05c65090a Message style improvements 2011-07-08 07:37:04 +03:00
Tom Lane 9cc2c182fc Add missing -I switch for VPATH builds.
Per bug #6073 from Hartmut Raschick.
2011-06-22 13:20:03 -04:00
Peter Eisentraut e2a0cb1a80 Message style and spelling improvements 2011-06-22 00:45:34 +03:00
Bruce Momjian 6560407c7d Pgindent run before 9.1 beta2. 2011-06-09 14:32:50 -04:00
Bruce Momjian 5a71b64130 Lowercase status labels in pg_stat_replication view. 2011-04-29 22:20:43 -04:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane 2594cf0e8c Revise the API for GUC variable assign hooks.
The previous functions of assign hooks are now split between check hooks
and assign hooks, where the former can fail but the latter shouldn't.
Aside from being conceptually clearer, this approach exposes the
"canonicalized" form of the variable value to guc.c without having to do
an actual assignment.  And that lets us fix the problem recently noted by
Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus
log messages about "parameter "wal_buffers" cannot be changed without
restarting the server".  There may be some speed advantage too, because
this design lets hook functions avoid re-parsing variable values when
restoring a previous state after a rollback (they can store a pre-parsed
representation of the value instead).  This patch also resolves a
longstanding annoyance about custom error messages from variable assign
hooks: they should modify, not appear separately from, guc.c's own message
about "invalid parameter value".
2011-04-07 00:12:02 -04:00
Robert Haas 240067b3b0 Merge synchronous_replication setting into synchronous_commit.
This means one less thing to configure when setting up synchronous
replication, and also avoids some ambiguity around what the behavior
should be when the settings of these variables conflict.

Fujii Masao, with additional hacking by me.
2011-04-04 16:25:52 -04:00
Robert Haas 38b27792ea Avoid possible hang during smart shutdown.
If a smart shutdown occurs just as a child is starting up, and the
child subsequently becomes a walsender, there is a race condition:
the postmaster might count the exstant backends, determine that there
is one normal backend, and wait for it to die off.  Had the walsender
transition already occurred before the postmaster counted, it would
have proceeded with the shutdown.

To fix this, have each child that transforms into a walsender kick
the postmaster just after doing so, so that the state machine is
certain to advance.

Fujii Masao
2011-04-03 19:42:00 -04:00
Robert Haas 7fcc75dd26 Fix compiler warning. 2011-04-01 10:36:38 -04:00
Heikki Linnakangas 754baa21f7 Automatically terminate replication connections that are idle for more
than replication_timeout (a new GUC) milliseconds. The TCP timeout is often
too long, you want the master to notice a dead connection much sooner.
People complained about that in 9.0 too, but with synchronous replication
it's even more important to notice dead connections promptly.

Fujii Masao and Heikki Linnakangas
2011-03-30 10:20:37 +03:00
Heikki Linnakangas bc03c5937d Adjust error message, now that we expect other message types than connection
close at this point. Fix PQsetnonblocking() comment.

Fujii Masao
2011-03-30 08:54:28 +03:00
Simon Riggs 92f4786fa9 Additional test for each commit in sync rep path to plug minute
possibility of race condition that would effect performance only.
Requested by Robert Haas. Re-arrange related comments.
2011-03-26 10:09:37 +00:00
Robert Haas 30f6136f28 Make walreceiver send a reply after receiving data but before flushing it.
It originally worked this way, but was changed by commit
a8a8a3e096, since which time it's been impossible
for walreceiver to ever send a reply with write_location and flush_location
set to different values.
2011-03-25 11:28:16 -04:00
Robert Haas 727589995a Move synchronous_standbys_defined updates from WAL writer to BG writer.
This is advantageous because the BG writer is alive until much later in
the shutdown sequence than WAL writer; we want to make sure that it's
possible to shut off synchronous replication during a smart shutdown,
else it might not be possible to complete the shutdown at all.

Per very reasonable gripes from Fujii Masao and Simon Riggs.
2011-03-18 21:43:45 -04:00
Robert Haas 7a37900443 Make synchronous replication query cancel/die messages more consistent.
Per a gripe from Thom Brown about my previous commit in this area,
commit 9a56dc3389.
2011-03-18 10:20:22 -04:00
Robert Haas 02b1f84e7d Remove bogus comment. 2011-03-17 14:43:57 -04:00
Robert Haas 9a56dc3389 Fix various possible problems with synchronous replication.
1. Don't ignore query cancel interrupts.  Instead, if the user asks to
cancel the query after we've already committed it, but before it's on
the standby, just emit a warning and let the COMMIT finish.

2. Don't ignore die interrupts (pg_terminate_backend or fast shutdown).
Instead, emit a warning message and close the connection without
acknowledging the commit.  Other backends will still see the effect of
the commit, but there's no getting around that; it's too late to abort
at this point, and ignoring die interrupts altogether doesn't seem like
a good idea.

3. If synchronous_standby_names becomes empty, wake up all backends
waiting for synchronous replication to complete.  Without this, someone
attempting to shut synchronous replication off could easily wedge the
entire system instead.

4. Avoid depending on the assumption that if a walsender updates
MyProc->syncRepState, we'll see the change even if we read it without
holding the lock.  The window for this appears to be quite narrow (and
probably doesn't exist at all on machines with strong memory ordering)
but protecting against it is practically free, so do that.

5. Remove useless state SYNC_REP_MUST_DISCONNECT, which isn't needed and
doesn't actually do anything.

There's still some further work needed here to make the behavior of fast
shutdown plausible, but that looks complex, so I'm leaving it for a
separate commit.  Review by Fujii Masao.
2011-03-17 13:12:21 -04:00
Robert Haas 551c07d84a Make error handling of synchronous_standby_names consistent.
It's not a good idea to kill the postmaster just because someone muffs
this, and it's not consistent with what we do for other, similar GUCs.

Fujii Masao, with a bit more hacking by me
2011-03-10 16:24:52 -05:00
Robert Haas 2e019c8611 More synchronous replication typo fixes.
Fujii Masao
2011-03-10 15:56:18 -05:00
Robert Haas b8bb8dbf20 More synchronous replication tweaks.
SyncRepRequested() must check not only the value of the
synchronous_replication GUC but also whether max_wal_senders > 0.
Otherwise, we might end up waiting for sync rep even when there's no
possibility of a standby ever managing to connect.  There are some
existing cross-checks to prevent this, but they're not quite sufficient:
the user can start the server with max_wal_senders=0,
synchronous_standby_names='', and synchronous_replication=off and then
subsequent make synchronous_standby_names not empty using pg_ctl reload,
and then SET synchronous_standby=on, leading to an indefinite hang.

Along the way, rename the global variable for the synchronous_replication
GUC to match the name of the GUC itself, for clarity.

Report by Fujii Masao, though I didn't use his patch.
2011-03-10 15:43:37 -05:00
Robert Haas 6436098795 Minor sync rep corrections.
Fujii Masao, with a bit of additional wordsmithing by me.
2011-03-10 14:57:02 -05:00
Robert Haas fcb99609b6 Replication README updates.
Fujii Masao
2011-03-10 08:59:59 -05:00
Itagaki Takahiro 2d8de0a50b Cleanup copyright years and file names in the header comments of some files. 2011-03-10 15:05:33 +09:00