postgresql

Commit Graph

Author	SHA1	Message	Date
Heikki Linnakangas	7d3ed5ae78	Fix typo in comment. Fujii Masao	2012-10-15 13:01:31 +03:00
Heikki Linnakangas	6f60fdd701	Improve replication connection timeouts. Rename replication_timeout to wal_sender_timeout, and add a new setting called wal_receiver_timeout that does the same at the walreceiver side. There was previously no timeout in walreceiver, so if the network went down, for example, the walreceiver could take a long time to notice that the connection was lost. Now with the two settings, both sides of a replication connection will detect a broken connection similarly. It is no longer necessary to manually set wal_receiver_status_interval to a value smaller than the timeout. Both wal sender and receiver now automatically send a "ping" message if more than 1/2 of the configured timeout has elapsed, and it hasn't received any messages from the other end. Amit Kapila, heavily edited by me.	2012-10-11 17:48:08 +03:00
Peter Eisentraut	8521d13194	Refactor flex and bison make rules. Numerous flex and bison make rules have appeared in the source tree over time, and they are all virtually identical, so we can replace them by pattern rules with some variables for customization. Users of pgxs will also be able to benefit from this.	2012-10-11 06:57:04 -04:00
Heikki Linnakangas	0b77aebabf	Remove stray newline in comment.	2012-10-09 13:06:48 +03:00
Peter Eisentraut	b6d4522296	Remove generation of repl_gram.h. It was apparently never necessary.	2012-10-08 20:36:46 -04:00
Heikki Linnakangas	9c0e2b9182	Fix walsender handling of postmaster shutdown, to not go into endless loop. This bug was introduced by my patch to use the regular die/quickdie signal handlers in walsender processes. I tried to make walsender exit at next CHECK_FOR_INTERRUPTS() by setting ProcDiePending, but that's not enough; you need to set InterruptPending too. On second thought, it was not a very good way to make walsender exit anyway, so use proc_exit(0) instead. Also, send a CommandComplete message before exiting; that's what we did before, and you get a nicer error message in the standby that way. Reported by Thom Brown.	2012-10-08 13:32:14 +03:00
Heikki Linnakangas	fd5942c18f	Use the regular main processing loop also in walsenders. The regular backend's main loop handles signal handling and error recovery better than the current WAL sender command loop does. For example, if the client hangs and a SIGTERM is received before starting streaming, the walsender will now terminate immediately, rather than hang until the connection times out.	2012-10-05 17:21:12 +03:00
Tom Lane	05b555d12b	Fix tar files emitted by pg_dump and pg_basebackup to be POSIX conformant. Both programs got the "magic" string wrong, causing standard-conforming tar implementations to believe the output was just legacy tar format without any POSIX extensions. This doesn't actually matter that much, especially since pg_dump failed to fill the POSIX fields anyway, but still there is little point in emitting tar format if we can't be compliant with the standard. In addition, pg_dump failed to write the EOF marker correctly (there should be 2 blocks of zeroes not just one), pg_basebackup put the numeric group ID in the wrong place, and both programs had a pretty brain-dead idea of how to compute the checksum. Fix all that and improve the comments a bit. pg_restore is modified to accept either the correct POSIX-compliant "magic" string or the previous value. This part of the change will need to be back-patched to avoid an unnecessary compatibility break when a previous version tries to read tar-format output from 9.3 pg_dump. Brian Weaver and Tom Lane	2012-09-28 15:19:15 -04:00
Heikki Linnakangas	c4c227477b	Fix bugs in cascading replication with recovery_target_timeline='latest'. The cascading replication code assumed that the current RecoveryTargetTLI never changes, but that's not true with recovery_target_timeline='latest'. The obvious upshot of that is that RecoveryTargetTLI in shared memory needs to be protected by a lock. A less obvious consequence is that when a cascading standby is connected, and the standby switches to a new target timeline after scanning the archive, it will continue to stream WAL to the cascading standby, but from a wrong file, i.e. the file of the previous timeline. For example, if the standby is currently streaming from the middle of file 000000010000000000000005, and the timeline changes, the standby will continue to stream from that file. However, the WAL on the new timeline is in file 000000020000000000000005, so the standby sends garbage from 000000010000000000000005 to the cascading standby, instead of the correct WAL from file 000000020000000000000005. This also fixes a related bug where a partial WAL segment is restored from the archive and streamed to a cascading standby. The code assumed that when a WAL segment is copied from the archive, it can immediately be fully streamed to a cascading standby. However, if the segment is only partially filled, i.e. has the right size, but only the first N bytes contain valid WAL, that's not safe. That can happen if a partial WAL segment is manually copied to the archive, or if a partial WAL segment is archived because a server is started up on a new timeline within that segment. The cascading standby will get confused if the WAL it received is not valid, and will get stuck until it's restarted. This patch fixes that problem by not allowing WAL restored from the archive to be streamed to a cascading standby until it's been replayed, and thus validated.	2012-09-04 19:33:21 -07:00
Heikki Linnakangas	fe811ae810	Fix typos in README.	2012-08-31 11:30:11 +03:00
Simon Riggs	da4efa13d8	Turn off WalSender keepalives by default, users can enable if desired	2012-08-09 17:07:03 +01:00
Simon Riggs	87d8bd7c9f	Ensure all replication message info is available and correct via WalRcv	2012-08-09 17:03:59 +01:00
Tom Lane	4a9c30a8a1	Fix management of pendingOpsTable in auxiliary processes. mdinit() was misusing IsBootstrapProcessingMode() to decide whether to create an fsync pending-operations table in the current process. This led to creating a table not only in the startup and checkpointer processes as intended, but also in the bgwriter process, not to mention other auxiliary processes such as walwriter and walreceiver. Creation of the table in the bgwriter is fatal, because it absorbs fsync requests that should have gone to the checkpointer; instead they just sit in bgwriter local memory and are never acted on. So writes performed by the bgwriter were not being fsync'd which could result in data loss after an OS crash. I think there is no live bug with respect to walwriter and walreceiver because those never perform any writes of shared buffers; but the potential is there for future breakage in those processes too. To fix, make AuxiliaryProcessMain() export the current process's AuxProcType as a global variable, and then make mdinit() test directly for the types of aux process that should have a pendingOpsTable. Having done that, we might as well also get rid of the random bool flags such as am_walreceiver that some of the aux processes had grown. (Note that we could not have fixed the bug by examining those variables in mdinit(), because it's called from BaseInit() which is run by AuxiliaryProcessMain() before entering any of the process-type-specific code.) Back-patch to 9.2, where the problem was introduced by the split-up of bgwriter and checkpointer processes. The bogus pendingOpsTable exists in walwriter and walreceiver processes in earlier branches, but absent any evidence that it causes actual problems there, I'll leave the older branches alone.	2012-07-18 15:28:10 -04:00
Alvaro Herrera	f34c68f096	Introduce timeout handling framework. Management of timeouts was getting a little cumbersome; what we originally had was more than enough back when we were only concerned about deadlocks and query cancel; however, when we added timeouts for standby processes, the code got considerably messier. Since there are plans to add more complex timeouts, this seems a good time to introduce a central timeout handling module. External modules register their timeout handlers during process initialization, and later enable and disable them as they see fit using a simple API; timeout.c is in charge of keeping track of which timeouts are in effect at any time, installing a common SIGALRM signal handler, and calling setitimer() as appropriate to ensure timely firing of external handlers. timeout.c additionally supports pluggable modules to add their own timeouts, though this capability isn't exercised anywhere yet. Additionally, as of this commit, walsender processes are aware of timeouts; we had a preexisting bug there that made those ignore SIGALRM, thus being subject to unhandled deadlocks, particularly during the authentication phase. This has already been fixed in back branches in commit `0bf8eb2a`, which see for more details. Main author: Zoltán Böszörményi. Some review and cleanup by Álvaro Herrera. Extensive reworking by Tom Lane.	2012-07-16 22:55:33 -04:00
Heikki Linnakangas	2686da9db2	Don't initialize TLI variable to -1, as TimeLineID is unsigned. This was causing a compiler warning with Solaris compiler. Use 0 instead. The variable is initialized just for the sake of tidiness and/or debugging; it's not used for anything before setting it to a real value. Per report and suggestion from Peter Eisentraut.	2012-07-14 21:04:53 +03:00
Magnus Hagander	0c4b468692	Always treat a standby returning an invalid flush location as async. This ensures that a standby such as pg_receivexlog will not be selected as sync standby, which would cause the master to block waiting for a location that could never happen. Fujii Masao	2012-07-04 15:14:42 +02:00
Robert Haas	f83b59997d	Make walsender more responsive. Per testing by Andres Freund, this improves replication performance and reduces replication latency and latency jitter. I was a bit concerned about moving more work into XLogInsert, but testing seems to show that it's not a problem in practice. Along the way, improve comments for WaitLatchOrSocket. Andres Freund. Review and stylistic cleanup by me.	2012-07-02 09:41:01 -04:00
Heikki Linnakangas	ec786c6c81	I neglected many comments in the log+seg -> 64-bit segno patch. Fix. Reported by Amit Kapila.	2012-06-27 17:53:53 +03:00
Peter Eisentraut	eeece9e609	Unify calling conventions for postgres/postmaster sub-main functions. There was a wild mix of calling conventions: Some were declared to return void and didn't return, some returned an int exit code, some claimed to return an exit code, which the callers checked, but actually never returned, and so on. Now all of these functions are declared to return void and decorated with attribute noreturn and don't return. That's easiest, and most code already worked that way.	2012-06-25 21:30:12 +03:00
Robert Haas	c7d47abd04	Fix typo in DEBUG message, introduced by recent WAL refactoring. Fujii Masao	2012-06-25 14:00:35 -04:00
Heikki Linnakangas	0ab9d1c4b3	Replace XLogRecPtr struct with a 64-bit integer. This simplifies code that needs to do arithmetic on XLogRecPtrs. To avoid changing on-disk format of data pages, the LSN on data pages is still stored in the old format. That should keep pg_upgrade happy. However, we have XLogRecPtrs embedded in the control file, and in the structs that are sent over the replication protocol, so this change breaks compatibility of pg_basebackup and server. I didn't do anything about this in this patch; per discussion on -hackers, the right thing to do would be to change the replication protocol to be architecture-independent, so that you could use a newer version of pg_receivexlog, for example, against an older server version.	2012-06-24 19:19:45 +03:00
Heikki Linnakangas	dfda6ebaec	Don't waste the last segment of each 4GB logical log file. The comments claimed that wasting the last segment made it easier to do calculations with XLogRecPtrs, because you don't have problems representing last-byte-position-plus-1 that way. In my experience, however, it only made things more complicated, because there were two ways to represent the boundary at the beginning of a logical log file: xlogid = n+1 and xrecoff = 0, or xlogid = n and xrecoff = 4GB - XLOG_SEG_SIZE. Some functions were picky about which representation was used. Also, use a 64-bit segment number instead of the log/seg combination, to point to a certain WAL segment. We assume that all platforms have a working 64-bit integer type nowadays. This is an incompatible change in WAL format, so bumping WAL version number.	2012-06-24 18:35:29 +03:00
Magnus Hagander	3595a71e9c	Prevent non-streaming replication connections from being selected as sync slave. This prevents a pg_basebackup backup session that just does a base backup (no xlog involved at all) from becoming the synchronous slave and thus blocking all access while it runs. Also fixes the problem where a higher-priority slave that shows up would become the sync standby before it has reached the STREAMING state, by making sure we can only switch to a walsender that's actually STREAMING. Fujii Masao	2012-06-11 15:17:38 +02:00
Bruce Momjian	927d61eeff	Run pgindent on 9.2 source tree in preparation for first 9.3 commit-fest.	2012-06-10 15:20:04 -04:00
Tom Lane	ed61127be4	Cast some printf arguments to avoid possibly-nonportable behavior. Per compiler warnings on buildfarm member black_firefly.	2012-03-23 20:18:04 -04:00
Simon Riggs	ba1868ba31	Minor bug fix and cleanup from self-review of sync rep queues patch.	2012-01-30 14:36:17 +00:00
Simon Riggs	73f617f13f	Various minor comments changes from bgwriter to checkpointer.	2012-01-30 14:34:25 +00:00
Simon Riggs	8366c7803e	Allow pg_basebackup from standby node with safety checking. Base backup follows recommended procedure, plus goes to great lengths to ensure that partial page writes are avoided. Jun Ishizuka and Fujii Masao, with minor modifications	2012-01-25 18:02:04 +00:00
Simon Riggs	443b4821f1	Add new replication mode synchronous_commit = 'write'. Replication occurs only to memory on standby, not to disk, so provides additional performance if user wishes to reduce durability level slightly. Adds concept of multiple independent sync rep queues. Fujii Masao and Simon Riggs	2012-01-24 20:22:37 +00:00
Robert Haas	4d0b11a0ca	Typo fix.	2012-01-13 08:21:45 -05:00
Simon Riggs	3f1787c253	Minor but necessary improvements to WAL keepalives. Fujii Masao	2012-01-13 12:59:08 +00:00
Bruce Momjian	e126958c2e	Update copyright notices for year 2012.	2012-01-01 18:01:58 -05:00
Simon Riggs	64233902d2	Send new protocol keepalive messages to standby servers. Allows streaming replication users to calculate transfer latency and apply delay via internal functions. No external functions yet.	2011-12-31 13:30:26 +00:00
Tom Lane	0d0ec527af	Fix corner cases in readlink() usage. Make sure all calls are protected by HAVE_READLINK, and get the buffer overflow tests right. Be a bit more paranoid about string length in _tarWriteHeader(), too.	2011-12-07 13:34:13 -05:00
Magnus Hagander	1f422db663	Avoid using readlink() on platforms that don't support it. We don't have any such platforms now, but might in the future. Also, detect cases when a tablespace symlink points to a path that is longer than we can handle, and give a warning.	2011-12-07 12:09:05 +01:00
Robert Haas	ed0b409d22	Move "hot" members of PGPROC into a separate PGXACT array. This speeds up snapshot-taking and reduces ProcArrayLock contention. Also, the PGPROC (and PGXACT) structures used by two-phase commit are now allocated as part of the main array, rather than in a separate array, and we keep ProcArray sorted in pointer order. These changes are intended to minimize the number of cache lines that must be pulled in to take a snapshot, and testing shows a substantial increase in performance on both read and write workloads at high concurrencies. Pavan Deolasee, Heikki Linnakangas, Robert Haas	2011-11-25 08:02:10 -05:00
Simon Riggs	9aceb6ab3c	Refactor xlog.c to create src/backend/postmaster/startup.c. Startup process now has its own dedicated file, just like all other special/background processes. Reduces role and size of xlog.c.	2011-11-02 14:25:01 +00:00
Peter Eisentraut	654e1f96b0	Clean up whitespace and indentation in parser and scanner files. These are not touched by pgindent, so clean them up a bit manually.	2011-11-01 21:51:30 +02:00
Simon Riggs	f3ebaad45b	Comment changes to show bgwriter no longer performs checkpoints.	2011-11-01 18:48:47 +00:00
Heikki Linnakangas	b436c72f61	Fix overly-complicated usage of errcode_for_file_access(). No need to do "errcode(errcode_for_file_access())", just "errcode_for_file_access()" is enough. The extra errcode() call is useless but harmless, so there's no user-visible bug here. Nevertheless, backpatch to 9.1 where this code was added.	2011-10-22 20:19:50 +03:00
Tom Lane	b4a0223d00	Simplify and improve ProcessStandbyHSFeedbackMessage logic. There's no need to clamp the standby's xmin to be greater than GetOldestXmin's result; if there were any such need this logic would be hopelessly inadequate anyway, because it fails to account for within-database versus cluster-wide values of GetOldestXmin. So get rid of that, and just rely on sanity-checking that the xmin is not wrapped around relative to the nextXid counter. Also, don't reset the walsender's xmin if the current feedback xmin is indeed out of range; that just creates more problems than we already had. Lastly, don't bother to take the ProcArrayLock; there's no need to do that to set xmin. Also improve the comments about this in GetOldestXmin itself.	2011-10-20 19:43:31 -04:00
Magnus Hagander	d1e25b78f9	Exclude postmaster.opts from base backups. Noted by Fujii Masao.	2011-10-18 15:58:37 +02:00
Magnus Hagander	7aeff9f4a4	Ensure walsenders can be SIGTERMed while in non-walsender code. In order to exit on SIGTERM when in non-walsender code, such as do_pg_stop_backup(), we need to set the interrupt variables that are used there, and not just the walsender local ones.	2011-10-06 21:43:14 +02:00
Alvaro Herrera	86822df9b5	Split walsender.h into public/private headers. This dramatically cuts the number of headers the public one brings into whatever includes it.	2011-09-13 21:42:49 -03:00
Tom Lane	a7801b62f2	Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h. As per my recent proposal, this refactors things so that these typedefs and macros are available in a header that can be included in frontend-ish code. I also changed various headers that were undesirably including utils/timestamp.h to include datatype/timestamp.h instead. Unsurprisingly, this showed that half the system was getting utils/timestamp.h by way of xlog.h. No actual code changes here, just header refactoring.	2011-09-09 13:23:41 -04:00
Alvaro Herrera	295e7dc929	Tweak string for uniformity	2011-09-08 16:39:58 -03:00
Simon Riggs	dde70cc313	Emit cascaded standby message on shutdown only when appropriate. Adds additional test for active walsenders and closes a race condition for when we failover when a new walsender was connecting. Reported and fixed by Fujii Masao. Review by Heikki Linnakangas.	2011-09-07 09:09:47 +01:00
Tom Lane	1609797c25	Clean up the #include mess a little. walsender.h should depend on xlog.h, not vice versa. (Actually, the inclusion was circular until a couple hours ago, which was even sillier; but Bruce broke it in the expedient rather than logically correct direction.) Because of that poor decision, plus blind application of pgrminclude, we had a situation where half the system was depending on xlog.h to include such unrelated stuff as array.h and guc.h. Clean up the header inclusion, and manually revert a lot of what pgrminclude had done so things build again. This episode reinforces my feeling that pgrminclude should not be run without adult supervision. Inclusion changes in header files in particular need to be reviewed with great care. More generally, it'd be good if we had a clearer notion of module layering to dictate which headers can sanely include which others ... but that's a big task for another day.	2011-09-04 01:13:16 -04:00
Bruce Momjian	6416a82a62	Remove unnecessary #include references, per pgrminclude script.	2011-09-01 10:04:27 -04:00
Bruce Momjian	4bd7333b14	Allow more include files to be compiled in their own by adding missing include dependencies. Modify pgcompinclude to skip a common fcinfo error.	2011-08-27 11:05:33 -04:00
Tom Lane	cff75130b5	Remove wal_sender_delay GUC, because it's no longer useful. The latch infrastructure is now capable of detecting all cases where the walsender loop needs to wake up, so there is no reason to have an arbitrary timeout. Also, modify the walsender loop logic to follow the standard pattern of ResetLatch, test for work to do, WaitLatch. The previous coding was both hard to follow and buggy: it would sometimes busy-loop despite having nothing available to do, eg between receipt of a signal and the next time it was caught up with new WAL, and it also had interesting choices like deciding to update to WALSNDSTATE_STREAMING on the strength of information known to be obsolete.	2011-08-10 18:50:28 -04:00
Tom Lane	4dab3d5ae1	Change the autovacuum launcher to use WaitLatch instead of a poll loop. In pursuit of this (and with the expectation that WaitLatch will be needed in more places), convert the latch field that was already added to PGPROC for sync rep into a generic latch that is activated for all PGPROC-owning processes, and change many of the standard backend signal handlers to set that latch when a signal happens. This will allow WaitLatch callers to be wakened properly by these signals. In passing, fix a whole bunch of signal handlers that had been hacked to do things that might change errno, without adding the necessary save/restore logic for errno. Also make some minor fixes in unix_latch.c, and clean up bizarre and unsafe scheme for disowning the process's latch. Much of this has to be back-patched into 9.1. Peter Geoghegan, with additional work by Tom	2011-08-10 12:22:21 -04:00
Tom Lane	9f17ffd866	Measure WaitLatch's timeout parameter in milliseconds, not microseconds. The original definition had the problem that timeouts exceeding about 2100 seconds couldn't be specified on 32-bit machines. Milliseconds seem like sufficient resolution, and finer grain than that would be fantasy anyway on many platforms. Back-patch to 9.1 so that this aspect of the latch API won't change between 9.1 and later releases. Peter Geoghegan	2011-08-09 18:52:29 -04:00
Tom Lane	4e15a4db5e	Documentation improvement and minor code cleanups for the latch facility. Improve the documentation around weak-memory-ordering risks, and do a pass of general editorialization on the comments in the latch code. Make the Windows latch code more like the Unix latch code where feasible; in particular provide the same Assert checks in both implementations. Fix poorly-placed WaitLatch call in syncrep.c. This patch resolves, for the moment, concerns around weak-memory-ordering bugs in latch-related code: we have documented the restrictions and checked that existing calls meet them. In 9.2 I hope that we will install suitable memory barrier instructions in SetLatch/ResetLatch, so that their callers don't need to be quite so careful.	2011-08-09 15:30:45 -04:00
Tom Lane	05e8396892	Clean up ill-advised attempt to invent a private set of Node tags. Somebody thought it'd be cute to invent a set of Node tag numbers that were defined independently of, and indeed conflicting with, the main tag-number list. While this accidentally failed to fail so far, it would certainly lead to trouble as soon as anyone wanted to, say, apply copyObject to these node types. Clang was already complaining about the use of makeNode on these tags, and I think quite rightly so. Fix by pushing these node definitions into the mainstream, including putting replnodes.h where it belongs.	2011-08-06 14:53:49 -04:00
Simon Riggs	5286105800	Cascading replication feature for streaming log-based replication. Standby servers can now have WALSender processes, which can work with either WALReceiver or archive_commands to pass data. Fully updated docs, including new conceptual terms of sending server, upstream and downstream servers. WALSenders are terminated on promotion to master. Fujii Masao; review, rework and doc rewrite by Simon Riggs.	2011-07-19 03:40:03 +01:00
Heikki Linnakangas	89fd72cbf2	Introduce a pipe between postmaster and each backend, which can be used to detect postmaster death. Postmaster keeps the write-end of the pipe open, so when it dies, children get EOF in the read-end. That can conveniently be waited for in select(), which allows eliminating some of the polling loops that check for postmaster death. This patch doesn't yet change all the loops to use the new mechanism, expect a follow-on patch to do that. This changes the interface to WaitLatch, so that it takes as argument a bitmask of events that it waits for. Possible events are latch set, timeout, postmaster death, and socket becoming readable or writeable. The pipe method behaves slightly differently from the kill() method previously used in PostmasterIsAlive() in the case that postmaster has died, but its parent has not yet read its exit code with waitpid(). The pipe returns EOF as soon as the process dies, but kill() continues to return true until waitpid() has been called (IOW while the process is a zombie). Because of that, change PostmasterIsAlive() to use the pipe too, otherwise WaitLatch() would return immediately with WL_POSTMASTER_DEATH, while PostmasterIsAlive() would claim it's still alive. That could easily lead to busy-waiting while postmaster is in zombie state. Peter Geoghegan with further changes by me, reviewed by Fujii Masao and Florian Pflug.	2011-07-08 18:44:07 +03:00
Peter Eisentraut	f05c65090a	Message style improvements	2011-07-08 07:37:04 +03:00
Tom Lane	9cc2c182fc	Add missing -I switch for VPATH builds. Per bug #6073 from Hartmut Raschick.	2011-06-22 13:20:03 -04:00
Peter Eisentraut	e2a0cb1a80	Message style and spelling improvements	2011-06-22 00:45:34 +03:00
Bruce Momjian	6560407c7d	Pgindent run before 9.1 beta2.	2011-06-09 14:32:50 -04:00
Bruce Momjian	5a71b64130	Lowercase status labels in pg_stat_replication view.	2011-04-29 22:20:43 -04:00
Bruce Momjian	bf50caf105	pgindent run before PG 9.1 beta 1.	2011-04-10 11:42:00 -04:00
Tom Lane	2594cf0e8c	Revise the API for GUC variable assign hooks. The previous functions of assign hooks are now split between check hooks and assign hooks, where the former can fail but the latter shouldn't. Aside from being conceptually clearer, this approach exposes the "canonicalized" form of the variable value to guc.c without having to do an actual assignment. And that lets us fix the problem recently noted by Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus log messages about "parameter "wal_buffers" cannot be changed without restarting the server". There may be some speed advantage too, because this design lets hook functions avoid re-parsing variable values when restoring a previous state after a rollback (they can store a pre-parsed representation of the value instead). This patch also resolves a longstanding annoyance about custom error messages from variable assign hooks: they should modify, not appear separately from, guc.c's own message about "invalid parameter value".	2011-04-07 00:12:02 -04:00
Robert Haas	240067b3b0	Merge synchronous_replication setting into synchronous_commit. This means one less thing to configure when setting up synchronous replication, and also avoids some ambiguity around what the behavior should be when the settings of these variables conflict. Fujii Masao, with additional hacking by me.	2011-04-04 16:25:52 -04:00
Robert Haas	38b27792ea	Avoid possible hang during smart shutdown. If a smart shutdown occurs just as a child is starting up, and the child subsequently becomes a walsender, there is a race condition: the postmaster might count the extant backends, determine that there is one normal backend, and wait for it to die off. Had the walsender transition already occurred before the postmaster counted, it would have proceeded with the shutdown. To fix this, have each child that transforms into a walsender kick the postmaster just after doing so, so that the state machine is certain to advance. Fujii Masao	2011-04-03 19:42:00 -04:00
Robert Haas	7fcc75dd26	Fix compiler warning.	2011-04-01 10:36:38 -04:00
Heikki Linnakangas	754baa21f7	Automatically terminate replication connections that are idle for more than replication_timeout (a new GUC) milliseconds. The TCP timeout is often too long, you want the master to notice a dead connection much sooner. People complained about that in 9.0 too, but with synchronous replication it's even more important to notice dead connections promptly. Fujii Masao and Heikki Linnakangas	2011-03-30 10:20:37 +03:00
Heikki Linnakangas	bc03c5937d	Adjust error message, now that we expect other message types than connection close at this point. Fix PQsetnonblocking() comment. Fujii Masao	2011-03-30 08:54:28 +03:00
Simon Riggs	92f4786fa9	Additional test for each commit in sync rep path to plug minute possibility of race condition that would affect performance only. Requested by Robert Haas. Re-arrange related comments.	2011-03-26 10:09:37 +00:00
Robert Haas	30f6136f28	Make walreceiver send a reply after receiving data but before flushing it. It originally worked this way, but was changed by commit `a8a8a3e096`, since which time it's been impossible for walreceiver to ever send a reply with write_location and flush_location set to different values.	2011-03-25 11:28:16 -04:00
Robert Haas	727589995a	Move synchronous_standbys_defined updates from WAL writer to BG writer. This is advantageous because the BG writer is alive until much later in the shutdown sequence than WAL writer; we want to make sure that it's possible to shut off synchronous replication during a smart shutdown, else it might not be possible to complete the shutdown at all. Per very reasonable gripes from Fujii Masao and Simon Riggs.	2011-03-18 21:43:45 -04:00
Robert Haas	7a37900443	Make synchronous replication query cancel/die messages more consistent. Per a gripe from Thom Brown about my previous commit in this area, commit `9a56dc3389`.	2011-03-18 10:20:22 -04:00
Robert Haas	02b1f84e7d	Remove bogus comment.	2011-03-17 14:43:57 -04:00
Robert Haas	9a56dc3389	Fix various possible problems with synchronous replication. 1. Don't ignore query cancel interrupts. Instead, if the user asks to cancel the query after we've already committed it, but before it's on the standby, just emit a warning and let the COMMIT finish. 2. Don't ignore die interrupts (pg_terminate_backend or fast shutdown). Instead, emit a warning message and close the connection without acknowledging the commit. Other backends will still see the effect of the commit, but there's no getting around that; it's too late to abort at this point, and ignoring die interrupts altogether doesn't seem like a good idea. 3. If synchronous_standby_names becomes empty, wake up all backends waiting for synchronous replication to complete. Without this, someone attempting to shut synchronous replication off could easily wedge the entire system instead. 4. Avoid depending on the assumption that if a walsender updates MyProc->syncRepState, we'll see the change even if we read it without holding the lock. The window for this appears to be quite narrow (and probably doesn't exist at all on machines with strong memory ordering) but protecting against it is practically free, so do that. 5. Remove useless state SYNC_REP_MUST_DISCONNECT, which isn't needed and doesn't actually do anything. There's still some further work needed here to make the behavior of fast shutdown plausible, but that looks complex, so I'm leaving it for a separate commit. Review by Fujii Masao.	2011-03-17 13:12:21 -04:00
Robert Haas	551c07d84a	Make error handling of synchronous_standby_names consistent. It's not a good idea to kill the postmaster just because someone muffs this, and it's not consistent with what we do for other, similar GUCs. Fujii Masao, with a bit more hacking by me	2011-03-10 16:24:52 -05:00
Robert Haas	2e019c8611	More synchronous replication typo fixes. Fujii Masao	2011-03-10 15:56:18 -05:00
Robert Haas	b8bb8dbf20	More synchronous replication tweaks. SyncRepRequested() must check not only the value of the synchronous_replication GUC but also whether max_wal_senders > 0. Otherwise, we might end up waiting for sync rep even when there's no possibility of a standby ever managing to connect. There are some existing cross-checks to prevent this, but they're not quite sufficient: the user can start the server with max_wal_senders=0, synchronous_standby_names='', and synchronous_replication=off and then subsequently make synchronous_standby_names not empty using pg_ctl reload, and then SET synchronous_standby=on, leading to an indefinite hang. Along the way, rename the global variable for the synchronous_replication GUC to match the name of the GUC itself, for clarity. Report by Fujii Masao, though I didn't use his patch.	2011-03-10 15:43:37 -05:00
Robert Haas	6436098795	Minor sync rep corrections. Fujii Masao, with a bit of additional wordsmithing by me.	2011-03-10 14:57:02 -05:00
Robert Haas	fcb99609b6	Replication README updates. Fujii Masao	2011-03-10 08:59:59 -05:00
Itagaki Takahiro	2d8de0a50b	Cleanup copyright years and file names in the header comments of some files.	2011-03-10 15:05:33 +09:00
Bruce Momjian	76fdee31c4	Mention gcc version in C comment.	2011-03-09 23:41:13 -05:00
Heikki Linnakangas	baabf05196	Silence compiler warning about undefined function when compiling without assertions.	2011-03-07 09:56:53 +02:00
Simon Riggs	cae4974e3d	Dynamic array required within pg_stat_replication.	2011-03-07 00:26:30 +00:00
Simon Riggs	966fb05b58	Add new files for syncrep missed in previous commit	2011-03-06 23:39:14 +00:00
Simon Riggs	a8a8a3e096	Efficient transaction-controlled synchronous replication. If a standby is broadcasting reply messages and we have named one or more standbys in synchronous_standby_names then allow users who set synchronous_replication to wait for commit, which then provides strict data integrity guarantees. Design avoids sending and receiving transaction state information so minimises bookkeeping overheads. We synchronize with the highest priority standby that is connected and ready to synchronize. Other standbys can be defined to takeover in case of standby failure. This version has very strict behaviour; more relaxed options may be added at a later date. Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime Casanova, Heikki Linnakangas and Robert Haas, plus the assistance of many other design reviewers.	2011-03-06 22:49:16 +00:00
Heikki Linnakangas	6eba5a7c57	Change pg_last_xlog_receive_location() not to move backwards. That makes it a lot more useful for determining which standby is most up-to-date, for example. There were long discussions on whether overwriting existing WAL makes sense to begin with, and whether we should do some more extensive variable renaming, but this change nevertheless seems quite uncontroversial. Fujii Masao, reviewed by Jeff Janes, Robert Haas, Stephen Frost.	2011-03-01 20:54:35 +02:00
Robert Haas	59d6a75942	Avoid excessive Hot Standby feedback messages. Without this patch, when wal_receiver_status_interval=0, indicating that no status messages should be sent, Hot Standby feedback messages are instead sent extremely frequently. Fujii Masao, with documentation changes by me.	2011-03-01 11:34:25 -05:00
Heikki Linnakangas	be6668d6ef	Increase the default for wal_sender_delay from 200ms to 1s. Now that WAL sender is immediately woken up by transaction commit, there's no need to wake up so aggressively.	2011-02-26 23:38:25 +02:00
Simon Riggs	bc76695c4c	Make a hard state change from catchup to streaming mode. More useful state change for monitoring purposes, plus a required change for synchronous replication patch.	2011-02-18 15:07:26 +00:00
Simon Riggs	06828c5feb	Separate messages for standby replies and hot standby feedback. Allow messages to be sent at different times, and greatly reduce the frequency of hot standby feedback. Refactor to allow additional message types.	2011-02-18 11:31:49 +00:00
Tom Lane	93016983d1	Fix blatantly uninitialized variable in recent commit. Doesn't anybody around here pay attention to compiler warnings?	2011-02-16 19:53:20 -05:00
Simon Riggs	bca8b7f16a	Hot Standby feedback for avoidance of cleanup conflicts on standby. Standby optionally sends back information about oldestXmin of queries which is then checked and applied to the WALSender's proc->xmin. GetOldestXmin() is modified slightly to agree with GetSnapshotData(), so that all backends on primary include WALSender within their snapshots. Note this does nothing to change the snapshot xmin on either master or standby. Feedback piggybacks on the standby reply message. vacuum_defer_cleanup_age is no longer used on standby, though parameter still exists on primary, since some use cases still exist. Simon Riggs, review comments from Fujii Masao, Heikki Linnakangas, Robert Haas	2011-02-16 19:29:37 +00:00
Robert Haas	3a087369c0	WAL receiver shouldn't try to send a reply when dying. Per report from, and discussion with, Fujii Masao.	2011-02-16 10:27:35 -05:00
Robert Haas	883a9659fa	Assorted corrections to the patch to add WAL receiver replies. Per reports from Fujii Masao.	2011-02-15 12:05:00 -05:00
Robert Haas	d309acf201	Typo fixes. receivedUpto should be capitalized consistently.	2011-02-11 11:55:12 -05:00
Heikki Linnakangas	b186523fd9	Send status updates back from standby server to master, indicating how far the standby has written, flushed, and applied the WAL. At the moment, this is for informational purposes only, the values are only shown in pg_stat_replication system view, but in the future they will also be needed for synchronous replication. Extracted from Simon Riggs' synchronous replication patch by Robert Haas, with some tweaking by me.	2011-02-10 21:04:02 +02:00
Magnus Hagander	3144c33a2f	Implement NOWAIT option for BASE_BACKUP command Specifying this option makes the server not wait for the xlog to be archived, or emit a warning that it can't, instead leaving the responsibility with the client. This is useful when the log is being streamed using the streaming protocol in parallel with the backup, without having log archiving enabled.	2011-02-09 10:59:53 +01:00
Magnus Hagander	cedd6515ba	IDENTIFY_SYSTEM now returns 3 fields, not 2	2011-02-06 07:46:14 +01:00
Bruce Momjian	51dbc87dff	Add C comment about why older compilers complain about basebackup.c's longjump.	2011-02-04 23:28:14 -05:00
Magnus Hagander	76129e7f14	Include more status information in walsender results Add the current xlog insert location to the response of IDENTIFY_SYSTEM, and adds result sets containing start and stop location of backups to BASE_BACKUP responses.	2011-02-03 13:46:23 +01:00
Heikki Linnakangas	997b48ed96	Support multiple concurrent pg_basebackup backups. With this patch, pg_basebackup doesn't write a backup_label file in the data directory, so it doesn't interfere with a pg_start/stop_backup() based backup anymore. backup_label is still included in the backup, but it is injected directly into the tar stream. Heikki Linnakangas, reviewed by Fujii Masao and Magnus Hagander.	2011-01-31 18:25:39 +02:00
Magnus Hagander	507069de6d	Add option to include WAL in base backup When included, this makes the base backup a complete working "clone" of the initial database, ready to have a postmaster started against it without the need to set up any log archiving or similar. Magnus Hagander, reviewed by Fujii Masao and Heikki Linnakangas	2011-01-30 21:30:09 +01:00
Magnus Hagander	966d4f52c2	Typo fix for MemSet size. Fujii Masao	2011-01-25 10:50:04 +01:00
Magnus Hagander	e5487f65fd	Make walsender options order-independent While doing this, also move base backup options into a struct instead of increasing the number of parameters to multiple functions for each new option.	2011-01-23 23:39:18 +01:00
Magnus Hagander	f88a638199	Only show pg_stat_replication details to superusers	2011-01-23 17:28:19 +01:00
Magnus Hagander	048d148fe6	Add pg_basebackup tool for streaming base backups This tool makes it possible to do the pg_start_backup/ copy files/pg_stop_backup step in a single command. There are still some steps to be done before this is a complete backup solution, such as the ability to stream the required WAL logs, but it's still usable, and could do with some buildfarm coverage. In passing, make the checkpoint request optionally fast instead of hardcoding it. Magnus Hagander, reviewed by Fujii Masao and Dimitri Fontaine	2011-01-23 12:21:23 +01:00
Magnus Hagander	48075095ac	Set fallback_application_name in walreceiver Makes replication slaves identify themselves in the new pg_stat_replication view.	2011-01-17 11:42:53 +01:00
Heikki Linnakangas	34ef02b4d4	Before exiting walreceiver, fsync() all the WAL received. Otherwise WAL recovery will replay the un-flushed WAL after walreceiver has exited, which can lead to a non-recoverable standby if the system crashes hard at that point.	2011-01-17 12:27:35 +02:00
Tom Lane	36750dcef5	Add .gitignore to silence git complaints about parser/scanner output files.	2011-01-15 16:05:28 -05:00
Magnus Hagander	3866ff6149	Enumerate available tablespaces after starting the backup This closes a race condition where if a tablespace was created after the enumeration happened but before the do_pg_start_backup() was called, the backup would be incomplete. Now that it's done while we are in backup mode, WAL replay will recreate it during restore. Noted by Heikki.	2011-01-15 19:31:16 +01:00
Heikki Linnakangas	8f5d65e916	Treat a WAL sender process that hasn't started streaming yet as a regular backend, as far as the postmaster shutdown logic is concerned. That means, fast shutdown will wait for WAL sender processes to exit before signaling bgwriter to finish. This avoids race conditions between a base backup stopping or starting, and bgwriter writing the shutdown checkpoint WAL record. We don't want e.g the end-of-backup WAL record to be written after the shutdown checkpoint.	2011-01-15 16:38:21 +02:00
Magnus Hagander	fcd810c69a	Use a lexer and grammar for parsing walsender commands Makes it easier to parse mainly the BASE_BACKUP command with its options, and avoids having to manually deal with quoted identifiers in the label (previously broken), and makes it easier to add new commands and options in the future. In passing, refactor the case statement in the walsender to put each command in its own function.	2011-01-14 16:30:33 +01:00
Magnus Hagander	688423d004	Exit from base backups when shutdown is requested When the exit waits until the whole backup completes, it may take a very long time. In passing, add back an error check in the main loop so we detect clients that disconnect much earlier if the backup is large.	2011-01-14 12:36:45 +01:00
Magnus Hagander	9eacd427e8	Make sure walsender state is only read while holding the spinlock Noted by Robert Haas.	2011-01-13 18:51:13 +01:00
Heikki Linnakangas	a5a02a7445	Fix the logic in libpqrcv_receive() to determine if there's any incoming data that can be read without blocking. It used to conclude that there isn't, even though there was data in the socket receive buffer. That led walreceiver to flush the WAL after every received chunk, potentially causing big performance issues. Backpatch to 9.0, because the performance impact can be very significant.	2011-01-13 18:26:39 +02:00
Magnus Hagander	4c8e20f815	Track walsender state in shared memory and expose in pg_stat_replication	2011-01-11 21:25:28 +01:00
Magnus Hagander	47a5f3e9da	Add missing function prototype, for consistency	2011-01-11 21:12:12 +01:00
Tom Lane	e6dce4e439	Adjust basebackup.c to suppress compiler warnings. Some versions of gcc complain about "variable `tablespaces' might be clobbered by `longjmp' or `vfork'" with the original coding. Fix by moving the PG_TRY block into a separate subroutine.	2011-01-11 13:41:13 -05:00
Magnus Hagander	b7ebda9d8c	Reset walsender ps title in the main loop When in streaming mode we can never get out, so it will never be required, but after a base backup (or other operations) we can get back to the loop, so the title needs to be cleared.	2011-01-11 10:04:54 +01:00
Magnus Hagander	2e36343f82	Set process title to indicate base backup is running	2011-01-10 21:53:18 +01:00
Heikki Linnakangas	dc1305ce5f	Leave temporary files out of streaming base backups.	2011-01-10 19:42:05 +02:00
Magnus Hagander	0eb59c4591	Backend support for streaming base backups Add BASE_BACKUP command to walsender, allowing it to stream a base backup to the client (in tar format). The syntax is still far from ideal, that will be fixed in the switch to use a proper grammar for walsender. No client included yet, will come as a separate commit. Magnus Hagander and Heikki Linnakangas	2011-01-10 14:04:19 +01:00
Itagaki Takahiro	a755ea33ae	New system view pg_stat_replication displays activity of wal sender processes. Itagaki Takahiro and Simon Riggs.	2011-01-07 20:35:38 +09:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Robert Haas	d3d414696f	Allow bidirectional copy messages in streaming replication mode. Fujii Masao. Review by Alvaro Herrera, Tom Lane, and myself.	2010-12-11 09:27:37 -05:00
Peter Eisentraut	19e231bbda	Improved parallel make support Replace for loops in makefiles with proper dependencies. Parallel make can now span across directories. Also, make -k and make -q work properly. GNU make 3.80 or newer is now required.	2010-11-12 22:15:16 +02:00
Heikki Linnakangas	8c843fff2d	Bootstrap WAL to begin at segment logid=0 logseg=1 (000000010000000000000001) rather than 0/0, so that we can safely use 0/0 as an invalid value. This is a more future-proof fix for the corner-case bug in streaming replication that was fixed yesterday. We had a similar corner-case bug with log/seg 0/0 back in February as well. Avoiding 0/0 as a valid value should prevent bugs like that in the future. Per Tom Lane's idea. Back-patch to 9.0. Since this only affects bootstrapping, it makes no difference to existing installations. We don't need to worry about the bug in existing installations, because if you've managed to get past the initial base backup already, you won't hit the bug in the future either.	2010-11-02 11:39:48 +02:00
Heikki Linnakangas	931b6db39b	Fix corner-case bug in tracking of latest removed WAL segment during streaming replication. We used log/seg 0/0 to indicate that no WAL segments have been removed since startup, but 0/0 is a valid value for the very first WAL segment after initdb. To make that unambiguous, store (latest removed WAL segment + 1) in the global variable. Per report from Matt Chesler, also reproduced by Greg Smith.	2010-11-01 10:05:15 +02:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Heikki Linnakangas	723d0184e2	Use a latch to make startup process wake up and replay immediately when new WAL arrives via streaming replication. This reduces the latency, and also allows us to use a longer polling interval, which is good for energy efficiency. We still need to poll to check for the appearance of a trigger file, but the interval is now 5 seconds (instead of 100ms), like when waiting for a new WAL segment to appear in WAL archive.	2010-09-15 10:35:05 +00:00
Heikki Linnakangas	1eab7a560d	Don't call OwnLatch while holding a spinlock. OwnLatch can elog() under some "can't happen" scenarios, and spinlocks should only be held for a few instructions anyway. As pointed out by Fujii Masao.	2010-09-15 06:51:19 +00:00
Heikki Linnakangas	3522217b63	Oops, the timeout argument to WaitLatchOrSocket is in microseconds, not milliseconds.	2010-09-14 13:35:14 +00:00
Heikki Linnakangas	2746e5f21d	Introduce latches. A latch is a boolean variable, with the capability to wait until it is set. Latches can be used to reliably wait until a signal arrives, which is hard otherwise because signals don't interrupt select() on some platforms, and even when they do, there are race conditions. On Unix, latches use the so-called self-pipe trick under the covers to implement the sleep until the latch is set, without race conditions. On Windows, Windows events are used. Use the new latch abstraction to sleep in walsender, so that as soon as a transaction finishes, walsender is woken up to immediately send the WAL to the standby. This reduces the latency between master and standby, which is good. Preliminary work by Fujii Masao. The latch implementation is by me, with helpful comments from many people.	2010-09-11 15:48:04 +00:00
Robert Haas	bca03b12c1	Add missing function prototype. Fujii Masao	2010-07-22 13:03:11 +00:00
Bruce Momjian	239d769e7e	pgindent run for 9.0, second run	2010-07-06 19:19:02 +00:00
Tom Lane	e76c1a0f4d	Replace max_standby_delay with two parameters, max_standby_archive_delay and max_standby_streaming_delay, and revise the implementation to avoid assuming that timestamps found in WAL records can meaningfully be compared to clock time on the standby server. Instead, the delay limits are compared to the elapsed time since we last obtained a new WAL segment from archive or since we were last "caught up" to WAL data arriving via streaming replication. This avoids problems with clock skew between primary and standby, as well as other corner cases that the original coding would misbehave in, such as the primary server having significant idle time between transactions. Per my complaint some time ago and considerable ensuing discussion. Do some desultory editing on the hot standby documentation, too.	2010-07-03 20:43:58 +00:00
Tom Lane	07e8b6aabc	Don't allow walsender to send WAL data until it's been safely fsync'd on the master. Otherwise a subsequent crash could cause the master to lose WAL that has already been applied on the slave, resulting in the slave being out of sync and soon corrupt. Per recent discussion and an example from Robert Haas. Fujii Masao	2010-06-17 16:41:25 +00:00
Heikki Linnakangas	e751b71b56	Use "replication" as the database name when constructing a connection string for a streaming replication connection. It's ignored by the server, but allows libpq to pick up the password from .pgpass where "replication" is specified as the database name. Patch by Fujii Masao per Tom's suggestion, with some wording changes by me.	2010-06-11 10:13:09 +00:00
Heikki Linnakangas	71815306e9	In standby mode, respect checkpoint_segments in addition to checkpoint_timeout to trigger restartpoints. We used to deliberately only do time-based restartpoints, because if checkpoint_segments is small we would spend time doing restartpoints more often than really necessary. But now that restartpoints are done in bgwriter, they're not as disruptive as they used to be. Secondly, because streaming replication stores the streamed WAL files in pg_xlog, we want to clean it up more often to avoid running out of disk space when checkpoint_timeout is large and checkpoint_segments small. Patch by Fujii Masao, with some minor changes by me.	2010-06-09 15:04:07 +00:00
Tatsuo Ishii	016212e0eb	Fix typo in the header comment. Per request from Masao Fujii.	2010-06-09 00:54:39 +00:00
Tom Lane	36614006e1	Avoid useless snprintf() call when update_process_title is turned off. Fujii Masao	2010-06-07 15:49:30 +00:00
Tom Lane	3ceb44fa0a	Adjust misleading comment in walsender.c. We try to send all WAL data that's been written out from shared memory, but the previous phrasing might be read to say that we send only what's been fsync'd.	2010-06-03 23:00:14 +00:00
Tom Lane	0cc59cc1f3	Add current WAL end (as seen by walsender, ie, GetWriteRecPtr() result) and current server clock time to SR data messages. These are not currently used on the slave side but seem likely to be useful in future, and it'd be better not to change the SR protocol after release. Per discussion. Also do some minor code review and cleanup on walsender.c, and improve the protocol documentation.	2010-06-03 22:17:32 +00:00
Peter Eisentraut	cb6038c168	Fix some inconsistent quoting of wal_level values in messages When referring to postgresql.conf syntax, then it's without quotes (wal_level=archive); in narrative it's with double quotes. But never single quotes.	2010-06-03 21:02:12 +00:00
Heikki Linnakangas	e0b581acd2	Send all outstanding WAL before exiting when smart shutdown is requested. This was broken by my previous patch to send WAL in smaller batches. Patch by Fujii Masao.	2010-05-31 10:44:37 +00:00
Heikki Linnakangas	fbcdff39bd	Thinko in previous commit: ensure that MAX_SEND_SIZE is always greater than XLOG_BLCKSZ, by defining it as 16 * XLOG_BLCKSZ rather than directly as 128k bytes.	2010-05-26 22:34:49 +00:00
Heikki Linnakangas	ea5516081d	In walsender, don't sleep if there's outstanding WAL waiting to be sent, otherwise we effectively rate-limit the streaming as pointed out by Simon Riggs. Also, send the WAL in smaller chunks, to respond to signals more promptly.	2010-05-26 22:21:33 +00:00
Tom Lane	2ea56cbda6	Fix missing static declaration for XLogRead().	2010-05-09 18:11:55 +00:00
Tom Lane	77acab75df	Modify ShmemInitStruct and ShmemInitHash to throw errors internally, rather than returning NULL for some-but-not-all failures as they used to. Remove now-redundant tests for NULL from call sites. We had to do something about this because many call sites were failing to check for NULL; and changing it like this seems a lot more useful and mistake-proof than adding checks to the call sites without them.	2010-04-28 16:54:16 +00:00
Heikki Linnakangas	9b8a73326e	Introduce wal_level GUC to explicitly control if information needed for archival or hot standby should be WAL-logged, instead of deducing that from other options like archive_mode. This replaces recovery_connections GUC in the primary, where it now has no effect, but it's still used in the standby to enable/disable hot standby. Remove the WAL-logging of "unlogged operations", like creating an index without WAL-logging and fsyncing it at the end. Instead, we keep a copy of the wal_mode setting and the settings that affect how much shared memory a hot standby server needs to track master transactions (max_connections, max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings change, at server restart, write a WAL record noting the new settings and update pg_control. This allows us to notice the change in those settings in the standby at the right moment, they used to be included in checkpoint records, but that meant that a changed value was not reflected in the standby until the first checkpoint after the change. Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to the sequence it used to follow, before hot standby and subsequent patches changed it to 0x9003.	2010-04-28 16:10:43 +00:00
Tom Lane	a2c3931a24	Fix pg_hba.conf matching so that replication connections only match records with database = replication. The previous coding would allow them to match ordinary records too, but that seems like a recipe for security breaches. Improve the messages associated with no-such-pg_hba.conf entry to report replication connections as such, since that's now a critical aspect of whether the connection matches. Make some cursory improvements in the related documentation, too.	2010-04-21 03:32:53 +00:00
Tom Lane	a3c6d10575	Move the check for whether walreceiver has authenticated as a superuser from walsender.c, where it didn't really belong, to postinit.c where it does belong (and is essentially free, too).	2010-04-21 00:51:57 +00:00
Tom Lane	7de2dfccc5	Fix code that doesn't work on machines with strict alignment requirements: must use memcpy here rather than struct assignment. In passing, rearrange some randomly-ordered declarations to be a tad less random.	2010-04-20 22:55:03 +00:00
Magnus Hagander	03a571a4cf	Add wrapper function libpqrcv_PQexec() in the walreceiver that uses async libpq to send queries, making the waiting for responses interruptible on platforms where PQexec() can't normally be interrupted by signals, such as win32. Fujii Masao and Magnus Hagander	2010-04-19 14:10:45 +00:00
Magnus Hagander	a95d15ff5d	Only try to do a graceful disconnect if we've successfully loaded the shared library with the disconnect function in it. Fixes segmentation fault reported by Jeff Davis. Fujii Masao	2010-04-13 08:16:09 +00:00
Heikki Linnakangas	258174b462	Need to use the start pointer of a block we read from WAL segment in the calculation, not the end pointer, as pointed out by Fujii Masao.	2010-04-12 10:18:50 +00:00
Heikki Linnakangas	e57cd7f0a1	Change the logic to decide when to delete old WAL segments, so that it doesn't take into account how far the WAL senders are. This way a hung WAL sender doesn't prevent old WAL segments from being recycled/removed in the primary, ultimately causing the disk to fill up. Instead add standby_keep_segments setting to control how many old WAL segments are kept in the primary. This also makes it more reliable to use streaming replication without WAL archiving, assuming that you set standby_keep_segments high enough.	2010-04-12 09:52:29 +00:00
Robert Haas	54943734f8	Refer to max_wal_senders in a more consistent fashion. The error message now makes explicit reference to the GUC that must be changed to fix the problem, using wording suggested by Tom Lane. Along the way, rename the GUC from MaxWalSenders to max_wal_senders for consistency and grep-ability.	2010-04-01 00:43:29 +00:00
Heikki Linnakangas	59292f28ca	Flush CopyOutResponse when starting streaming in walsender, so that it's not delayed until the first WAL record is sent. Fujii Masao	2010-03-26 12:23:34 +00:00
Simon Riggs	92fc0db99f	Additional thoughts on WALSender cpu reduction. Use long type and alter a comment to reduce confusion.	2010-03-24 21:41:57 +00:00
Simon Riggs	08882ce74c	Reduce CPU utilisation of WALSender process. Process was using 10% CPU doing nothing, caused by naptime specified in milliseconds yet units of pg_usleep() parameter is microseconds. Correctly specifying units reduces call frequency by 1000. Reduction in CPU consumption verified.	2010-03-24 20:11:12 +00:00
Heikki Linnakangas	de3483acfa	Update description of walrcv_receive() function to match reality.	2010-03-24 06:25:39 +00:00
Peter Eisentraut	c248d17120	Message tuning	2010-03-21 00:17:59 +00:00
Simon Riggs	6a771d1d36	Add connection messages for streaming replication. log_connections was broken for a replication connection and no messages were displayed on either standby or primary, at any debug level. Connection messages needed to diagnose session drop/reconnect events. Use LOG mode for now, discuss lowering in later releases.	2010-03-19 19:19:38 +00:00
Simon Riggs	75867c528d	Minor tweaks on libpqrcv_connect(): ensure conninfo_repl[] is correctly sized and expand comment to explain otherwise undocumented use of replication connection parameter.	2010-03-19 17:51:42 +00:00
Heikki Linnakangas	a383c55a1d	Throw a nicer error message if a standby server attempts to connect while the master is still in recovery. We don't support cascading slaves yet. Patch by Fujii Masao, with slightly changed wording.	2010-03-16 09:09:55 +00:00
Bruce Momjian	65e806cba1	pgindent run for 9.0	2010-02-26 02:01:40 +00:00
Heikki Linnakangas	cd2b7d3c4d	Fix streaming replication starting at the very first WAL segment. Per complaint from Greg Stark.	2010-02-25 07:31:40 +00:00
Heikki Linnakangas	ad458cfe81	Don't use O_DIRECT when writing WAL files if archiving or streaming is enabled. Bypassing the kernel cache is counter-productive in that case, because the archiver/walsender process will read from the WAL file soon after it's written, and if it's not cached the read will cause a physical read, eating I/O bandwidth available on the WAL drive. Also, walreceiver process does unaligned writes, so disable O_DIRECT in walreceiver process for that reason too.	2010-02-19 10:51:04 +00:00
Heikki Linnakangas	3e87ba6ef7	Fix pq_getbyte_if_available() function. It was confused on what it returns if no data is immediately available. Patch by me with numerous fixes from Fujii Masao and Magnus Hagander.	2010-02-18 11:13:46 +00:00
Tom Lane	50a90fac40	Stamp HEAD as 9.0devel, and update various places that were referring to 8.5 (hope I got 'em all). Per discussion, this release will be 9.0 not 8.5.	2010-02-17 04:19:41 +00:00
Heikki Linnakangas	808969d0e7	Add a message type header to the CopyData messages sent from primary to standby in streaming replication. While we only have one message type at the moment, adding a message type header makes this easier to extend.	2010-02-03 09:47:19 +00:00
Heikki Linnakangas	83cb7da7dc	Fix bug in walsender's xlogid boundary handling, reported by Erik Rijkers. LogwrtRqst.Write can be set to non-existent FF log segment, we mustn't try to send that in XLogSend(). Also fix similar bug in ReadRecord(), which I just introduced in the ReadRecord() refactoring patch.	2010-01-27 16:41:09 +00:00
Heikki Linnakangas	1bb2558046	Make standby server continuously retry restoring the next WAL segment with restore_command, if the connection to the primary server is lost. This ensures that the standby can recover automatically, if the connection is lost for a long time and standby falls behind so much that the required WAL segments have been archived and deleted in the master. This also makes standby_mode useful without streaming replication; the server will keep retrying restore_command every few seconds until the trigger file is found. That's the same basic functionality pg_standby offers, but without the bells and whistles. To implement that, refactor the ReadRecord/FetchRecord functions. The FetchRecord() function introduced in the original streaming replication patch is removed, and all the retry logic is now in a new function called XLogReadPage(). XLogReadPage() is now responsible for executing restore_command, launching walreceiver, and waiting for new WAL to arrive from primary, as required. This also changes the life cycle of walreceiver. When launched, it now only tries to connect to the master once, and exits if the connection fails, or is lost during streaming for any reason. The startup process detects the death, and re-launches walreceiver if necessary.	2010-01-27 15:27:51 +00:00
Heikki Linnakangas	2d29f5f59a	Fix bogus comments.	2010-01-21 08:19:57 +00:00
Heikki Linnakangas	e8aae273d4	Fix bogus subdir setting. Again. I must've unfixed it by accident while moving files around.	2010-01-20 20:34:51 +00:00
Heikki Linnakangas	b3a1ef53c3	Add missing "!= NULL", for the sake of consistency. Fujii Masao	2010-01-20 11:58:44 +00:00
Heikki Linnakangas	32bc08b1d4	Rethink the way walreceiver is linked into the backend. Rather than shoving walreceiver as a whole into a dynamically loaded module, split the libpq-specific parts of it into dynamically loaded module and keep the rest in the main backend binary. Although Tom fixed the Windows compilation problems with the old walreceiver module already, this is a cleaner division of labour and makes the code more readable. There's also the prospect of adding new transport methods as pluggable modules in the future, which this patch makes easier, though for now the API between libpqwalreceiver and walreceiver process should be considered private. The libpq-specific module is now in src/backend/replication/libpqwalreceiver, and the part linked with postgres binary is in src/backend/replication/walreceiver.c.	2010-01-20 09:16:24 +00:00
Bruce Momjian	a736540958	Add #include <sys/time.h> for struct timeval definition on BSD/OS.	2010-01-16 01:55:28 +00:00
Tom Lane	cf7af28068	No, scratch that, it was getting added twice.	2010-01-15 21:06:26 +00:00
Tom Lane	4335281604	Actually, I'll bet the mingw problem is lack of $(BE_DLLLIBS) ...	2010-01-15 20:45:42 +00:00
Tom Lane	798fe1d513	Fix bogus subdir setting ... wonder just what that affects ...	2010-01-15 20:34:11 +00:00
Heikki Linnakangas	d3be71a208	Remove unused (in non-assertion-enabled build) variable.	2010-01-15 11:47:15 +00:00
Heikki Linnakangas	40f908bdcd	Introduce Streaming Replication. This includes two new kinds of postmaster processes, walsenders and walreceiver. Walreceiver is responsible for connecting to the primary server and streaming WAL to disk, while walsender runs in the primary server and streams WAL from disk to the client. Documentation still needs work, but the basics are there. We will probably pull the replication section to a new chapter later on, as well as the sections describing file-based replication. But let's do that as a separate patch, so that it's easier to see what has been added/changed. This patch also adds a new section to the chapter about FE/BE protocol, documenting the protocol used by walsender/walreceiver. Bump catalog version because of two new functions, pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for monitoring the progress of replication. Fujii Masao, with additional hacking by me	2010-01-15 09:19:10 +00:00

1285 Commits