postgresql

Commit Graph

Author	SHA1	Message	Date
Robert Haas	0af695fd43	Log restartpoints in the same fashion as checkpoints. Prior to 9.0, restartpoints never created, deleted, or recycled WAL files, but now they can. This code makes log_checkpoints treat checkpoints and restartpoints symmetrically. It also adjusts up the documentation of the parameter to mention restartpoints. Fujii Masao. Docs by me, as suggested by Itagaki Takahiro.	2011-02-02 21:08:53 -05:00
Heikki Linnakangas	997b48ed96	Support multiple concurrent pg_basebackup backups. With this patch, pg_basebackup doesn't write a backup_label file in the data directory, so it doesn't interfere with a pg_start/stop_backup() based backup anymore. backup_label is still included in the backup, but it is injected directly into the tar stream. Heikki Linnakangas, reviewed by Fujii Masao and Magnus Hagander.	2011-01-31 18:25:39 +02:00
Tom Lane	0f73aae13d	Allow the wal_buffers setting to be auto-tuned to a reasonable value. If wal_buffers is initially set to -1 (which is now the default), it's replaced by 1/32nd of shared_buffers, with a minimum of 8 (the old default) and a maximum of the XLOG segment size. The allowed range for manual settings is still from 4 up to whatever will fit in shared memory. Greg Smith, with implementation correction by me.	2011-01-22 20:31:24 -05:00
Magnus Hagander	4448917d51	Split pg_start_backup() and pg_stop_backup() into two pieces. Move the actual functionality into a separate function that's easier to call internally, and change the SQL-callable function to be a wrapper calling this. Also create a pg_abort_backup() function, only callable internally, that does only the most vital parts of pg_stop_backup(), making it safe(r) to call from error handlers.	2011-01-09 21:00:28 +01:00
Robert Haas	a9f72b4083	Improve recovery.conf.sample comments. Jehan-Guillaume de Rorthais, with some additional wordsmithing by me.	2011-01-07 11:01:25 -05:00
Robert Haas	dc8a14311a	Update comments in RecordTransactionCommit() to mention unlogged tables.	2011-01-03 10:29:22 -05:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Alvaro Herrera	55573990ca	Avoid unnecessary public struct declaration in slru.h. Instead, declare a public wrapper of the sole function using it for external callers, so that they don't have to always pass a NULL argument. Author: Kevin Grittner	2010-12-30 12:09:17 -03:00
Robert Haas	53dbc27c62	Support unlogged tables. The contents of an unlogged table are not WAL-logged; thus, they are not available on standby servers and are truncated whenever the database system enters recovery. Indexes on unlogged tables are also unlogged. Unlogged GiST indexes are not currently supported.	2010-12-29 06:48:53 -05:00
Magnus Hagander	9b8aff8c19	Add REPLICATION privilege for ROLEs. This privilege is required to do Streaming Replication, instead of superuser, making it possible to set up a SR slave that doesn't have write permissions on the master. Superuser privileges do NOT override this check, so in order to use the default superuser account for replication it must be explicitly granted the REPLICATION permissions. This is a backwards-incompatible change, in the interest of higher default security.	2010-12-29 11:05:03 +01:00
Bruce Momjian	5000472112	Remove quotes from boolean recovery.conf.sample parameters, now that the quotes are not required. This now matches postgresql.conf's specification of booleans.	2010-12-24 11:51:51 -05:00
Heikki Linnakangas	9de3aa65f0	Rewrite the GiST insertion logic so that we don't need the post-recovery cleanup stage to finish incomplete inserts or splits anymore. There were two reasons for the cleanup step: 1. When a new tuple was inserted to a leaf page, the downlink in the parent needed to be updated to contain (i.e., to be consistent with) the new key. Updating the parent in turn might require recursively updating the parent of the parent. We now handle that by updating the parent while traversing down the tree, so that when we insert the leaf tuple, all the parents are already consistent with the new key, and the tree is consistent at every step. 2. When a page is split, we need to insert the downlink for the new right page(s), and update the downlink for the original page to not include keys that moved to the right page(s). We now handle that by setting a new flag, F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is set, scans always follow the rightlink, regardless of the NSN mechanism used to detect concurrent page splits. That way the tree is consistent right after split, even though the downlink is still missing. This is very similar to the way B-tree splits are handled. When the downlink is inserted in the parent, the flag is cleared. To keep the insertion algorithm simple, when an insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it finishes the split before doing anything else. These changes allow removing the whole "invalid tuple" mechanism, but I retained the scan code to still follow invalid tuples correctly. While we don't create any such tuples anymore, we want to handle them gracefully in case you pg_upgrade a GiST index that has them. If we encounter any on an insert, though, we just throw an error saying that you need to REINDEX. 
The issue that got me into doing this is that if you did a checkpoint while an insert or split was in progress, and the checkpoint finishes quickly so that there is no WAL record related to the insert between RedoRecPtr and the checkpoint record, recovery from that checkpoint would not know to finish the incomplete insert. IOW, we have the same issue we solved with the rm_safe_restartpoint mechanism during normal operation too. It's highly unlikely to happen in practice, and this fix is far too large to backpatch, so we're just going to live with it in previous versions, but this refactoring fixes it going forward. With this patch, you don't get the annoying 'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices anymore if you crash at an unfortunate moment.	2010-12-23 16:21:47 +02:00
Robert Haas	f6a0863e3c	Allow transactions that don't write WAL to commit asynchronously. This case can arise if a transaction has written data, but only to temporary tables. Loss of the commit record in case of a crash won't matter, because the temporary tables will be lost anyway. Reviewed by Heikki Linnakangas and Simon Riggs.	2010-12-20 12:59:33 -05:00
Robert Haas	34c70c7ac4	Instrument checkpoint sync calls. Greg Smith, reviewed by Jeff Janes	2010-12-14 09:26:19 -05:00
Tom Lane	04f4e10cfc	Use symbolic names not octal constants for file permission flags. Purely cosmetic patch to make our coding standards more consistent --- we were doing symbolic some places and octal other places. This patch fixes all C-coded uses of mkdir, chmod, and umask. There might be some other calls I missed. Inconsistency noted while researching tablespace directory permissions issue.	2010-12-10 17:35:33 -05:00
Simon Riggs	e620ee35b2	Optimize commit_siblings in two ways to improve group commit. First, avoid scanning the whole ProcArray once we know there are at least commit_siblings active; second, skip the check altogether if commit_siblings = 0. Greg Smith	2010-12-08 18:48:03 +00:00
Heikki Linnakangas	5a031a5556	Fix bugs in the hot standby known-assigned-xids tracking logic. If there's an old transaction running in the master, and a lot of transactions have started and finished since, and a WAL-record is written in the gap between creating the running-xacts snapshot and WAL-logging it, recovery will fail with "too many KnownAssignedXids" error. This bug was reported by Joachim Wieland on Nov 19th. In the same scenario, when fewer transactions have started so that all the xids fit in KnownAssignedXids despite the first bug, a more serious bug arises. We incorrectly initialize the clog code with the oldest still running transaction, and when we see the WAL record belonging to a transaction with an XID larger than one that committed already before the checkpoint we're recovering from, we zero the clog page containing the already committed transaction, leading to data loss. In hindsight, trying to track xids in the known-assigned-xids array before seeing the running-xacts record was too complicated. To fix that, hold XidGenLock while the running-xacts snapshot is taken and WAL-logged. That ensures that no transaction can begin or end in that gap, so that in recovery we know that the snapshot contains all transactions running at that point in WAL.	2010-12-07 09:23:30 +01:00
Heikki Linnakangas	95e42a2c29	Fix two typos, by Fujii Masao.	2010-12-06 12:38:05 +01:00
Robert Haas	5ef6c91383	Remove now-outdated mention of quotes being required in recovery.conf. Noted by Itagaki Takahiro.	2010-12-03 09:00:18 -05:00
Robert Haas	970a18687f	Use GUC lexer for recovery.conf parsing. This eliminates some crufty, special-purpose code and, as a non-trivial side benefit, allows recovery.conf parameters to be unquoted. Dimitri Fontaine, with review and cleanup by Alvaro Herrera, Itagaki Takahiro, and me.	2010-12-03 08:56:44 -05:00
Peter Eisentraut	fc946c39ae	Remove useless whitespace at end of lines	2010-11-23 22:34:55 +02:00
Heikki Linnakangas	542bdb2146	Fix bug introduced by the recent patch to check that the checkpoint redo location read from backup label file can be found: wasShutdown was set incorrectly when a backup label file was found. Jeff Davis, with a little tweaking by me.	2010-11-11 19:32:11 +02:00
Robert Haas	7ba6e4f0e0	Add monitoring function pg_last_xact_replay_timestamp. Fujii Masao, with a little wordsmithing by me.	2010-11-09 22:52:19 -05:00
Heikki Linnakangas	8c843fff2d	Bootstrap WAL to begin at segment logid=0 logseg=1 (000000010000000000000001) rather than 0/0, so that we can safely use 0/0 as an invalid value. This is a more future-proof fix for the corner-case bug in streaming replication that was fixed yesterday. We had a similar corner-case bug with log/seg 0/0 back in February as well. Avoiding 0/0 as a valid value should prevent bugs like that in the future. Per Tom Lane's idea. Back-patch to 9.0. Since this only affects bootstrapping, it makes no difference to existing installations. We don't need to worry about the bug in existing installations, because if you've managed to get past the initial base backup already, you won't hit the bug in the future either.	2010-11-02 11:39:48 +02:00
Heikki Linnakangas	931b6db39b	Fix corner-case bug in tracking of latest removed WAL segment during streaming replication. We used log/seg 0/0 to indicate that no WAL segments have been removed since startup, but 0/0 is a valid value for the very first WAL segment after initdb. To disambiguate, store (latest removed WAL segment + 1) in the global variable. Per report from Matt Chesler, also reproduced by Greg Smith.	2010-11-01 10:05:15 +02:00
Heikki Linnakangas	0c6293dd03	Before removing backup_label and irrevocably changing pg_control file, check that WAL file containing the checkpoint redo-location can be found. This avoids making the cluster irrecoverable if the redo location is in an earlier WAL file than the checkpoint record. Report, analysis and patch by Jeff Davis, with small changes by me.	2010-10-26 21:43:52 +03:00
Tom Lane	def30e84c4	Don't try to fetch database name when SetTransactionIdLimit() is executed outside a transaction. This repairs brain fade in my patch of 2009-08-30: the reason we had been storing oldest-database name, not OID, in ShmemVariableCache was of course to avoid having to do a catalog lookup at times when it might be unsafe. This error explains why Aleksandr Dushein is having trouble getting out of an XID wraparound state in bug #5718, though not how he got into that state in the first place. I suspect pg_upgrade is at fault there.	2010-10-20 12:48:51 -04:00
Alvaro Herrera	17a16663d0	Remove AtStart_Cache() call in CommandCounterIncrement(). This call was present in the aboriginal code from Berkeley, and has never been touched; it may very well be that it was there to mask effects of bugs in other places and it may no longer be necessary. The removal has been foreseen in a code comment since 2007; this seems to be a good time to test this hypothesis.	2010-10-20 11:33:57 -03:00
Simon Riggs	3bbcc5c999	Make startup process respond to signals to cancel waiting on latch. A tidy up for recently committed changes to startup latch. Fujii Masao	2010-10-14 19:15:26 +01:00
Simon Riggs	45cd9199c2	Fix bug in comment of timeline history file. Fujii Masao	2010-10-14 19:06:06 +01:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Tom Lane	54d0e2886a	Add some documentation about how we WAL-log filesystem actions. Per a question from Robert Haas.	2010-09-17 00:42:39 +00:00
Heikki Linnakangas	79b54816db	Fix two typos in comments, spotted by Fujii Masao and Thom Brown	2010-09-15 13:58:22 +00:00
Heikki Linnakangas	723d0184e2	Use a latch to make startup process wake up and replay immediately when new WAL arrives via streaming replication. This reduces the latency, and also allows us to use a longer polling interval, which is good for energy efficiency. We still need to poll to check for the appearance of a trigger file, but the interval is now 5 seconds (instead of 100ms), like when waiting for a new WAL segment to appear in WAL archive.	2010-09-15 10:35:05 +00:00
Heikki Linnakangas	2746e5f21d	Introduce latches. A latch is a boolean variable, with the capability to wait until it is set. Latches can be used to reliably wait until a signal arrives, which is hard otherwise because signals don't interrupt select() on some platforms, and even when they do, there are race conditions. On Unix, latches use the so-called self-pipe trick under the covers to implement the sleep until the latch is set, without race conditions. On Windows, Windows events are used. Use the new latch abstraction to sleep in walsender, so that as soon as a transaction finishes, walsender is woken up to immediately send the WAL to the standby. This reduces the latency between master and standby, which is good. Preliminary work by Fujii Masao. The latch implementation is by me, with helpful comments from many people.	2010-09-11 15:48:04 +00:00
Tom Lane	eb36d1ad51	Fix oversight in RelFileNodeBackend patch: CreateFakeRelcacheEntry needs to initialize the rd_backend field of a fake Relation entry correctly. Fortunately, that is easy, since only non-temp relations should ever be mentioned in the WAL stream.	2010-08-30 16:46:23 +00:00
Simon Riggs	ac791d3ca1	Fix misleading DEBUG2 issued during RemoveOldXlogFiles()	2010-08-30 15:37:41 +00:00
Simon Riggs	e72f15ed60	Truncate subtrans after each restartpoint. Issue reported by Harald Kolb, patch by Fujii Masao, review by me.	2010-08-30 14:22:05 +00:00
Alvaro Herrera	3a1b51de19	Remove duplicate translatable phrase	2010-08-26 19:23:41 +00:00
Robert Haas	debcec7dc3	Include the backend ID in the relpath of temporary relations. This allows us to reliably remove all leftover temporary relation files on cluster startup without reference to system catalogs or WAL; therefore, we no longer include temporary relations in XLOG_XACT_COMMIT and XLOG_XACT_ABORT WAL records. Since these changes require including a backend ID in each SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id field has been reduced from two bytes to one, and the maximum number of connections has been reduced from INT_MAX / 4 to 2^23-1. It would be possible to remove these restrictions by increasing the size of SharedInvalidationMessage by 4 bytes, but right now that doesn't seem like a good trade-off. Review by Jaime Casanova and Tom Lane.	2010-08-13 20:10:54 +00:00
Robert Haas	95ef7cd40d	Make RecordTransactionCommit() respect wal_level. Since the only purpose of WAL-logging SharedInvalidationMessages is to support Hot Standby operation, they needn't be included when wal_level < hot_standby. Back-patch to 9.0. Review by Heikki Linnakangas and Fujii Masao.	2010-08-13 15:42:21 +00:00
Robert Haas	30c22eb8fc	Correct sundry errors in Hot Standby-related comments. Fujii Masao	2010-08-12 23:24:54 +00:00
Simon Riggs	5b8bd0529e	Rename asyncCommitLSN to asyncXactLSN to reflect changed role in 9.0. Transaction aborts now record their LSN to avoid corner case behaviour in SR/HS, hence change of name of variables and functions. As pointed out by Fujii Masao. Cosmetic changes only.	2010-07-29 22:27:27 +00:00
Robert Haas	7be8946c78	Avoid deep recursion when assigning XIDs to multiple levels of subxacts. Backpatch to 8.0. Andres Freund, with cleanup and adjustment for older branches by me.	2010-07-23 00:43:00 +00:00
Tom Lane	672efc0865	Update obsolete comment. Noted by Josh Tolley.	2010-07-08 16:08:30 +00:00
Bruce Momjian	239d769e7e	pgindent run for 9.0, second run	2010-07-06 19:19:02 +00:00
Tom Lane	8771634666	Don't set recoveryLastXTime when replaying a checkpoint --- that was a bogus idea from the start since the variable is only meant to track commit/abort events. This patch reverts the logic around the variable to what it was in 8.4, except that the value is now kept in shared memory rather than a static variable, so that it can be reported correctly by CreateRestartPoint (which is executed in the bgwriter).	2010-07-03 22:15:45 +00:00
Tom Lane	e76c1a0f4d	Replace max_standby_delay with two parameters, max_standby_archive_delay and max_standby_streaming_delay, and revise the implementation to avoid assuming that timestamps found in WAL records can meaningfully be compared to clock time on the standby server. Instead, the delay limits are compared to the elapsed time since we last obtained a new WAL segment from archive or since we were last "caught up" to WAL data arriving via streaming replication. This avoids problems with clock skew between primary and standby, as well as other corner cases that the original coding would misbehave in, such as the primary server having significant idle time between transactions. Per my complaint some time ago and considerable ensuing discussion. Do some desultory editing on the hot standby documentation, too.	2010-07-03 20:43:58 +00:00
Bruce Momjian	b57ddccf05	Add C comment about why synchronous_commit=off behavior can lose committed transactions in a postmaster crash.	2010-06-29 18:44:58 +00:00
Robert Haas	400916b6d7	emode_for_corrupt_record shouldn't reduce LOG messages to WARNING. In non-interactive sessions, WARNING sorts below LOG.	2010-06-28 19:46:19 +00:00
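
The wal_buffers auto-tuning rule described in commit 0f73aae13d can be sketched as follows. This is only an illustration of the arithmetic in the commit message, not the server's code: the function name and parameters are hypothetical, both settings are assumed to be counted in 8 kB pages, and the XLOG segment is assumed to be the default 16 MB (2048 such pages).

```python
def auto_tune_wal_buffers(shared_buffers_pages, wal_buffers=-1,
                          segment_pages=2048, min_pages=8):
    """Return the effective wal_buffers value, in pages.

    Hypothetical helper: all names and the page-based units are
    assumptions made for this sketch.
    """
    # A manual setting (anything other than -1) is kept unchanged.
    if wal_buffers != -1:
        return wal_buffers
    # Auto-tuned value: 1/32nd of shared_buffers ...
    tuned = shared_buffers_pages // 32
    # ... clamped to at least 8 (the old default) and at most
    # one XLOG segment.
    return max(min_pages, min(tuned, segment_pages))


# 128 MB of shared_buffers is 16384 pages of 8 kB each.
print(auto_tune_wal_buffers(16384))
```

Under these assumptions, small shared_buffers settings fall back to the old default of 8, and very large ones stop growing once they reach one segment.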


832 Commits