postgresql

Commit Graph

Author	SHA1	Message	Date
Peter Eisentraut	5caa3479c2	Clean up most -Wunused-but-set-variable warnings from gcc 4.6 This warning is new in gcc 4.6 and part of -Wall. This patch cleans up most of the noise, but there are some still warnings that are trickier to remove.	2011-04-11 22:28:45 +03:00
Bruce Momjian	bf50caf105	pgindent run before PG 9.1 beta 1.	2011-04-10 11:42:00 -04:00
Tom Lane	1766a5b63a	Tweak collation setup for GIN index comparison functions. Honor index column's collation spec if there is one, don't go to the expense of calling get_typcollation when we can reasonably assume that all GIN storage types will use default collation, and be sure to set a collation for the comparePartialFn too.	2011-04-08 16:48:25 -04:00
Tom Lane	2594cf0e8c	Revise the API for GUC variable assign hooks. The previous functions of assign hooks are now split between check hooks and assign hooks, where the former can fail but the latter shouldn't. Aside from being conceptually clearer, this approach exposes the "canonicalized" form of the variable value to guc.c without having to do an actual assignment. And that lets us fix the problem recently noted by Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus log messages about "parameter "wal_buffers" cannot be changed without restarting the server". There may be some speed advantage too, because this design lets hook functions avoid re-parsing variable values when restoring a previous state after a rollback (they can store a pre-parsed representation of the value instead). This patch also resolves a longstanding annoyance about custom error messages from variable assign hooks: they should modify, not appear separately from, guc.c's own message about "invalid parameter value".	2011-04-07 00:12:02 -04:00
Simon Riggs	88f32b7ca2	Avoid assuming there will be only 3 states for synchronous_commit. Also avoid hardcoding the current default state by giving it the name "on" and replace with a meaningful name that reflects its behaviour. Coding only, no change in behaviour.	2011-04-04 23:23:13 +01:00
Robert Haas	240067b3b0	Merge synchronous_replication setting into synchronous_commit. This means one less thing to configure when setting up synchronous replication, and also avoids some ambiguity around what the behavior should be when the settings of these variables conflict. Fujii Masao, with additional hacking by me.	2011-04-04 16:25:52 -04:00
Heikki Linnakangas	1f0bab8494	Improve error message when WAL ends before reaching end of online backup.	2011-03-31 10:09:49 +03:00
Heikki Linnakangas	acf4740132	Check that we've reached end-of-backup also when we're not performing archive recovery. It's possible to restore an online backup without recovery.conf, by simply copying all the necessary WAL files to pg_xlog. "pg_basebackup -x" does that too. That's the use case where this cross-check is useful. Backpatch to 9.0. We used to do this in earlier versins, but in 9.0 the code was inadvertently changed so that the check is only performed after archive recovery. Fujii Masao.	2011-03-30 10:53:28 +03:00
Tom Lane	7208fae18f	Clean up cruft around collation initialization for tupdescs and scankeys. I found actual bugs in GiST and plpgsql; the rest of this is cosmetic but meant to decrease the odds of future bugs of omission.	2011-03-26 18:28:40 -04:00
Simon Riggs	b5f2f2a712	Minor changes to recovery pause behaviour. Change location LOG message so it works each time we pause, not just for final pause. Ensure that we pause only if we are in Hot Standby and can connect to allow us to run resume function. This change supercedes the code to override parameter recoveryPauseAtTarget to false if not attempting to enter Hot Standby, which is now removed.	2011-03-23 19:35:53 +00:00
Simon Riggs	b98ac467f5	Prevent intermittent hang in recovery from bgwriter interaction. Startup process waited for cleanup lock but when hot_standby = off the pid was not registered, so that the bgwriter would not wake the waiting process as intended.	2011-03-23 13:30:05 +00:00
Heikki Linnakangas	6d8096e2f3	When two base backups are started at the same time with pg_basebackup, ensure that they use different checkpoints as the starting point. We use the checkpoint redo location as a unique identifier for the base backup in the end-of-backup record, and in the backup history file name. Bug spotted by Fujii Masao.	2011-03-21 11:25:25 +02:00
Tom Lane	b310b6e31c	Revise collation derivation method and expression-tree representation. All expression nodes now have an explicit output-collation field, unless they are known to only return a noncollatable data type (such as boolean or record). Also, nodes that can invoke collation-aware functions store a separate field that is the collation value to pass to the function. This avoids confusion that arises when a function has collatable inputs and noncollatable output type, or vice versa. Also, replace the parser's on-the-fly collation assignment method with a post-pass over the completed expression tree. This allows us to use a more complex (and hopefully more nearly spec-compliant) assignment rule without paying for it in extra storage in every expression node. Fix assorted bugs in the planner's handling of collations by making collation one of the defining properties of an EquivalenceClass and by converting CollateExprs into discardable RelabelType nodes during expression preprocessing.	2011-03-19 20:30:08 -04:00
Robert Haas	777e8c0015	Remove bogus semicolons in recoveryPausesHere. Without this, the startup process goes into a tight loop, consuming 100% of one CPU and failing to respond to interrupts.	2011-03-18 08:09:09 -04:00
Robert Haas	84abea76f6	Add pause_at_recovery_target to recovery.conf.sample; improve docs. Fujii Masao, but with the proposed behavior change reverted, and the rest adjusted accordingly.	2011-03-17 14:04:11 -04:00
Bruce Momjian	5ca543fb2e	Clarify C comment that O_SYNC/O_FSYNC are really the same settting, as opposed to O_DSYNC.	2011-03-10 20:02:52 -05:00
Robert Haas	d16e290a8a	Emit a LOG message when pausing at the recovery target. Fujii Masao	2011-03-10 14:37:14 -05:00
Tom Lane	a051ef699c	Remove collation information from TypeName, where it does not belong. The initial collations patch treated a COLLATE spec as part of a TypeName, following what can only be described as brain fade on the part of the SQL committee. It's a lot more reasonable to treat COLLATE as a syntactically separate object, so that it can be added in only the productions where it actually belongs, rather than needing to reject it in a boatload of places where it doesn't belong (something the original patch mostly failed to do). In addition this change lets us meet the spec's requirement to allow COLLATE anywhere in the clauses of a ColumnDef, and it avoids unfriendly behavior for constructs such as "foo::type COLLATE collation". To do this, pull collation information out of TypeName and put it in ColumnDef instead, thus reverting most of the collation-related changes in parse_type.c's API. I made one additional structural change, which was to use a ColumnDef as an intermediate node in AT_AlterColumnType AlterTableCmd nodes. This provides enough room to get rid of the "transform" wart in AlterTableCmd too, since the ColumnDef can carry the USING expression easily enough. Also fix some other minor bugs that have crept in in the same areas, like failure to copy recently-added fields of ColumnDef in copyfuncs.c. While at it, document the formerly secret ability to specify a collation in ALTER TABLE ALTER COLUMN TYPE, ALTER TYPE ADD ATTRIBUTE, and ALTER TYPE ALTER ATTRIBUTE TYPE; and correct some misstatements about what the default collation selection will be when COLLATE is omitted. BTW, the three-parameter form of format_type() should go away too, since it just contributes to the confusion in this area; but I'll do that in a separate patch.	2011-03-09 22:39:20 -05:00
Heikki Linnakangas	4cd3fb6e12	Truncate predicate lock manager's SLRU lazily at checkpoint. That's safer than doing it aggressively whenever the tail-XID pointer is advanced, because this way we don't need to do it while holding SerializableXactHashLock. This also fixes bug #5915 spotted by YAMAMOTO Takashi, and removes an obsolete comment spotted by Kevin Grittner.	2011-03-08 12:12:54 +02:00
Heikki Linnakangas	1a4ab9ec23	If recovery_target_timeline is set to 'latest' and standby mode is enabled, periodically rescan the archive for new timelines, while waiting for new WAL segments to arrive. This allows you to set up a standby server that follows the TLI change if another standby server is promoted to master. Before this, you had to restart the standby server to make it notice the new timeline. This patch only scans the archive for TLI changes, it won't follow a TLI change in streaming replication. That is much needed too, but it would be a much bigger patch than I dare to sneak in this late in the release cycle. There was discussion on improving the sanity checking of the WAL segments so that the system would notice more reliably if the new timeline isn't an ancestor of the current one, but that is not included in this patch. Reviewed by Fujii Masao.	2011-03-07 21:14:47 +02:00
Simon Riggs	a8a8a3e096	Efficient transaction-controlled synchronous replication. If a standby is broadcasting reply messages and we have named one or more standbys in synchronous_standby_names then allow users who set synchronous_replication to wait for commit, which then provides strict data integrity guarantees. Design avoids sending and receiving transaction state information so minimises bookkeeping overheads. We synchronize with the highest priority standby that is connected and ready to synchronize. Other standbys can be defined to takeover in case of standby failure. This version has very strict behaviour; more relaxed options may be added at a later date. Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime Casanova, Heikki Linnakangas and Robert Haas, plus the assistance of many other design reviewers.	2011-03-06 22:49:16 +00:00
Tom Lane	149b2673c2	Fix incorrect access to pg_index.indcollation. Since this field is after a variable-length field, it can't simply be accessed via the C struct for pg_index. Fortunately, the relcache already did the dirty work of pulling the information out to where it can be accessed easily, so this is a one-line fix. Andres Freund	2011-03-06 12:10:50 -05:00
Heikki Linnakangas	ee3838b1d3	You must hold a lock on the heap page when you call CheckForSerializableConflictOut(), because it can set hint bits. YAMAMOTO Takashi	2011-03-04 15:43:11 +02:00
Heikki Linnakangas	47ad79122b	Fix bugs in Serializable Snapshot Isolation. Change the way UPDATEs are handled. Instead of maintaining a chain of tuple-level locks in shared memory, copy any existing locks on the old tuple to the new tuple at UPDATE. Any existing page-level lock needs to be duplicated too, as a lock on the new tuple. That was neglected previously. Store xmin on tuple-level predicate locks, to distinguish a lock on an old already-recycled tuple from a new tuple at the same physical location. Failure to distinguish them caused loops in the tuple-lock chains, as reported by YAMAMOTO Takashi. Although we don't use the chain representation of UPDATEs anymore, it seems like a good idea to store the xmin to avoid some false positives if no other reason. CheckSingleTargetForConflictsIn now correctly handles the case where a lock that's being held is not reflected in the local lock table. That happens if another backend acquires a lock on our behalf due to an UPDATE or a page split. PredicateLockPageCombine now retains locks for the page that is being removed, rather than removing them. This prevents a potentially dangerous false-positive inconsistency where the local lock table believes that a lock is held, but it is actually not. Dan Ports and Kevin Grittner	2011-03-01 19:05:16 +02:00
Tom Lane	a874fe7b4c	Refactor the executor's API to support data-modifying CTEs better. The originally committed patch for modifying CTEs didn't interact well with EXPLAIN, as noted by myself, and also had corner-case problems with triggers, as noted by Dean Rasheed. Those problems show it is really not practical for ExecutorEnd to call any user-defined code; so split the cleanup duties out into a new function ExecutorFinish, which must be called between the last ExecutorRun call and ExecutorEnd. Some Asserts have been added to these functions to help verify correct usage. It is no longer necessary for callers of the executor to call AfterTriggerBeginQuery/AfterTriggerEndQuery for themselves, as this is now done by ExecutorStart/ExecutorFinish respectively. If you really need to suppress that and do it for yourself, pass EXEC_FLAG_SKIP_TRIGGERS to ExecutorStart. Also, refactor portal commit processing to allow for the possibility that PortalDrop will invoke user-defined code. I think this is not actually necessary just yet, since the portal-execution-strategy logic forces any non-pure-SELECT query to be run to completion before we will consider committing. But it seems like good future-proofing.	2011-02-27 13:44:12 -05:00
Robert Haas	79ad8fc5f8	Named restore point improvements. Emit a log message when creating a named restore point, and improve documentation for pg_create_restore_point(). Euler Taveira de Oliveira, per suggestions from Thom Brown, with some additional wordsmithing by me.	2011-02-24 19:02:00 -05:00
Tom Lane	82220e8832	Un-break building with BTREE_BUILD_STATS. This has been broken for awhile, but not clear it's worth back-patching. Euler Taveira de Oliveira	2011-02-18 14:06:16 -05:00
Tom Lane	6595dd04d1	Add backwards-compatible declarations of some core GIN support functions. These are needed to support reloading dumps of 9.0 installations containing contrib/intarray or contrib/tsearch2. Since not only regular dump/reload but binary upgrade would fail, it seems worth the trouble to carry these stubs for awhile. Note that the contrib opclasses referencing these functions will still work fine, since GIN doesn't actually pay any attention to the declared signature of a support function.	2011-02-16 17:24:46 -05:00
Simon Riggs	bca8b7f16a	Hot Standby feedback for avoidance of cleanup conflicts on standby. Standby optionally sends back information about oldestXmin of queries which is then checked and applied to the WALSender's proc->xmin. GetOldestXmin() is modified slightly to agree with GetSnapshotData(), so that all backends on primary include WALSender within their snapshots. Note this does nothing to change the snapshot xmin on either master or standby. Feedback piggybacks on the standby reply message. vacuum_defer_cleanup_age is no longer used on standby, though parameter still exists on primary, since some use cases still exist. Simon Riggs, review comments from Fujii Masao, Heikki Linnakangas, Robert Haas	2011-02-16 19:29:37 +00:00
Robert Haas	4695da5ae9	pg_ctl promote Fujii Masao, reviewed by Robert Haas, Stephen Frost, and Magnus Hagander.	2011-02-15 21:30:23 -05:00
Simon Riggs	5c588be729	PITR can stop at a named restore point when recovery target = time though must not update the last transaction timestamp. Plus comment and message cleanup for recent named restore point. Fujii Masao, minor changes by me	2011-02-15 00:51:39 +00:00
Heikki Linnakangas	b186523fd9	Send status updates back from standby server to master, indicating how far the standby has written, flushed, and applied the WAL. At the moment, this is for informational purposes only, the values are only shown in pg_stat_replication system view, but in the future they will also be needed for synchronous replication. Extracted from Simon riggs' synchronous replication patch by Robert Haas, with some tweaking by me.	2011-02-10 21:04:02 +02:00
Magnus Hagander	3144c33a2f	Implement NOWAIT option for BASE_BACKUP command Specifying this option makes the server not wait for the xlog to be archived, or emit a warning that it can't, instead leaving the responsibility with the client. This is useful when the log is being streamed using the streaming protocol in parallel with the backup, without having log archiving enabled.	2011-02-09 10:59:53 +01:00
Peter Eisentraut	414c5a2ea6	Per-column collation support This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support. Peter Eisentraut reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch	2011-02-08 23:04:18 +02:00
Simon Riggs	c016ce7281	Named restore points in recovery. Users can record named points, then new recovery.conf parameter recovery_target_name allows PITR to specify named points as recovery targets. Jaime Casanova, reviewed by Euler Taveira de Oliveira, plus minor edits	2011-02-08 19:39:08 +00:00
Simon Riggs	8c6e3adbf7	Basic Recovery Control functions for use in Hot Standby. Pause, Resume, Status check functions only. Also, new recovery.conf parameter to pause_at_recovery_target, default on. Simon Riggs, reviewed by Fujii Masao	2011-02-08 18:30:22 +00:00
Simon Riggs	faa0550572	Remove rare corner case for data loss when triggering standby server. If the standby was streaming when trigger file arrives, check also in the archive for additional WAL files. This is a corner case since it is unlikely that we would trigger a failover while the master is still available and sending data to standby, while at the same time running in archive mode and also while the streaming standby has fallen behind archive. Someone would eventually be unlucky; we must plug all gaps however small. Fujii Masao	2011-02-08 14:38:02 +00:00
Heikki Linnakangas	dafaa3efb7	Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen	2011-02-08 00:09:08 +02:00
Robert Haas	0af695fd43	Log restartpoints in the same fashion as checkpoints. Prior to 9.0, restartpoints never created, deleted, or recycled WAL files, but now they can. This code makes log_checkpoints treat checkpoints and restartpoints symmetrically. It also adjusts up the documentation of the parameter to mention restartpoints. Fujii Masao. Docs by me, as suggested by Itagaki Takahiro.	2011-02-02 21:08:53 -05:00
Heikki Linnakangas	997b48ed96	Support multiple concurrent pg_basebackup backups. With this patch, pg_basebackup doesn't write a backup_label file in the data directory, so it doesn't interfere with a pg_start/stop_backup() based backup anymore. backup_label is still included in the backup, but it is injected directly into the tar stream. Heikki Linnakangas, reviewed by Fujii Masao and Magnus Hagander.	2011-01-31 18:25:39 +02:00
Tom Lane	0f73aae13d	Allow the wal_buffers setting to be auto-tuned to a reasonable value. If wal_buffers is initially set to -1 (which is now the default), it's replaced by 1/32nd of shared_buffers, with a minimum of 8 (the old default) and a maximum of the XLOG segment size. The allowed range for manual settings is still from 4 up to whatever will fit in shared memory. Greg Smith, with implementation correction by me.	2011-01-22 20:31:24 -05:00
Magnus Hagander	4448917d51	Split pg_start_backup() and pg_stop_backup() into two pieces Move the actual functionality into a separate function that's easier to call internally, and change the SQL-callable function to be a wrapper calling this. Also create a pg_abort_backup() function, only callable internally, that does only the most vital parts of pg_stop_backup(), making it safe(r) to call from error handlers.	2011-01-09 21:00:28 +01:00
Heikki Linnakangas	ca63029eac	Fix crash in the new GiST insertion code, when an update splits the root page. This bug was exercised by contrib/intarray/bench, as noted by Tom Lane.	2011-01-09 21:36:22 +02:00
Tom Lane	56a57473a9	Refactor GIN's handling of duplicate search entries. The original coding could combine duplicate entries only when they originated from the same qual condition. In particular it could not combine cases where multiple qual conditions all give rise to full-index scan requests, which is an expensive case well worth optimizing. Refactor so that duplicates are recognized across all the quals.	2011-01-08 14:48:08 -05:00
Tom Lane	73912e7fbd	Fix GIN to support null keys, empty and null items, and full index scans. Per my recent proposal(s). Null key datums can now be returned by extractValue and extractQuery functions, and will be stored in the index. Also, placeholder entries are made for indexable items that are NULL or contain no keys according to extractValue. This means that the index is now always complete, having at least one entry for every indexed heap TID, and so we can get rid of the prohibition on full-index scans. A full-index scan is implemented much the same way as partial-match scans were already: we build a bitmap representing all the TIDs found in the index, and then drive the results off that. Also, introduce a concept of a "search mode" that can be requested by extractQuery when the operator requires matching to empty items (this is just as cheap as matching to a single key) or requires a full index scan (which is not so cheap, but it sure beats failing or giving wrong answers). The behavior remains backward compatible for opclasses that don't return any null keys or request a non-default search mode. Using these features, we can now make the GIN index opclass for anyarray behave in a way that matches the actual anyarray operators for &&, <@, @>, and = ... which it failed to do before in assorted corner cases. This commit fixes the core GIN code and ginarrayprocs.c, updates the documentation, and adds some simple regression test cases for the new behaviors using the array operators. The tsearch and contrib GIN opclass support functions still need to be looked over and probably fixed. Another thing I intend to fix separately is that this is pretty inefficient for cases where more than one scan condition needs a full-index search: we'll run duplicate GinScanEntrys, each one of which builds a large bitmap. There is some existing logic to merge duplicate GinScanEntrys but it needs refactoring to make it work for entries belonging to different scan keys. Note that most of gin.h has been split out into a new file gin_private.h, so that gin.h doesn't export anything that's not supposed to be used by GIN opclasses or the rest of the backend. I did quite a bit of other code beautification work as well, mostly fixing comments and choosing more appropriate names for things.	2011-01-07 19:16:24 -05:00
Robert Haas	a9f72b4083	Improve recovery.conf.sample comments. Jehan-Guillaume de Rorthais, with some additional wordsmithing by me.	2011-01-07 11:01:25 -05:00
Robert Haas	dc8a14311a	Update comments in RecordTransactionCommit() to mention unlogged tables.	2011-01-03 10:29:22 -05:00
Robert Haas	0d692a0dc9	Basic foreign table support. Foreign tables are a core component of SQL/MED. This commit does not provide a working SQL/MED infrastructure, because foreign tables cannot yet be queried. Support for foreign table scans will need to be added in a future patch. However, this patch creates the necessary system catalog structure, syntax support, and support for ancillary operations such as COMMENT and SECURITY LABEL. Shigeru Hanada, heavily revised by Robert Haas	2011-01-01 23:48:11 -05:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Alvaro Herrera	55573990ca	Avoid unnecessary public struct declaration in slru.h Instead, declare a public wrapper of the sole function using it for external callers, so that they don't have to always pass a NULL argument. Author: Kevin Grittner	2010-12-30 12:09:17 -03:00

1 2 3 4 5 ...

1794 Commits