Commit Graph

1815 Commits

Author SHA1 Message Date
Peter Eisentraut e2a0cb1a80 Message style and spelling improvements 2011-06-22 00:45:34 +03:00
Heikki Linnakangas cb94db91b2 pgindent run of recent SSI changes. Also, remove an unnecessary #include.
Kevin Grittner
2011-06-16 16:17:22 +03:00
Simon Riggs 758bd2a433 Respect Hot Standby controls while recycling btree index pages.
Btree pages were recycled after VACUUM deletes all records on a
page and then a subsequent VACUUM occurs after the RecentXmin
horizon is reached. Using RecentXmin meant that we did not respond
correctly to the user controls provide to avoid Hot Standby
conflicts and so spurious conflicts could be generated in some
workload combinations. We now reuse pages only when we reach
RecentGlobalXmin, which can be much later in the presence of long
running queries and is also controlled by vacuum_defer_cleanup_age
and hot_standby_feedback.

Noah Misch and Simon Riggs
2011-06-16 10:19:10 +01:00
Heikki Linnakangas 0a0e2b52a5 Make non-MVCC snapshots exempt from predicate locking. Scans with non-MVCC
snapshots, like in REINDEX, are basically non-transactional operations. The
DDL operation itself might participate in SSI, but there's separate
functions for that.

Kevin Grittner and Dan Ports, with some changes by me.
2011-06-15 12:11:18 +03:00
Heikki Linnakangas 85ea93384a Oops, forgot to change the order of entries in 2PC callback arrays when I
renumbered the resource managers. This should fix the buildfarm..
2011-06-14 15:16:36 +03:00
Tom Lane c2ba0121c7 Work around gcc 4.6.0 bug that breaks WAL replay.
ReadRecord's habit of using both direct references to tmpRecPtr and
references to *RecPtr (which is pointing at tmpRecPtr) triggers an
optimization bug in gcc 4.6.0, which apparently has forgotten about
aliasing rules.  Avoid the compiler bug, and make the code more readable
to boot, by getting rid of the direct references.  Improve the comments
while at it.

Back-patch to all supported versions, in case they get built with 4.6.0.

Tom Lane, with some cosmetic suggestions from Alex Hunsaker
2011-06-10 17:04:29 -04:00
Bruce Momjian 6560407c7d Pgindent run before 9.1 beta2. 2011-06-09 14:32:50 -04:00
Tom Lane 6923d699bc Protect GIST logic that assumes penalty values can't be negative.
Apparently sane-looking penalty code might return small negative values,
for example because of roundoff error.  This will confuse places like
gistchoose().  Prevent problems by clamping negative penalty values to
zero.  (Just to be really sure, I also made it force NaNs to zero.)
Back-patch to all supported branches.

Alexander Korotkov
2011-05-31 17:53:45 -04:00
Heikki Linnakangas 3103f9a77d The row-version chaining in Serializable Snapshot Isolation was still wrong.
On further analysis, it turns out that it is not needed to duplicate predicate
locks to the new row version at update, the lock on the version that the
transaction saw as visible is enough. However, there was a different bug in
the code that checks for dangerous structures when a new rw-conflict happens.
Fix that bug, and remove all the row-version chaining related code.

Kevin Grittner & Dan Ports, with some comment editorialization by me.
2011-05-30 20:47:17 +03:00
Peter Eisentraut c13dc6402b Spell checking and markup refinement 2011-05-19 01:14:45 +03:00
Alvaro Herrera c6eb5740b3 Fix assorted typos 2011-05-12 08:52:56 -04:00
Heikki Linnakangas a0c8514149 Shut down WAL receiver if it's still running at end of recovery. We used to
just check that it's not running and PANIC if it was, but that can rightfully
happen if recovery stops at recovery target.
2011-05-11 12:46:08 +03:00
Tom Lane d2088ae949 Move RegisterPredicateLockingXid() call to a safer place.
The SSI patch inserted a call of RegisterPredicateLockingXid into
GetNewTransactionId, which was a bad idea on a couple of grounds.  First,
it's not necessary to hold XidGenLock while manipulating that shared
memory, and doing so is bad because XidGenLock is a high-contention lock
that should be held for as short a time as possible.  (Not to mention that
it adds an entirely unnecessary deadlock hazard, since we must take
SerializableXactHashLock as well.)  Second, the specific place where it was
put was between extending CLOG and advancing nextXid, which could result in
unpleasant behavior in case of a failure there.  Pull the call out to
AssignTransactionId, which is much safer and arguably better from a
modularity standpoint too.

There is more work to do to clean up the failure-before-advancing-nextXid
issue, but that is a separate change that will need to be back-patched.
So for the moment I just want to make GetNewTransactionId look the same as
it did in prior versions.
2011-05-06 12:57:28 -04:00
Robert Haas b429519d8d Fix SSI-related assertion failure.
Bug #5899, reported by Marko Tiikkaja.

Heikki Linnakangas, reviewed by Kevin Grittner and Dan Ports.
2011-04-25 09:47:28 -04:00
Tom Lane a0b75a41a9 Hash indexes had better pass the index collation to support functions, too.
Per experimentation with contrib/citext, whose hash function assumes that
it'll be passed a collation.
2011-04-23 14:13:09 -04:00
Tom Lane ae20bf1740 Make GIN and GIST pass the index collation to all their support functions.
Experimentation with contrib/btree_gist shows that the majority of the GIST
support functions potentially need collation information.  Safest policy
seems to be to pass it to all of them, instead of making assumptions about
which ones could possibly need it.
2011-04-22 20:13:12 -04:00
Tom Lane 9e9b9ac7d1 Make a code-cleanup pass over the collations patch.
This patch is almost entirely cosmetic --- mostly cleaning up a lot of
neglected comments, and fixing code layout problems in places where the
patch made lines too long and then pgindent did weird things with that.
I did find a bug-of-omission in equalTupleDescs().
2011-04-22 17:43:18 -04:00
Robert Haas aea1f24c2c recoveryStopsHere() must check the resource manager ID.
Before commit c016ce7281, this wasn't
needed, but now that multiple resource manager IDs can percolate down
through here, we have to make sure we know which one we've got.
Otherwise, we can confuse (for example) an XLOG_XACT_COMMIT record
with an XLOG_CHECKPOINT_SHUTDOWN record.

Review by Jaime Casanova
2011-04-18 08:27:19 -04:00
Tom Lane d2f60a3ab0 Add an Assert that indexam.c isn't used on an index awaiting reindexing.
This might have caught the recent embarrassment over trying to modify
pg_index while its indexes were being rebuilt.

Noah Misch
2011-04-16 19:29:10 -04:00
Heikki Linnakangas 54685b1c2b Revert the patch to check if we've reached end-of-backup also when doing
crash recovery, and throw an error if not. hubert depesz lubaczewski pointed
out that that situation also happens in the crash recovery following a
system crash that happens during an online backup.

We might want to do something smarter in 9.1, like put the check back for
backups taken with pg_basebackup, but that's for another patch.
2011-04-13 22:05:40 +03:00
Tom Lane d64713df7e Pass collations to functions in FunctionCallInfoData, not FmgrInfo.
Since collation is effectively an argument, not a property of the function,
FmgrInfo is really the wrong place for it; and this becomes critical in
cases where a cached FmgrInfo is used for varying purposes that might need
different collation settings.  Fix by passing it in FunctionCallInfoData
instead.  In particular this allows a clean fix for bug #5970 (record_cmp
not working).  This requires touching a bit more code than the original
method, but nobody ever thought that collations would not be an invasive
patch...
2011-04-12 19:19:24 -04:00
Peter Eisentraut 5caa3479c2 Clean up most -Wunused-but-set-variable warnings from gcc 4.6
This warning is new in gcc 4.6 and part of -Wall.  This patch cleans
up most of the noise, but there are some still warnings that are
trickier to remove.
2011-04-11 22:28:45 +03:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane 1766a5b63a Tweak collation setup for GIN index comparison functions.
Honor index column's collation spec if there is one, don't go to the
expense of calling get_typcollation when we can reasonably assume that
all GIN storage types will use default collation, and be sure to set
a collation for the comparePartialFn too.
2011-04-08 16:48:25 -04:00
Tom Lane 2594cf0e8c Revise the API for GUC variable assign hooks.
The previous functions of assign hooks are now split between check hooks
and assign hooks, where the former can fail but the latter shouldn't.
Aside from being conceptually clearer, this approach exposes the
"canonicalized" form of the variable value to guc.c without having to do
an actual assignment.  And that lets us fix the problem recently noted by
Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus
log messages about "parameter "wal_buffers" cannot be changed without
restarting the server".  There may be some speed advantage too, because
this design lets hook functions avoid re-parsing variable values when
restoring a previous state after a rollback (they can store a pre-parsed
representation of the value instead).  This patch also resolves a
longstanding annoyance about custom error messages from variable assign
hooks: they should modify, not appear separately from, guc.c's own message
about "invalid parameter value".
2011-04-07 00:12:02 -04:00
Simon Riggs 88f32b7ca2 Avoid assuming there will be only 3 states for synchronous_commit.
Also avoid hardcoding the current default state by giving it the name
"on" and replace with a meaningful name that reflects its behaviour.
Coding only, no change in behaviour.
2011-04-04 23:23:13 +01:00
Robert Haas 240067b3b0 Merge synchronous_replication setting into synchronous_commit.
This means one less thing to configure when setting up synchronous
replication, and also avoids some ambiguity around what the behavior
should be when the settings of these variables conflict.

Fujii Masao, with additional hacking by me.
2011-04-04 16:25:52 -04:00
Heikki Linnakangas 1f0bab8494 Improve error message when WAL ends before reaching end of online backup. 2011-03-31 10:09:49 +03:00
Heikki Linnakangas acf4740132 Check that we've reached end-of-backup also when we're not performing
archive recovery.

It's possible to restore an online backup without recovery.conf, by simply
copying all the necessary WAL files to pg_xlog. "pg_basebackup -x" does that
too. That's the use case where this cross-check is useful.

Backpatch to 9.0. We used to do this in earlier versins, but in 9.0 the code
was inadvertently changed so that the check is only performed after archive
recovery.

Fujii Masao.
2011-03-30 10:53:28 +03:00
Tom Lane 7208fae18f Clean up cruft around collation initialization for tupdescs and scankeys.
I found actual bugs in GiST and plpgsql; the rest of this is cosmetic
but meant to decrease the odds of future bugs of omission.
2011-03-26 18:28:40 -04:00
Simon Riggs b5f2f2a712 Minor changes to recovery pause behaviour.
Change location LOG message so it works each time we pause, not
just for final pause.
Ensure that we pause only if we are in Hot Standby and can connect
to allow us to run resume function. This change supercedes the
code to override parameter recoveryPauseAtTarget to false if not
attempting to enter Hot Standby, which is now removed.
2011-03-23 19:35:53 +00:00
Simon Riggs b98ac467f5 Prevent intermittent hang in recovery from bgwriter interaction.
Startup process waited for cleanup lock but when hot_standby = off
the pid was not registered, so that the bgwriter would not wake
the waiting process as intended.
2011-03-23 13:30:05 +00:00
Heikki Linnakangas 6d8096e2f3 When two base backups are started at the same time with pg_basebackup,
ensure that they use different checkpoints as the starting point. We use
the checkpoint redo location as a unique identifier for the base backup in
the end-of-backup record, and in the backup history file name.

Bug spotted by Fujii Masao.
2011-03-21 11:25:25 +02:00
Tom Lane b310b6e31c Revise collation derivation method and expression-tree representation.
All expression nodes now have an explicit output-collation field, unless
they are known to only return a noncollatable data type (such as boolean
or record).  Also, nodes that can invoke collation-aware functions store
a separate field that is the collation value to pass to the function.
This avoids confusion that arises when a function has collatable inputs
and noncollatable output type, or vice versa.

Also, replace the parser's on-the-fly collation assignment method with
a post-pass over the completed expression tree.  This allows us to use
a more complex (and hopefully more nearly spec-compliant) assignment
rule without paying for it in extra storage in every expression node.

Fix assorted bugs in the planner's handling of collations by making
collation one of the defining properties of an EquivalenceClass and
by converting CollateExprs into discardable RelabelType nodes during
expression preprocessing.
2011-03-19 20:30:08 -04:00
Robert Haas 777e8c0015 Remove bogus semicolons in recoveryPausesHere.
Without this, the startup process goes into a tight loop, consuming
100% of one CPU and failing to respond to interrupts.
2011-03-18 08:09:09 -04:00
Robert Haas 84abea76f6 Add pause_at_recovery_target to recovery.conf.sample; improve docs.
Fujii Masao, but with the proposed behavior change reverted, and the
rest adjusted accordingly.
2011-03-17 14:04:11 -04:00
Bruce Momjian 5ca543fb2e Clarify C comment that O_SYNC/O_FSYNC are really the same settting, as
opposed to O_DSYNC.
2011-03-10 20:02:52 -05:00
Robert Haas d16e290a8a Emit a LOG message when pausing at the recovery target.
Fujii Masao
2011-03-10 14:37:14 -05:00
Tom Lane a051ef699c Remove collation information from TypeName, where it does not belong.
The initial collations patch treated a COLLATE spec as part of a TypeName,
following what can only be described as brain fade on the part of the SQL
committee.  It's a lot more reasonable to treat COLLATE as a syntactically
separate object, so that it can be added in only the productions where it
actually belongs, rather than needing to reject it in a boatload of places
where it doesn't belong (something the original patch mostly failed to do).
In addition this change lets us meet the spec's requirement to allow
COLLATE anywhere in the clauses of a ColumnDef, and it avoids unfriendly
behavior for constructs such as "foo::type COLLATE collation".

To do this, pull collation information out of TypeName and put it in
ColumnDef instead, thus reverting most of the collation-related changes in
parse_type.c's API.  I made one additional structural change, which was to
use a ColumnDef as an intermediate node in AT_AlterColumnType AlterTableCmd
nodes.  This provides enough room to get rid of the "transform" wart in
AlterTableCmd too, since the ColumnDef can carry the USING expression
easily enough.

Also fix some other minor bugs that have crept in in the same areas,
like failure to copy recently-added fields of ColumnDef in copyfuncs.c.

While at it, document the formerly secret ability to specify a collation
in ALTER TABLE ALTER COLUMN TYPE, ALTER TYPE ADD ATTRIBUTE, and
ALTER TYPE ALTER ATTRIBUTE TYPE; and correct some misstatements about
what the default collation selection will be when COLLATE is omitted.

BTW, the three-parameter form of format_type() should go away too,
since it just contributes to the confusion in this area; but I'll do
that in a separate patch.
2011-03-09 22:39:20 -05:00
Heikki Linnakangas 4cd3fb6e12 Truncate predicate lock manager's SLRU lazily at checkpoint. That's safer
than doing it aggressively whenever the tail-XID pointer is advanced, because
this way we don't need to do it while holding SerializableXactHashLock.

This also fixes bug #5915 spotted by YAMAMOTO Takashi, and removes an
obsolete comment spotted by Kevin Grittner.
2011-03-08 12:12:54 +02:00
Heikki Linnakangas 1a4ab9ec23 If recovery_target_timeline is set to 'latest' and standby mode is enabled,
periodically rescan the archive for new timelines, while waiting for new WAL
segments to arrive. This allows you to set up a standby server that follows
the TLI change if another standby server is promoted to master. Before this,
you had to restart the standby server to make it notice the new timeline.

This patch only scans the archive for TLI changes, it won't follow a TLI
change in streaming replication. That is much needed too, but it would be a
much bigger patch than I dare to sneak in this late in the release cycle.

There was discussion on improving the sanity checking of the WAL segments so
that the system would notice more reliably if the new timeline isn't an
ancestor of the current one, but that is not included in this patch.

Reviewed by Fujii Masao.
2011-03-07 21:14:47 +02:00
Simon Riggs a8a8a3e096 Efficient transaction-controlled synchronous replication.
If a standby is broadcasting reply messages and we have named
one or more standbys in synchronous_standby_names then allow
users who set synchronous_replication to wait for commit, which
then provides strict data integrity guarantees. Design avoids
sending and receiving transaction state information so minimises
bookkeeping overheads. We synchronize with the highest priority
standby that is connected and ready to synchronize. Other standbys
can be defined to takeover in case of standby failure.

This version has very strict behaviour; more relaxed options
may be added at a later date.

Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime
Casanova, Heikki Linnakangas and Robert Haas, plus the assistance
of many other design reviewers.
2011-03-06 22:49:16 +00:00
Tom Lane 149b2673c2 Fix incorrect access to pg_index.indcollation.
Since this field is after a variable-length field, it can't simply be
accessed via the C struct for pg_index.  Fortunately, the relcache already
did the dirty work of pulling the information out to where it can be
accessed easily, so this is a one-line fix.

Andres Freund
2011-03-06 12:10:50 -05:00
Heikki Linnakangas ee3838b1d3 You must hold a lock on the heap page when you call
CheckForSerializableConflictOut(), because it can set hint bits.

YAMAMOTO Takashi
2011-03-04 15:43:11 +02:00
Heikki Linnakangas 47ad79122b Fix bugs in Serializable Snapshot Isolation.
Change the way UPDATEs are handled. Instead of maintaining a chain of
tuple-level locks in shared memory, copy any existing locks on the old
tuple to the new tuple at UPDATE. Any existing page-level lock needs to
be duplicated too, as a lock on the new tuple. That was neglected
previously.

Store xmin on tuple-level predicate locks, to distinguish a lock on an old
already-recycled tuple from a new tuple at the same physical location.
Failure to distinguish them caused loops in the tuple-lock chains, as
reported by YAMAMOTO Takashi. Although we don't use the chain representation
of UPDATEs anymore, it seems like a good idea to store the xmin to avoid
some false positives if no other reason.

CheckSingleTargetForConflictsIn now correctly handles the case where a lock
that's being held is not reflected in the local lock table. That happens
if another backend acquires a lock on our behalf due to an UPDATE or a page
split.

PredicateLockPageCombine now retains locks for the page that is being
removed, rather than removing them. This prevents a potentially dangerous
false-positive inconsistency where the local lock table believes that a lock
is held, but it is actually not.

Dan Ports and Kevin Grittner
2011-03-01 19:05:16 +02:00
Tom Lane a874fe7b4c Refactor the executor's API to support data-modifying CTEs better.
The originally committed patch for modifying CTEs didn't interact well
with EXPLAIN, as noted by myself, and also had corner-case problems with
triggers, as noted by Dean Rasheed.  Those problems show it is really not
practical for ExecutorEnd to call any user-defined code; so split the
cleanup duties out into a new function ExecutorFinish, which must be called
between the last ExecutorRun call and ExecutorEnd.  Some Asserts have been
added to these functions to help verify correct usage.

It is no longer necessary for callers of the executor to call
AfterTriggerBeginQuery/AfterTriggerEndQuery for themselves, as this is now
done by ExecutorStart/ExecutorFinish respectively.  If you really need to
suppress that and do it for yourself, pass EXEC_FLAG_SKIP_TRIGGERS to
ExecutorStart.

Also, refactor portal commit processing to allow for the possibility that
PortalDrop will invoke user-defined code.  I think this is not actually
necessary just yet, since the portal-execution-strategy logic forces any
non-pure-SELECT query to be run to completion before we will consider
committing.  But it seems like good future-proofing.
2011-02-27 13:44:12 -05:00
Robert Haas 79ad8fc5f8 Named restore point improvements.
Emit a log message when creating a named restore point, and improve
documentation for pg_create_restore_point().

Euler Taveira de Oliveira, 	per suggestions from Thom Brown, with some
additional wordsmithing by me.
2011-02-24 19:02:00 -05:00
Tom Lane 82220e8832 Un-break building with BTREE_BUILD_STATS.
This has been broken for awhile, but not clear it's worth back-patching.

Euler Taveira de Oliveira
2011-02-18 14:06:16 -05:00
Tom Lane 6595dd04d1 Add backwards-compatible declarations of some core GIN support functions.
These are needed to support reloading dumps of 9.0 installations containing
contrib/intarray or contrib/tsearch2.  Since not only regular dump/reload
but binary upgrade would fail, it seems worth the trouble to carry these
stubs for awhile.  Note that the contrib opclasses referencing these
functions will still work fine, since GIN doesn't actually pay any
attention to the declared signature of a support function.
2011-02-16 17:24:46 -05:00
Simon Riggs bca8b7f16a Hot Standby feedback for avoidance of cleanup conflicts on standby.
Standby optionally sends back information about oldestXmin of queries
which is then checked and applied to the WALSender's proc->xmin.
GetOldestXmin() is modified slightly to agree with GetSnapshotData(),
so that all backends on primary include WALSender within their snapshots.
Note this does nothing to change the snapshot xmin on either master or
standby. Feedback piggybacks on the standby reply message.
vacuum_defer_cleanup_age is no longer used on standby, though parameter
still exists on primary, since some use cases still exist.

Simon Riggs, review comments from Fujii Masao, Heikki Linnakangas, Robert Haas
2011-02-16 19:29:37 +00:00