Commit Graph

12041 Commits

Author SHA1 Message Date
Tom Lane
c5ff3ff492 Avoid an unnecessary syscache lookup in parse_coerce.c.
All the other fields of the constant are being extracted from the syscache
entry we already have, so handle collation similarly.  (There don't seem
to be any other uses for the new function at the moment.)
2011-04-08 16:11:41 -04:00
Robert Haas
0bd155cbf2 Fix bug in propagating ALTER TABLE actions to typed tables.
We need to propagate such actions to all typed table children of a
given type, not just the first one.

Noah Misch
2011-04-08 15:46:13 -04:00
Robert Haas
fbc0d07796 Partially roll back overenthusiastic SSI optimization.
When a regular lock is held, SSI can use that in lieu of a predicate lock
to detect rw conflicts; but if the regular lock is being taken by a
subtransaction, we can't assume that it'll commit, so releasing the
parent transaction's lock in that case is a no-no.

Kevin Grittner
2011-04-08 15:29:02 -04:00
Robert Haas
56c7140ca8 Tweaks for SSI out-of-shared memory behavior.
If we call hash_search() with HASH_ENTER, it will bail out rather than
return NULL, so it's redundant to check for NULL again in the caller.
Thus, in cases where we believe it's impossible for the hash table to run
out of slots anyway, we can simplify the code slightly.

On the flip side, in cases where it's theoretically possible to run out of
space, we don't want to rely on dynahash.c to throw an error; instead,
we pass HASH_ENTER_NULL and throw the error ourselves if a NULL comes
back, so that we can provide a more descriptive error message.

Kevin Grittner
2011-04-07 16:43:39 -04:00
Tom Lane
73d9a90814 Modernize dlopen interface code for FreeBSD and OpenBSD.
Remove the hard-wired assumption that __mips__ (and only __mips__) lacks
dlopen in FreeBSD and OpenBSD.  This assumption is outdated at least for
OpenBSD, as per report from an anonymous 9.1 tester.  We can perfectly well
use HAVE_DLOPEN instead to decide which code to use.

Some other cosmetic adjustments to make freebsd.c, netbsd.c, and openbsd.c
exactly alike.
2011-04-07 15:14:39 -04:00
Tom Lane
d8d429890d Fix collations when we call transformWhereClause from outside the parser.
Previous patches took care of assorted places that call transformExpr from
outside the main parser, but I overlooked the fact that some places use
transformWhereClause as a shortcut for transformExpr + coerce_to_boolean.
In particular this broke collation-sensitive index WHERE clauses, as per
report from Thom Brown.  Trigger WHEN and rule WHERE clauses too.

I'm not forcing initdb for this fix, but any affected indexes, triggers,
or rules will need to be dropped and recreated.
2011-04-07 02:34:57 -04:00
Tom Lane
2594cf0e8c Revise the API for GUC variable assign hooks.
The previous functions of assign hooks are now split between check hooks
and assign hooks, where the former can fail but the latter shouldn't.
Aside from being conceptually clearer, this approach exposes the
"canonicalized" form of the variable value to guc.c without having to do
an actual assignment.  And that lets us fix the problem recently noted by
Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus
log messages about "parameter "wal_buffers" cannot be changed without
restarting the server".  There may be some speed advantage too, because
this design lets hook functions avoid re-parsing variable values when
restoring a previous state after a rollback (they can store a pre-parsed
representation of the value instead).  This patch also resolves a
longstanding annoyance about custom error messages from variable assign
hooks: they should modify, not appear separately from, guc.c's own message
about "invalid parameter value".
2011-04-07 00:12:02 -04:00
Robert Haas
632f0faa7c Repair some flakiness in CheckTargetForConflictsIn.
When we release and reacquire SerializableXactHashLock, we must recheck
whether an R/W conflict still needs to be flagged, because it could have
changed under us in the meantime.  And when we release the partition
lock, we must re-walk the list of predicate locks from the beginning,
because our pointer could get invalidated under us.

Bug report #5952 by Yamamoto Takashi.  Patch by Kevin Grittner.
2011-04-05 15:17:25 -04:00
Robert Haas
f5e524d92b Add casts from int4 and int8 to numeric.
Joey Adams, per gripe from Ramanujam.  Review by myself and Tom Lane.
2011-04-05 09:35:43 -04:00
Simon Riggs
88f32b7ca2 Avoid assuming there will be only 3 states for synchronous_commit.
Also avoid hardcoding the current default state by giving it the name
"on" and replace with a meaningful name that reflects its behaviour.
Coding only, no change in behaviour.
2011-04-04 23:23:13 +01:00
Robert Haas
240067b3b0 Merge synchronous_replication setting into synchronous_commit.
This means one less thing to configure when setting up synchronous
replication, and also avoids some ambiguity around what the behavior
should be when the settings of these variables conflict.

Fujii Masao, with additional hacking by me.
2011-04-04 16:25:52 -04:00
Robert Haas
a0e50e698b Include pid in pg_lock_status() results even for SIREAD locks.
Dan Ports
2011-04-04 13:23:43 -04:00
Robert Haas
6c57239985 Rearrange "add column" logic to merge columns at exec time.
The previous coding set attinhcount too high in some cases, resulting in
an undumpable, undroppable column.  Per bug #5856, reported by Naoya
Anzai.  See also commit 31b6fc06d8, which
fixes a similar bug in ALTER TABLE .. ADD CONSTRAINT.

Patch by Noah Misch.
2011-04-03 21:53:32 -04:00
Robert Haas
38b27792ea Avoid possible hang during smart shutdown.
If a smart shutdown occurs just as a child is starting up, and the
child subsequently becomes a walsender, there is a race condition:
the postmaster might count the exstant backends, determine that there
is one normal backend, and wait for it to die off.  Had the walsender
transition already occurred before the postmaster counted, it would
have proceeded with the shutdown.

To fix this, have each child that transforms into a walsender kick
the postmaster just after doing so, so that the state machine is
certain to advance.

Fujii Masao
2011-04-03 19:42:00 -04:00
Magnus Hagander
5735efee15 Avoid palloc before CurrentMemoryContext is set up on win32
Instead, write the unconverted output - it will be in the wrong
encoding, but at least we don't crash.

Rushabh Lathia
2011-04-01 19:59:44 +02:00
Robert Haas
50533a6dc5 Support comments on FOREIGN DATA WRAPPER and SERVER objects.
This mostly involves making it work with the objectaddress.c framework,
which does most of the heavy lifting.  In that vein, change
GetForeignDataWrapperOidByName to get_foreign_data_wrapper_oid and
GetForeignServerOidByName to get_foreign_server_oid, to match the
pattern we use for other object types.

Robert Haas and Shigeru Hanada
2011-04-01 11:28:28 -04:00
Robert Haas
7fcc75dd26 Fix compiler warning. 2011-04-01 10:36:38 -04:00
Heikki Linnakangas
9d56886112 Fix two missing spaces in error messages.
Josh Kupershmidt
2011-04-01 14:42:38 +03:00
Heikki Linnakangas
60b142b9a6 Fix a tiny race condition in predicate locking. Need to hold the lock while
examining the head of predicate locks list. Also, fix the comment of
RemoveTargetIfNoLongerUsed, it was neglected when we changed the way update
chains are handled.

Kevin Grittner
2011-03-31 18:43:23 +03:00
Heikki Linnakangas
1f0bab8494 Improve error message when WAL ends before reaching end of online backup. 2011-03-31 10:09:49 +03:00
Andrew Dunstan
382fb6a08f Attempt to unbreak windows builds broken by commit 754baa2. 2011-03-30 16:43:31 -04:00
Heikki Linnakangas
acf4740132 Check that we've reached end-of-backup also when we're not performing
archive recovery.

It's possible to restore an online backup without recovery.conf, by simply
copying all the necessary WAL files to pg_xlog. "pg_basebackup -x" does that
too. That's the use case where this cross-check is useful.

Backpatch to 9.0. We used to do this in earlier versins, but in 9.0 the code
was inadvertently changed so that the check is only performed after archive
recovery.

Fujii Masao.
2011-03-30 10:53:28 +03:00
Heikki Linnakangas
754baa21f7 Automatically terminate replication connections that are idle for more
than replication_timeout (a new GUC) milliseconds. The TCP timeout is often
too long, you want the master to notice a dead connection much sooner.
People complained about that in 9.0 too, but with synchronous replication
it's even more important to notice dead connections promptly.

Fujii Masao and Heikki Linnakangas
2011-03-30 10:20:37 +03:00
Heikki Linnakangas
bc03c5937d Adjust error message, now that we expect other message types than connection
close at this point. Fix PQsetnonblocking() comment.

Fujii Masao
2011-03-30 08:54:28 +03:00
Peter Eisentraut
f564e65cda Update SQL features list
Feature F692 "Extended collation support" is now also supported.  This
refers to allowing the COLLATE clause anywhere in a column or domain
definition instead of just directly after the type.

Also correct the name of the feature in accordance with the latest SQL
standard.
2011-03-29 23:23:50 +03:00
Tom Lane
eb51af71f2 Prevent a rowtype from being included in itself.
Eventually we might be able to allow that, but it's not clear how many
places need to be fixed to prevent infinite recursion when there's a direct
or indirect inclusion of a rowtype in itself.  One such place is
CheckAttributeType(), which will recurse to stack overflow in cases such as
those exhibited in bug #5950 from Alex Perepelica.  If we were sure it was
the only such place, we could easily modify the code added by this patch to
stop the recursion without a complaint ... but it probably isn't the only
such place.  Hence, throw error until such time as someone is excited
enough about this type of usage to put work into making it safe.

Back-patch as far as 8.3.  8.2 doesn't have the recursive call in
CheckAttributeType in the first place, so I see no need to add code there
in the absence of clear evidence of a problem elsewhere.
2011-03-28 15:46:04 -04:00
Tom Lane
d0dd5c7352 Fix check_exclusion_constraint() to insert correct collations in ScanKeys. 2011-03-27 13:29:52 -04:00
Tom Lane
7208fae18f Clean up cruft around collation initialization for tupdescs and scankeys.
I found actual bugs in GiST and plpgsql; the rest of this is cosmetic
but meant to decrease the odds of future bugs of omission.
2011-03-26 18:28:40 -04:00
Tom Lane
0c9d9e8dd6 More collations cleanup, from trawling for missed collation assignments.
Mostly cosmetic, though I did find that generateClonedIndexStmt failed
to clone the index's collations.
2011-03-26 16:35:25 -04:00
Tom Lane
b23c9fa929 Clean up a few failures to set collation fields in expression nodes.
I'm not sure these have any non-cosmetic implications, but I'm not sure
they don't, either.  In particular, ensure the CaseTestExpr generated
by transformAssignmentIndirection to represent the base target column
carries the correct collation, because parse_collate.c won't fix that.
Tweak lsyscache.c API so that we can get the appropriate collation
without an extra syscache lookup.
2011-03-26 14:25:48 -04:00
Simon Riggs
92f4786fa9 Additional test for each commit in sync rep path to plug minute
possibility of race condition that would effect performance only.
Requested by Robert Haas. Re-arrange related comments.
2011-03-26 10:09:37 +00:00
Tom Lane
bfa4440ca5 Pass collation to makeConst() instead of looking it up internally.
In nearly all cases, the caller already knows the correct collation, and
in a number of places, the value the caller has handy is more correct than
the default for the type would be.  (In particular, this patch makes it
significantly less likely that eval_const_expressions will result in
changing the exposed collation of an expression.)  So an internal lookup
is both expensive and wrong.
2011-03-25 20:10:42 -04:00
Tom Lane
c8e993503d Fix failure to propagate collation in negate_clause().
Turns out it was this, and not so much plpgsql, that was at fault in Stefan
Huehner's collation-error-in-a-trigger bug report of a couple weeks ago.
2011-03-25 18:44:47 -04:00
Robert Haas
30f6136f28 Make walreceiver send a reply after receiving data but before flushing it.
It originally worked this way, but was changed by commit
a8a8a3e096, since which time it's been impossible
for walreceiver to ever send a reply with write_location and flush_location
set to different values.
2011-03-25 11:28:16 -04:00
Tom Lane
27dc7e240b Fix handling of collation in SQL-language functions.
Ensure that parameter symbols receive collation from the function's
resolved input collation, and fix inlining to behave properly.

BTW, this commit lays about 90% of the infrastructure needed to support
use of argument names in SQL functions.  Parsing of parameters is now
done via the parser-hook infrastructure ... we'd just need to supply
a column-ref hook ...
2011-03-24 20:30:23 -04:00
Robert Haas
a432e2783b Add post-creation hook for extensions, consistent with other object types.
KaiGai Kohei
2011-03-24 18:44:49 -04:00
Tom Lane
3bba9ce945 Clean up handling of COLLATE clauses in index column definitions.
Ensure that COLLATE at the top level of an index expression is treated the
same as a grammatically separate COLLATE.  Fix bogus reverse-parsing logic
in pg_get_indexdef.
2011-03-24 15:29:52 -04:00
Simon Riggs
b5f2f2a712 Minor changes to recovery pause behaviour.
Change location LOG message so it works each time we pause, not
just for final pause.
Ensure that we pause only if we are in Hot Standby and can connect
to allow us to run resume function. This change supercedes the
code to override parameter recoveryPauseAtTarget to false if not
attempting to enter Hot Standby, which is now removed.
2011-03-23 19:35:53 +00:00
Robert Haas
19584ec659 Remove synchronous_replication/max_wal_senders cross-check.
This is no longer necessary, and might result in a situation where the
configuration file is reloaded (and everything seems OK) but a subsequent
restart of the database fails.

Per an observation from Fujii Masao.
2011-03-23 11:44:27 -04:00
Simon Riggs
b98ac467f5 Prevent intermittent hang in recovery from bgwriter interaction.
Startup process waited for cleanup lock but when hot_standby = off
the pid was not registered, so that the bgwriter would not wake
the waiting process as intended.
2011-03-23 13:30:05 +00:00
Simon Riggs
ec497a5ad6 Make FKs valid at creation when added as column constraints.
Bug report from Alvaro Herrera
2011-03-22 23:10:35 +00:00
Tom Lane
6e197cb2e5 Improve reporting of run-time-detected indeterminate-collation errors.
pg_newlocale_from_collation does not have enough context to give an error
message that's even a little bit useful, so move the responsibility for
complaining up to its callers.  Also, reword ERRCODE_INDETERMINATE_COLLATION
error messages in a less jargony, more message-style-guide-compliant
fashion.
2011-03-22 16:55:32 -04:00
Tom Lane
37d6d07dda Throw error for indeterminate collation of an ORDER/GROUP/DISTINCT target.
This restores a parse error that was thrown (though only in the ORDER BY
case) by the original collation patch.  I had removed it in my recent
revisions because it was thrown at a place where collations now haven't
been computed yet; but I thought of another way to handle it.

Throwing the error at parse time, rather than leaving it to be done at
runtime, is good because a syntax error pointer is helpful for localizing
the problem.  We can reasonably assume that the comparison function for a
collatable datatype will complain if it doesn't have a collation to use.
Now the planner might choose to implement GROUP or DISTINCT via hashing,
in which case no runtime error would actually occur, but it seems better
to throw error consistently rather than let the error depend on what the
planner chooses to do.  Another possible objection is that the user might
specify a nondefault sort operator that doesn't care about collation
... but that's surely an uncommon usage, and it wouldn't hurt him to throw
in a COLLATE clause anyway.  This change also makes the ORDER BY/GROUP
BY/DISTINCT case more consistent with the UNION/INTERSECT/EXCEPT case,
which was already coded to throw this error even though the same objections
could be raised there.
2011-03-22 15:58:03 -04:00
Tom Lane
1192ba8b67 Avoid potential deadlock in InitCatCachePhase2().
Opening a catcache's index could require reading from that cache's own
catalog, which of course would acquire AccessShareLock on the catalog.
So the original coding here risks locking index before heap, which could
deadlock against another backend trying to get exclusive locks in the
normal order.  Because InitCatCachePhase2 is only called when a backend
has to start up without a relcache init file, the deadlock was seldom seen
in the field.  (And by the same token, there's no need to worry about any
performance disadvantage; so not much point in trying to distinguish
exactly which catalogs have the risk.)

Bug report, diagnosis, and patch by Nikhil Sontakke.  Additional commentary
by me.  Back-patch to all supported branches.
2011-03-22 13:00:48 -04:00
Tom Lane
8df08c8489 Reimplement planner's handling of MIN/MAX aggregate optimization (again).
Instead of playing cute games with pathkeys, just build a direct
representation of the intended sub-select, and feed it through
query_planner to get a Path for the index access.  This is a bit slower
than 9.1's previous method, since we'll duplicate most of the overhead of
query_planner; but since the whole optimization only applies to rather
simple single-table queries, that probably won't be much of a problem in
practice.  The advantage is that we get to do the right thing when there's
a partial index that needs the implicit IS NOT NULL clause to be usable.
Also, although this makes planagg.c be a bit more closely tied to the
ordering of operations in grouping_planner, we can get rid of some coupling
to lower-level parts of the planner.  Per complaint from Marti Raudsepp.
2011-03-22 00:34:31 -04:00
Heikki Linnakangas
6d8096e2f3 When two base backups are started at the same time with pg_basebackup,
ensure that they use different checkpoints as the starting point. We use
the checkpoint redo location as a unique identifier for the base backup in
the end-of-backup record, and in the backup history file name.

Bug spotted by Fujii Masao.
2011-03-21 11:25:25 +02:00
Tom Lane
82e4d45bd0 Suppress platform-dependent unused-variable warning.
The local variable "sock" can be unused depending on compilation flags.
But there seems no particular need for it, since the kernel calls can
just as easily say port->sock instead.
2011-03-20 13:34:31 -04:00
Tom Lane
176d5bae1d Fix up handling of C/POSIX collations.
Install just one instance of the "C" and "POSIX" collations into
pg_collation, rather than one per encoding.  Make these instances exist
and do something useful even in machines without locale_t support: to wit,
it's now possible to force comparisons and case-folding functions to use C
locale in an otherwise non-C database, whether or not the platform has
support for using any additional collations.

Fix up severely broken upper/lower/initcap functions, too: the C/POSIX
fastpath now does what it is supposed to, and non-default collations are
handled correctly in single-byte database encodings.

Merge the two separate collation hashtables that were being maintained in
pg_locale.c, and be more wary of the possibility that we fail partway
through filling a cache entry.
2011-03-20 12:44:13 -04:00
Tom Lane
b310b6e31c Revise collation derivation method and expression-tree representation.
All expression nodes now have an explicit output-collation field, unless
they are known to only return a noncollatable data type (such as boolean
or record).  Also, nodes that can invoke collation-aware functions store
a separate field that is the collation value to pass to the function.
This avoids confusion that arises when a function has collatable inputs
and noncollatable output type, or vice versa.

Also, replace the parser's on-the-fly collation assignment method with
a post-pass over the completed expression tree.  This allows us to use
a more complex (and hopefully more nearly spec-compliant) assignment
rule without paying for it in extra storage in every expression node.

Fix assorted bugs in the planner's handling of collations by making
collation one of the defining properties of an EquivalenceClass and
by converting CollateExprs into discardable RelabelType nodes during
expression preprocessing.
2011-03-19 20:30:08 -04:00
Magnus Hagander
6f9192df61 Rename ident authentication over local connections to peer
This removes an overloading of two authentication options where
one is very secure (peer) and one is often insecure (ident). Peer
is also the name used in libpq from 9.1 to specify the same type
of authentication.

Also make initdb select peer for local connections when ident is
chosen, and ident for TCP connections when peer is chosen.

ident keyword in pg_hba.conf is still accepted and maps to peer
authentication.
2011-03-19 18:44:35 +01:00
Robert Haas
fbcf4b92aa Fix possible "tuple concurrently updated" error in ALTER TABLE.
When adding an inheritance parent to a table, an AccessShareLock on the
parent isn't strong enough to prevent trouble, so take
ShareUpdateExclusiveLock instead.  Since this is a behavior change,
albeit a fairly unobtrusive one, and since we have only one report
from the field, no back-patch.

Report by Jon Nelson, analysis by Alvaro Herrera, fix by me.
2011-03-18 22:09:57 -04:00
Robert Haas
727589995a Move synchronous_standbys_defined updates from WAL writer to BG writer.
This is advantageous because the BG writer is alive until much later in
the shutdown sequence than WAL writer; we want to make sure that it's
possible to shut off synchronous replication during a smart shutdown,
else it might not be possible to complete the shutdown at all.

Per very reasonable gripes from Fujii Masao and Simon Riggs.
2011-03-18 21:43:45 -04:00
Robert Haas
7a37900443 Make synchronous replication query cancel/die messages more consistent.
Per a gripe from Thom Brown about my previous commit in this area,
commit 9a56dc3389.
2011-03-18 10:20:22 -04:00
Robert Haas
777e8c0015 Remove bogus semicolons in recoveryPausesHere.
Without this, the startup process goes into a tight loop, consuming
100% of one CPU and failing to respond to interrupts.
2011-03-18 08:09:09 -04:00
Robert Haas
02b1f84e7d Remove bogus comment. 2011-03-17 14:43:57 -04:00
Peter Eisentraut
8c0a5eb78a Raise maximum value of several timeout parameters
The maximum value of deadlock_timeout, max_standby_archive_delay,
max_standby_streaming_delay, log_min_duration_statement, and
log_autovacuum_min_duration was INT_MAX/1000 milliseconds, which is
about 35min, which is too short for some practical uses.  Raise the
maximum value to INT_MAX; the code that uses the parameters already
supports that just fine.
2011-03-17 20:19:51 +02:00
Robert Haas
84abea76f6 Add pause_at_recovery_target to recovery.conf.sample; improve docs.
Fujii Masao, but with the proposed behavior change reverted, and the
rest adjusted accordingly.
2011-03-17 14:04:11 -04:00
Robert Haas
9a56dc3389 Fix various possible problems with synchronous replication.
1. Don't ignore query cancel interrupts.  Instead, if the user asks to
cancel the query after we've already committed it, but before it's on
the standby, just emit a warning and let the COMMIT finish.

2. Don't ignore die interrupts (pg_terminate_backend or fast shutdown).
Instead, emit a warning message and close the connection without
acknowledging the commit.  Other backends will still see the effect of
the commit, but there's no getting around that; it's too late to abort
at this point, and ignoring die interrupts altogether doesn't seem like
a good idea.

3. If synchronous_standby_names becomes empty, wake up all backends
waiting for synchronous replication to complete.  Without this, someone
attempting to shut synchronous replication off could easily wedge the
entire system instead.

4. Avoid depending on the assumption that if a walsender updates
MyProc->syncRepState, we'll see the change even if we read it without
holding the lock.  The window for this appears to be quite narrow (and
probably doesn't exist at all on machines with strong memory ordering)
but protecting against it is practically free, so do that.

5. Remove useless state SYNC_REP_MUST_DISCONNECT, which isn't needed and
doesn't actually do anything.

There's still some further work needed here to make the behavior of fast
shutdown plausible, but that looks complex, so I'm leaving it for a
separate commit.  Review by Fujii Masao.
2011-03-17 13:12:21 -04:00
Tom Lane
72cfc17aef Improve handling of unknown-type literals in UNION/INTERSECT/EXCEPT.
This patch causes unknown-type Consts to be coerced to the resolved output
type of the set operation at parse time.  Formerly such Consts were left
alone until late in the planning stage.  The disadvantage of that approach
is that it disables some optimizations, because the planner sees the set-op
leaf query as having different output column types than the overall set-op.
We saw an example of that in a recent performance gripe from Claudio
Freire.

Fixing such a Const requires scribbling on the leaf query in
transformSetOperationTree, but that should be all right since if the leaf
query's semantics depended on that output column, it would already have
resolved the unknown to something else.

Most of the bulk of this patch is a simple adjustment of
transformSetOperationTree's API so that upper levels can get at the
TargetEntry containing a Const to be replaced: it now returns a list of
TargetEntries, instead of just the bare expressions.
2011-03-15 21:52:10 -04:00
Robert Haas
5ca4dfc79f Remove 13 keywords that are used only for ROLE options.
Review by Tom Lane.
2011-03-15 10:22:58 -04:00
Tom Lane
a2eb9e0c08 Simplify list traversal logic in add_path().
Its mechanism for recovering after deleting the current list cell was
a bit klugy.  Borrow the technique used in other places.
2011-03-13 12:57:14 -04:00
Tom Lane
696d1f7f06 Make all comparisons done for/with statistics use the default collation.
While this will give wrong answers when estimating selectivity for a
comparison operator that's using a non-default collation, the estimation
error probably won't be large; and anyway the former approach created
estimation errors of its own by trying to use a histogram that might have
been computed with some other collation.  So we'll adopt this simplified
approach for now and perhaps improve it sometime in the future.

This patch incorporates changes from Andres Freund to make sure that
selfuncs.c passes a valid collation OID to any datatype-specific function
it calls, in case that function wants collation information.  Said OID will
now always be DEFAULT_COLLATION_OID, but at least we won't get errors.
2011-03-12 16:30:36 -05:00
Bruce Momjian
94fe9c0f4e Use "backend process" rather than "backend server", where appropriate. 2011-03-12 09:38:56 -05:00
Bruce Momjian
3a3f39fdc0 Use macros for time-based constants, rather than constants. 2011-03-12 09:35:56 -05:00
Tom Lane
2a26639a5d On further reflection, we'd better do the same in int.c.
We previously heard of the same problem in int24div(), so there's not a
good reason to suppose the problem is confined to cases involving int8.
2011-03-11 19:04:02 -05:00
Tom Lane
72330995a5 Put in some more safeguards against executing a division-by-zero.
Add dummy returns before every potential division-by-zero in int8.c,
because apparently further "improvements" in gcc's optimizer have
enabled it to break functions that weren't broken before.

Aurelien Jarno, via Martin Pitt
2011-03-11 18:18:55 -05:00
Tom Lane
8acdb8bf9c Split CollateClause into separate raw and analyzed node types.
CollateClause is now used only in raw grammar output, and CollateExpr after
parse analysis.  This is for clarity and to avoid carrying collation names
in post-analysis parse trees: that's both wasteful and possibly misleading,
since the collation's name could be changed while the parsetree still
exists.

Also, clean up assorted infelicities and omissions in processing of the
node type.
2011-03-11 16:28:18 -05:00
Tom Lane
e3c732a85c Create an explicit concept of collations that work for any encoding.
Use collencoding = -1 to represent such a collation in pg_collation.
We need this to make the "default" entry work sanely, and a later
patch will fix the C/POSIX entries to be represented this way instead
of duplicating them across all encodings.  All lookup operations now
search first for an entry that's database-encoding-specific, and then
for the same name with collencoding = -1.

Also some incidental code cleanup in collationcmds.c and pg_collation.c.
2011-03-11 13:20:11 -05:00
Bruce Momjian
5ca543fb2e Clarify C comment that O_SYNC/O_FSYNC are really the same settting, as
opposed to O_DSYNC.
2011-03-10 20:02:52 -05:00
Tom Lane
7564654adf Revert addition of third argument to format_type().
Including collation in the behavior of that function promotes a world view
we do not want.  Moreover, it was producing the wrong behavior for pg_dump
anyway: what we want is to dump a COLLATE clause on attributes whose
attcollation is different from the underlying type, and likewise for
domains, and the function cannot do that for us.  Doing it the hard way
in pg_dump is a bit more tedious but produces more correct output.

In passing, fix initdb so that the initial entry in pg_collation is
properly pinned.  It was droppable before :-(
2011-03-10 17:30:46 -05:00
Robert Haas
551c07d84a Make error handling of synchronous_standby_names consistent.
It's not a good idea to kill the postmaster just because someone muffs
this, and it's not consistent with what we do for other, similar GUCs.

Fujii Masao, with a bit more hacking by me
2011-03-10 16:24:52 -05:00
Robert Haas
2e019c8611 More synchronous replication typo fixes.
Fujii Masao
2011-03-10 15:56:18 -05:00
Robert Haas
b8bb8dbf20 More synchronous replication tweaks.
SyncRepRequested() must check not only the value of the
synchronous_replication GUC but also whether max_wal_senders > 0.
Otherwise, we might end up waiting for sync rep even when there's no
possibility of a standby ever managing to connect.  There are some
existing cross-checks to prevent this, but they're not quite sufficient:
the user can start the server with max_wal_senders=0,
synchronous_standby_names='', and synchronous_replication=off and then
subsequent make synchronous_standby_names not empty using pg_ctl reload,
and then SET synchronous_standby=on, leading to an indefinite hang.

Along the way, rename the global variable for the synchronous_replication
GUC to match the name of the GUC itself, for clarity.

Report by Fujii Masao, though I didn't use his patch.
2011-03-10 15:43:37 -05:00
Robert Haas
6436098795 Minor sync rep corrections.
Fujii Masao, with a bit of additional wordsmithing by me.
2011-03-10 14:57:02 -05:00
Robert Haas
d16e290a8a Emit a LOG message when pausing at the recovery target.
Fujii Masao
2011-03-10 14:37:14 -05:00
Robert Haas
fcb99609b6 Replication README updates.
Fujii Masao
2011-03-10 08:59:59 -05:00
Itagaki Takahiro
2d8de0a50b Cleanup copyright years and file names in the header comments of some files. 2011-03-10 15:05:33 +09:00
Tom Lane
f6587019ed replication/repl_gram.h needs to be cleaned too ... 2011-03-10 00:12:38 -05:00
Tom Lane
174f65ab00 Fix some oversights in distprep and maintainer-clean targets.
At least two recent commits have apparently imagined that a comment in
a Makefile stating that something would be included in the distribution
tarball was sufficient to make it so.  They hadn't bothered to hook
into the upper maintainer-clean targets either.  Per bug #5923 from
Charles Johnson, in which it emerged that the 9.1alpha4 tarballs are
short a few files that should be there.
2011-03-10 00:04:05 -05:00
Bruce Momjian
76fdee31c4 Mention gcc version in C comment. 2011-03-09 23:41:13 -05:00
Tom Lane
a051ef699c Remove collation information from TypeName, where it does not belong.
The initial collations patch treated a COLLATE spec as part of a TypeName,
following what can only be described as brain fade on the part of the SQL
committee.  It's a lot more reasonable to treat COLLATE as a syntactically
separate object, so that it can be added in only the productions where it
actually belongs, rather than needing to reject it in a boatload of places
where it doesn't belong (something the original patch mostly failed to do).
In addition this change lets us meet the spec's requirement to allow
COLLATE anywhere in the clauses of a ColumnDef, and it avoids unfriendly
behavior for constructs such as "foo::type COLLATE collation".

To do this, pull collation information out of TypeName and put it in
ColumnDef instead, thus reverting most of the collation-related changes in
parse_type.c's API.  I made one additional structural change, which was to
use a ColumnDef as an intermediate node in AT_AlterColumnType AlterTableCmd
nodes.  This provides enough room to get rid of the "transform" wart in
AlterTableCmd too, since the ColumnDef can carry the USING expression
easily enough.

Also fix some other minor bugs that have crept in in the same areas,
like failure to copy recently-added fields of ColumnDef in copyfuncs.c.

While at it, document the formerly secret ability to specify a collation
in ALTER TABLE ALTER COLUMN TYPE, ALTER TYPE ADD ATTRIBUTE, and
ALTER TYPE ALTER ATTRIBUTE TYPE; and correct some misstatements about
what the default collation selection will be when COLLATE is omitted.

BTW, the three-parameter form of format_type() should go away too,
since it just contributes to the confusion in this area; but I'll do
that in a separate patch.
2011-03-09 22:39:20 -05:00
Tom Lane
49a08ca1e9 Adjust the permissions required for COMMENT ON ROLE.
Formerly, any member of a role could change the role's comment, as of
course could superusers; but holders of CREATEROLE privilege could not,
unless they were also members.  This led to the odd situation that a
CREATEROLE holder could create a role but then could not comment on it.
It also seems a bit dubious to let an unprivileged user change his own
comment, let alone those of group roles he belongs to.  So, change the
rule to be "you must be superuser to comment on a superuser role, or
hold CREATEROLE to comment on non-superuser roles".  This is the same
as the privilege check for creating/dropping roles, and thus fits much
better with the rule for other object types, namely that only the owner
of an object can comment on it.

In passing, clean up the documentation for COMMENT a little bit.

Per complaint from Owen Jacobson and subsequent discussion.
2011-03-09 11:28:34 -05:00
Tom Lane
3f7d24da16 Add missing keywords to gram.y's unreserved_keywords list.
We really need an automated check for this ... and did VALIDATE really
need to become a keyword at all, rather than picking some other syntax
using existing keywords?
2011-03-08 16:43:56 -05:00
Heikki Linnakangas
46c333a963 Fix overly strict assertion in SummarizeOldestCommittedSxact(). There's a
race condition where SummarizeOldestCommittedSxact() is called even though
another backend already cleared out all finished sxact entries. That's OK,
RegisterSerializableTransactionInt() can just retry getting a news xact
slot from the available-list when that happens.

Reported by YAMAMOTO Takashi, bug #5918.
2011-03-08 21:06:26 +02:00
Heikki Linnakangas
93d888232e Don't throw a warning if vacuum sees PD_ALL_VISIBLE flag set on a page that
contains newly-inserted tuples that according to our OldestXmin are not
yet visible to everyone. The value returned by GetOldestXmin() is conservative,
and it can move backwards on repeated calls, so if we see that contradiction
between the PD_ALL_VISIBLE flag and status of tuples on the page, we have to
assume it's because an earlier vacuum calculated a higher OldestXmin value,
and all the tuples really are visible to everyone.

We have received several reports of this bug, with the "PD_ALL_VISIBLE flag
was incorrectly set in relation ..." warning appearing in logs. We were
finally able to hunt it down with David Gould's help to run extra diagnostics
in an environment where this happened frequently.

Also reword the warning, per Robert Haas' suggestion, to not imply that the
PD_ALL_VISIBLE flag is necessarily at fault, as it might also be a symptom
of corruption on a tuple header.

Backpatch to 8.4, where the PD_ALL_VISIBLE flag was introduced.
2011-03-08 20:30:53 +02:00
Heikki Linnakangas
4cd3fb6e12 Truncate predicate lock manager's SLRU lazily at checkpoint. That's safer
than doing it aggressively whenever the tail-XID pointer is advanced, because
this way we don't need to do it while holding SerializableXactHashLock.

This also fixes bug #5915 spotted by YAMAMOTO Takashi, and removes an
obsolete comment spotted by Kevin Grittner.
2011-03-08 12:12:54 +02:00
Heikki Linnakangas
1a4ab9ec23 If recovery_target_timeline is set to 'latest' and standby mode is enabled,
periodically rescan the archive for new timelines, while waiting for new WAL
segments to arrive. This allows you to set up a standby server that follows
the TLI change if another standby server is promoted to master. Before this,
you had to restart the standby server to make it notice the new timeline.

This patch only scans the archive for TLI changes, it won't follow a TLI
change in streaming replication. That is much needed too, but it would be a
much bigger patch than I dare to sneak in this late in the release cycle.

There was discussion on improving the sanity checking of the WAL segments so
that the system would notice more reliably if the new timeline isn't an
ancestor of the current one, but that is not included in this patch.

Reviewed by Fujii Masao.
2011-03-07 21:14:47 +02:00
Tom Lane
7193a90fc1 Zero out vacuum_count and related counters in pgstat_recv_tabstat().
This fixes an oversight in commit 946045f04d
of 2010-08-21, as reported by Itagaki Takahiro.  Also a couple of minor
cosmetic adjustments.
2011-03-07 11:17:47 -05:00
Heikki Linnakangas
97e3dacd84 Begin error message with lower-case letter. 2011-03-07 10:41:13 +02:00
Heikki Linnakangas
baabf05196 Silence compiler warning about undefined function when compiling without
assertions.
2011-03-07 09:56:53 +02:00
Simon Riggs
cae4974e3d Dynamic array required within pg_stat_replication. 2011-03-07 00:26:30 +00:00
Simon Riggs
966fb05b58 Add new files for syncrep missed in previous commit 2011-03-06 23:39:14 +00:00
Simon Riggs
a8a8a3e096 Efficient transaction-controlled synchronous replication.
If a standby is broadcasting reply messages and we have named
one or more standbys in synchronous_standby_names then allow
users who set synchronous_replication to wait for commit, which
then provides strict data integrity guarantees. Design avoids
sending and receiving transaction state information so minimises
bookkeeping overheads. We synchronize with the highest priority
standby that is connected and ready to synchronize. Other standbys
can be defined to takeover in case of standby failure.

This version has very strict behaviour; more relaxed options
may be added at a later date.

Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime
Casanova, Heikki Linnakangas and Robert Haas, plus the assistance
of many other design reviewers.
2011-03-06 22:49:16 +00:00
Tom Lane
149b2673c2 Fix incorrect access to pg_index.indcollation.
Since this field is after a variable-length field, it can't simply be
accessed via the C struct for pg_index.  Fortunately, the relcache already
did the dirty work of pulling the information out to where it can be
accessed easily, so this is a one-line fix.

Andres Freund
2011-03-06 12:10:50 -05:00
Peter Eisentraut
9650364b7b Update of SQL feature conformance 2011-03-05 17:03:21 +02:00
Tom Lane
63b656b7bf Create extension infrastructure for the core procedural languages.
This mostly just involves creating control, install, and
update-from-unpackaged scripts for them.  However, I had to adjust plperl
and plpython to not share the same support functions between variants,
because we can't put the same function into multiple extensions.

catversion bump forced due to new contents of pg_pltemplate, and because
initdb now installs plpgsql as an extension not a bare language.

Add support for regression testing these as extensions not bare
languages.

Fix a couple of other issues that popped up while testing this: my initial
hack at pg_dump binary-upgrade support didn't work right, and we don't want
an extra schema permissions test after all.

Documentation changes still to come, but I'm committing now to see
whether the MSVC build scripts need work (likely they do).
2011-03-04 21:51:14 -05:00
Robert Haas
efa415da8c Refactor seclabel.c to use the new check_object_ownership function.
This avoids duplicate (and not-quite-matching) code, and makes the logic
for SECURITY LABEL match COMMENT and ALTER EXTENSION ADD/DROP.
2011-03-04 17:26:37 -05:00
Peter Eisentraut
b9cff97fdf Don't allow CREATE TABLE AS to create a column with invalid collation
It is possible that an expression ends up with a collatable type but
without a collation.  CREATE TABLE AS could then create a table based
on that.  But such a column cannot be dumped with valid SQL syntax, so
we disallow creating such a column.

per test report from Noah Misch
2011-03-04 23:42:07 +02:00
Tom Lane
8d3b421f5f Allow non-superusers to create (some) extensions.
Remove the unconditional superuser permissions check in CREATE EXTENSION,
and instead define a "superuser" extension property, which when false
(not the default) skips the superuser permissions check.  In this case
the calling user only needs enough permissions to execute the commands
in the extension's installation script.  The superuser property is also
enforced in the same way for ALTER EXTENSION UPDATE cases.

In other ALTER EXTENSION cases and DROP EXTENSION, test ownership of
the extension rather than superuserness.  ALTER EXTENSION ADD/DROP needs
to insist on ownership of the target object as well; to do that without
duplicating code, refactor comment.c's big switch for permissions checks
into a separate function in objectaddress.c.

I also removed the superuserness checks in pg_available_extensions and
related functions; there's no strong reason why everybody shouldn't
be able to see that info.

Also invent an IF NOT EXISTS variant of CREATE EXTENSION, and use that
in pg_dump, so that dumps won't fail for installed-by-default extensions.
We don't have any of those yet, but we will soon.

This is all per discussion of wrapping the standard procedural languages
into extensions.  I'll make those changes in a separate commit; this is
just putting the core infrastructure in place.
2011-03-04 16:08:53 -05:00
Peter Eisentraut
4442e1975d When creating a collation, check that the locales can be loaded
This is the same check that would happen later when the collation is
used, but it's friendlier to check the collation already when it is
created.
2011-03-04 22:14:37 +02:00
Heikki Linnakangas
ee3838b1d3 You must hold a lock on the heap page when you call
CheckForSerializableConflictOut(), because it can set hint bits.

YAMAMOTO Takashi
2011-03-04 15:43:11 +02:00
Tom Lane
6252c4f9e2 Run a portal's cleanup hook immediately when pushing it to DONE state.
This works around the problem noted by Yamamoto Takashi in bug #5906,
that there were code paths whereby we could reach AtCleanup_Portals
with a portal's cleanup hook still unexecuted.  The changes I made
a few days ago were intended to prevent that from happening, and
I think that on balance it's still a good thing to avoid, so I don't
want to remove the Assert in AtCleanup_Portals.  Hence do this instead.
2011-03-03 13:04:06 -05:00
Peter Eisentraut
091bda0188 Add collations to information_schema.usage_privileges
This is faked information like for domains.
2011-03-02 23:17:56 +02:00
Heikki Linnakangas
6eba5a7c57 Change pg_last_xlog_receive_location() not to move backwards. That makes
it a lot more useful for determining which standby is most up-to-date,
for example. There was long discussions on whether overwriting existing
existing WAL makes sense to begin with, and whether we should do some more
extensive variable renaming, but this change nevertheless seems quite
uncontroversial.

Fujii Masao, reviewed by Jeff Janes, Robert Haas, Stephen Frost.
2011-03-01 20:54:35 +02:00
Heikki Linnakangas
47ad79122b Fix bugs in Serializable Snapshot Isolation.
Change the way UPDATEs are handled. Instead of maintaining a chain of
tuple-level locks in shared memory, copy any existing locks on the old
tuple to the new tuple at UPDATE. Any existing page-level lock needs to
be duplicated too, as a lock on the new tuple. That was neglected
previously.

Store xmin on tuple-level predicate locks, to distinguish a lock on an old
already-recycled tuple from a new tuple at the same physical location.
Failure to distinguish them caused loops in the tuple-lock chains, as
reported by YAMAMOTO Takashi. Although we don't use the chain representation
of UPDATEs anymore, it seems like a good idea to store the xmin to avoid
some false positives if no other reason.

CheckSingleTargetForConflictsIn now correctly handles the case where a lock
that's being held is not reflected in the local lock table. That happens
if another backend acquires a lock on our behalf due to an UPDATE or a page
split.

PredicateLockPageCombine now retains locks for the page that is being
removed, rather than removing them. This prevents a potentially dangerous
false-positive inconsistency where the local lock table believes that a lock
is held, but it is actually not.

Dan Ports and Kevin Grittner
2011-03-01 19:05:16 +02:00
Tom Lane
97c4ee94ad Include the target table in EXPLAIN output for ModifyTable nodes.
Per discussion, this seems important for plans involving writable CTEs,
since there can now be more than one ModifyTable node in the plan.

To retain the same formatting as for target tables of scan nodes, we
show only one target table, which will be the parent table in case of
an UPDATE or DELETE on an inheritance tree.  Individual child tables
can be determined by inspecting the child plan trees if needed.
2011-03-01 11:37:01 -05:00
Robert Haas
59d6a75942 Avoid excessive Hot Standby feedback messages.
Without this patch, when wal_receiver_status_interval=0, indicating that no
status messages should be sent, Hot Standby feedback messages are instead sent
extremely frequently.

Fujii Masao, with documentation changes by me.
2011-03-01 11:34:25 -05:00
Tom Lane
c0b0076036 Rearrange snapshot handling to make rule expansion more consistent.
With this patch, portals, SQL functions, and SPI all agree that there
should be only a CommandCounterIncrement between the queries that are
generated from a single SQL command by rule expansion.  Fetching a whole
new snapshot now happens only between original queries.  This is equivalent
to the existing behavior of EXPLAIN ANALYZE, and it was judged to be the
best choice since it eliminates one source of concurrency hazards for
rules.  The patch should also make things marginally faster by reducing the
number of snapshot push/pop operations.

The patch removes pg_parse_and_rewrite(), which is no longer used anywhere.
There was considerable discussion about more aggressive refactoring of the
query-processing functions exported by postgres.c, but for the moment
nothing more has been done there.

I also took the opportunity to refactor snapmgr.c's API slightly: the
former PushUpdatedSnapshot() has been split into two functions.

Marko Tiikkaja, reviewed by Steve Singer and Tom Lane
2011-02-28 23:28:06 -05:00
Robert Haas
92c30fd2ed Rename pg_stat_replication.apply_location to replay_location.
For consistency with pg_last_xlog_replay_location.  Per discussion.
2011-02-28 12:49:57 -05:00
Tom Lane
a874fe7b4c Refactor the executor's API to support data-modifying CTEs better.
The originally committed patch for modifying CTEs didn't interact well
with EXPLAIN, as noted by myself, and also had corner-case problems with
triggers, as noted by Dean Rasheed.  Those problems show it is really not
practical for ExecutorEnd to call any user-defined code; so split the
cleanup duties out into a new function ExecutorFinish, which must be called
between the last ExecutorRun call and ExecutorEnd.  Some Asserts have been
added to these functions to help verify correct usage.

It is no longer necessary for callers of the executor to call
AfterTriggerBeginQuery/AfterTriggerEndQuery for themselves, as this is now
done by ExecutorStart/ExecutorFinish respectively.  If you really need to
suppress that and do it for yourself, pass EXEC_FLAG_SKIP_TRIGGERS to
ExecutorStart.

Also, refactor portal commit processing to allow for the possibility that
PortalDrop will invoke user-defined code.  I think this is not actually
necessary just yet, since the portal-execution-strategy logic forces any
non-pure-SELECT query to be run to completion before we will consider
committing.  But it seems like good future-proofing.
2011-02-27 13:44:12 -05:00
Bruce Momjian
67a5e727c8 Be less detailed about reporting shared memory failure by avoiding the
output of actual Postgres parameter _values_ related to shared memory,
and suggesting that these are only possible parameters to reduce.
2011-02-27 12:21:58 -05:00
Heikki Linnakangas
be6668d6ef Increase the default for wal_sender_delay from 200ms to 1s. Now that WAL
sender is immediately woken up by transaction commit, there's no need to
wake up so aggressively.
2011-02-26 23:38:25 +02:00
Tom Lane
000128bc7f Fix order of shutdown processing when CTEs contain inter-references.
We need ExecutorEnd to run the ModifyTable nodes to completion in
reverse order of initialization, not forward order.  Easily done
by constructing the list back-to-front.
2011-02-25 23:53:34 -05:00
Tom Lane
389af95155 Support data-modifying commands (INSERT/UPDATE/DELETE) in WITH.
This patch implements data-modifying WITH queries according to the
semantics that the updates all happen with the same command counter value,
and in an unspecified order.  Therefore one WITH clause can't see the
effects of another, nor can the outer query see the effects other than
through the RETURNING values.  And attempts to do conflicting updates will
have unpredictable results.  We'll need to document all that.

This commit just fixes the code; documentation updates are waiting on
author.

Marko Tiikkaja and Hitoshi Harada
2011-02-25 18:58:02 -05:00
Robert Haas
79ad8fc5f8 Named restore point improvements.
Emit a log message when creating a named restore point, and improve
documentation for pg_create_restore_point().

Euler Taveira de Oliveira, 	per suggestions from Thom Brown, with some
additional wordsmithing by me.
2011-02-24 19:02:00 -05:00
Tom Lane
bdca82f44d Add a relkind field to RangeTblEntry to avoid some syscache lookups.
The recent additions for FDW support required checking foreign-table-ness
in several places in the parse/plan chain.  While it's not clear whether
that would really result in a noticeable slowdown, it seems best to avoid
any performance risk by keeping a copy of the relation's relkind in
RangeTblEntry.  That might have some other uses later, anyway.
Per discussion.
2011-02-22 19:24:40 -05:00
Peter Eisentraut
1c51c7d5ff Add PL/Python functions for quoting strings
Add functions plpy.quote_ident, plpy.quote_literal,
plpy.quote_nullable, which wrap the equivalent SQL functions.

To be able to propagate char * constness properly, make the argument
of quote_literal_cstr() const char *.  This also makes it more
consistent with quote_identifier().

Jan Urbański, reviewed by Hitoshi Harada, some refinements by Peter
Eisentraut
2011-02-22 23:41:23 +02:00
Robert Haas
3e6b305d9e Fix a couple of unlogged tables goofs.
"SELECT ... INTO UNLOGGED tabname" works, but wasn't documented; CREATE
UNLOGGED SEQUENCE and CREATE UNLOGGED VIEW failed an assertion, instead
of throwing a sensible error.

Latter issue reported by Itagaki Takahiro; patch review by Tom Lane.
2011-02-22 14:46:19 -05:00
Tom Lane
1ab9b012bd Allow binary I/O of type "void".
void_send is useful for the same reason that void_out doesn't throw error,
namely that someone might do "select void_returning_func(...)"  from a
client that prefers to operate in binary mode.  The void_recv function may
or may not have any practical use, but we provide it for symmetry.

Radosław Smogura
2011-02-22 13:08:22 -05:00
Tom Lane
2e852e541c Remove ExecRemoveJunk(), which is no longer used anywhere.
This was a leftover from the pre-8.1 design of junkfilters.  It doesn't
seem to have any reason to live, since it's merely a combination of two
easy function calls, and not a well-designed combination at that (it
encourages callers to leak the result tuple).
2011-02-21 21:41:08 -05:00
Tom Lane
a210be7720 Fix dangling-pointer problem in before-row update trigger processing.
ExecUpdate checked for whether ExecBRUpdateTriggers had returned a new
tuple value by seeing if the returned tuple was pointer-equal to the old
one.  But the "old one" was in estate->es_junkFilter's result slot, which
would be scribbled on if we had done an EvalPlanQual update in response to
a concurrent update of the target tuple; therefore we were comparing a
dangling pointer to a live one.  Given the right set of circumstances we
could get a false match, resulting in not forcing the tuple to be stored in
the slot we thought it was stored in.  In the case reported by Maxim Boguk
in bug #5798, this led to "cannot extract system attribute from virtual
tuple" failures when trying to do "RETURNING ctid".  I believe there is a
very-low-probability chance of more serious errors, such as generating
incorrect index entries based on the original rather than the
trigger-modified version of the row.

In HEAD, change all of ExecBRInsertTriggers, ExecIRInsertTriggers,
ExecBRUpdateTriggers, and ExecIRUpdateTriggers so that they continue to
have similar APIs.  In the back branches I just changed
ExecBRUpdateTriggers, since there is no bug in the ExecBRInsertTriggers
case.
2011-02-21 21:19:50 -05:00
Itagaki Takahiro
ca9cf85d54 Fix pg_server_to_client, that was broken in the previous commit. 2011-02-21 16:27:57 +09:00
Itagaki Takahiro
3cba8240a1 Add ENCODING option to COPY TO/FROM and file_fdw.
File encodings can be specified separately from client encoding.
If not specified, client encoding is used for backward compatibility.

Cases when the encoding doesn't match client encoding are slower
than matched cases because we don't have conversion procs for other
encodings. Performance improvement would be be a future work.

Original patch by Hitoshi Harada, and modified by me.
2011-02-21 14:32:40 +09:00
Tom Lane
7c5d0ae707 Add contrib/file_fdw foreign-data wrapper for reading files via COPY.
This is both very useful in its own right, and an important test case
for the core FDW support.

This commit includes a small refactoring of copy.c to expose its option
checking code as a separately callable function.  The original patch
submission duplicated hundreds of lines of that code, which seemed pretty
unmaintainable.

Shigeru Hanada, reviewed by Itagaki Takahiro and Tom Lane
2011-02-20 14:06:59 -05:00
Tom Lane
bb74240794 Implement an API to let foreign-data wrappers actually be functional.
This commit provides the core code and documentation needed.  A contrib
module test case will follow shortly.

Shigeru Hanada, Jan Urbanski, Heikki Linnakangas
2011-02-20 00:18:14 -05:00
Tom Lane
327e025071 Create the catalog infrastructure for foreign-data-wrapper handlers.
Add a fdwhandler column to pg_foreign_data_wrapper, plus HANDLER options
in the CREATE FOREIGN DATA WRAPPER and ALTER FOREIGN DATA WRAPPER commands,
plus pg_dump support for same.  Also invent a new pseudotype fdw_handler
with properties similar to language_handler.

This is split out of the "FDW API" patch for ease of review; it's all stuff
we will certainly need, regardless of any other details of the FDW API.
FDW handler functions will not actually get called yet.

In passing, fix some omissions and infelicities in foreigncmds.c.

Shigeru Hanada, Jan Urbanski, Heikki Linnakangas
2011-02-19 00:07:15 -05:00
Tom Lane
82220e8832 Un-break building with BTREE_BUILD_STATS.
This has been broken for awhile, but not clear it's worth back-patching.

Euler Taveira de Oliveira
2011-02-18 14:06:16 -05:00
Simon Riggs
bc76695c4c Make a hard state change from catchup to streaming mode.
More useful state change for monitoring purposes, plus a
required change for synchronous replication patch.
2011-02-18 15:07:26 +00:00
Simon Riggs
06828c5feb Separate messages for standby replies and hot standby feedback.
Allow messages to be sent at different times, and greatly reduce
the frequency of hot standby feedback. Refactor to allow additional
message types.
2011-02-18 11:31:49 +00:00
Magnus Hagander
45a6d79b17 Properly initialize variables
Kevin Grittner
2011-02-18 11:59:57 +01:00
Michael Meskes
bc423879cc Applied a patch by Zoltán Böszörményi that makes ecpg's parser accept dynamic cursornames even in WHERE CURRENT OF clauses. 2011-02-18 11:16:16 +01:00
Itagaki Takahiro
5c63982af2 Fix an uninitialized field in DR_copy.
Shigeru HANADA
2011-02-18 14:32:19 +09:00
Itagaki Takahiro
62c7bd31c8 Add transaction-level advisory locks.
They share the same locking namespace with the existing session-level
advisory locks, but they are automatically released at the end of the
current transaction and cannot be released explicitly via unlock
functions.

Marko Tiikkaja, reviewed by me.
2011-02-18 14:05:12 +09:00
Tom Lane
52b60530f2 Fix tsmatchsel() to account properly for null rows.
ts_typanalyze.c computes MCE statistics as fractions of the non-null rows,
which seems fairly reasonable, and anyway changing it in released versions
wouldn't be a good idea.  But then ts_selfuncs.c has to account for that.
Failure to do so results in overestimates in columns with a significant
fraction of null documents.  Back-patch to 8.4 where this stuff was
introduced.

Jesper Krogh
2011-02-17 19:00:49 -05:00
Robert Haas
4a25bc145a Add client_hostname field to pg_stat_activity.
Peter Eisentraut, reviewed by Steve Singer, Alvaro Herrera, and me.
2011-02-17 16:03:28 -05:00
Robert Haas
a3e8486dff Prevent possible compiler warnings.
Simon Riggs reports that rnode.dbNode and rnode.spcNode were
generating unused variable warnings on gcc 4.4.3 with CFLAGS=-O1
2011-02-17 16:01:46 -05:00
Robert Haas
f196738534 Add some words of caution to elog.c.
Stephen Frost, somewhat rewritten by me
2011-02-17 10:29:42 -05:00
Tom Lane
93016983d1 Fix blatantly uninitialized variable in recent commit.
Doesn't anybody around here pay attention to compiler warnings?
2011-02-16 19:53:20 -05:00
Tom Lane
a2095f7fb5 Fix bogus test for hypothetical indexes in get_actual_variable_range().
That function was supposing that indexoid == 0 for a hypothetical index,
but that is not likely to be true in any non-toy implementation of an index
adviser, since assigning a fake OID is the only way to know at EXPLAIN time
which hypothetical index got selected.  Fix by adding a flag to
IndexOptInfo to mark hypothetical indexes.  Back-patch to 9.0 where
get_actual_variable_range() was added.

Gurjeet Singh
2011-02-16 19:24:45 -05:00
Tom Lane
6595dd04d1 Add backwards-compatible declarations of some core GIN support functions.
These are needed to support reloading dumps of 9.0 installations containing
contrib/intarray or contrib/tsearch2.  Since not only regular dump/reload
but binary upgrade would fail, it seems worth the trouble to carry these
stubs for awhile.  Note that the contrib opclasses referencing these
functions will still work fine, since GIN doesn't actually pay any
attention to the declared signature of a support function.
2011-02-16 17:24:46 -05:00
Simon Riggs
bca8b7f16a Hot Standby feedback for avoidance of cleanup conflicts on standby.
Standby optionally sends back information about oldestXmin of queries
which is then checked and applied to the WALSender's proc->xmin.
GetOldestXmin() is modified slightly to agree with GetSnapshotData(),
so that all backends on primary include WALSender within their snapshots.
Note this does nothing to change the snapshot xmin on either master or
standby. Feedback piggybacks on the standby reply message.
vacuum_defer_cleanup_age is no longer used on standby, though parameter
still exists on primary, since some use cases still exist.

Simon Riggs, review comments from Fujii Masao, Heikki Linnakangas, Robert Haas
2011-02-16 19:29:37 +00:00
Tom Lane
65076269ea Make a no-op ALTER EXTENSION UPDATE give just a NOTICE, not ERROR.
This seems a bit more user-friendly.
2011-02-16 12:40:31 -05:00
Robert Haas
3a087369c0 WAL receiver shouldn't try to send a reply when dying.
Per report from, and discussion with, Fujii Masao.
2011-02-16 10:27:35 -05:00
Tom Lane
6e02755b22 Add FOREACH IN ARRAY looping to plpgsql.
(I'm not entirely sure that we've finished bikeshedding the syntax details,
but the functionality seems OK.)

Pavel Stehule, reviewed by Stephen Frost and Tom Lane
2011-02-16 01:53:03 -05:00
Robert Haas
4695da5ae9 pg_ctl promote
Fujii Masao, reviewed by Robert Haas, Stephen Frost, and Magnus Hagander.
2011-02-15 21:30:23 -05:00
Itagaki Takahiro
8ddc05fb01 Export the external file reader used in COPY FROM as APIs.
They are expected to be used by extension modules like file_fdw.
There are no user-visible changes.

Itagaki Takahiro
Reviewed and tested by Kevin Grittner and Noah Misch.
2011-02-16 11:19:11 +09:00
Tom Lane
eff027c432 Add CheckTableNotInUse calls in DROP TABLE and DROP INDEX.
Recent releases had a check on rel->rd_refcnt in heap_drop_with_catalog,
but failed to cover the possibility of pending trigger events at DROP time.
(Before 8.4 we didn't even check the refcnt.)  When the trigger events were
eventually fired, you'd get "could not open relation with OID nnn" errors,
as in recent report from strk.  Better to throw a suitable error when the
DROP is attempted.

Also add a similar check in DROP INDEX.

Back-patch to all supported branches.
2011-02-15 15:50:48 -05:00
Robert Haas
883a9659fa Assorted corrections to the patch to add WAL receiver replies.
Per reports from Fujii Masao.
2011-02-15 12:05:00 -05:00
Robert Haas
6a77e9385e Rename max_predicate_locks_per_transaction.
The new name, max_pred_locks_per_transaction, is shorter.

Kevin Grittner, per discussion.
2011-02-15 08:04:55 -05:00
Robert Haas
0d90dc16f8 Avoid a few more SET DATA TYPE table rewrites.
When the new type is an unconstrained domain over the old type, we don't
need to rewrite the table.

Noah Misch and Robert Haas
2011-02-14 23:40:05 -05:00
Robert Haas
8e1124eeeb Delete stray word from comment. 2011-02-14 22:38:08 -05:00
Simon Riggs
5c588be729 PITR can stop at a named restore point when recovery target = time
though must not update the last transaction timestamp.
Plus comment and message cleanup for recent named restore point.

Fujii Masao, minor changes by me
2011-02-15 00:51:39 +00:00
Tom Lane
555353c0c5 Rearrange extension-related views as per recent discussion.
The original design of pg_available_extensions did not consider the
possibility of version-specific control files.  Split it into two views:
pg_available_extensions shows information that is generic about an
extension, while pg_available_extension_versions shows all available
versions together with information that could be version-dependent.
Also, add an SRF pg_extension_update_paths() to assist in checking that
a collection of update scripts provide sane update path sequences.
2011-02-14 19:22:36 -05:00
Tom Lane
e693e97d75 Support replacing MODULE_PATHNAME during extension script file execution.
This avoids the need to find a way to make PGXS' .sql.in-to-.sql rule
insert the right thing.  We'll just deprecate use of that hack for
extensions.
2011-02-13 22:54:43 -05:00
Tom Lane
27d5d7ab10 Change the naming convention for extension files to use double dashes.
This allows us to have an unambiguous rule for deconstructing the names
of script files and secondary control files, without having to forbid
extension and version names from containing any dashes.  We do have to
forbid them from containing double dashes or leading/trailing dashes,
but neither restriction is likely to bother anyone in practice.
Per discussion, this seems like a better solution overall than the
original design.
2011-02-13 22:54:42 -05:00
Tom Lane
6c2e734f0a Refactor ALTER EXTENSION UPDATE to have cleaner multi-step semantics.
This change causes a multi-step update sequence to behave exactly as if the
updates had been commanded one at a time, including updating the "requires"
dependencies afresh at each step.  The initial implementation took the
shortcut of examining only the final target version's "requires" and
changing the catalog entry but once.  But on reflection that's a bad idea,
since it could lead to executing old update scripts under conditions
different than they were designed/tested for.  Better to expend a few extra
cycles and avoid any surprises.

In the same spirit, if a CREATE EXTENSION FROM operation involves applying
a series of update files, it will act as though the CREATE had first been
done using the initial script's target version and then the additional
scripts were invoked with ALTER EXTENSION UPDATE.

I also removed the restriction about not changing encoding in secondary
control files.  The new rule is that a script is assumed to be in whatever
encoding the control file(s) specify for its target version.  Since this
reimplementation causes us to read each intermediate version's control
file, there's no longer any uncertainty about which encoding setting would
get applied.
2011-02-12 16:40:41 -05:00
Bruce Momjian
0de0cc150a Properly handle Win32 paths of 'E:abc', which can be either absolute or
relative, by creating a function path_is_relative_and_below_cwd() to
check for specific requirements.  It is unclear if this fixes a security
problem or not but the new code is more robust.
2011-02-12 09:47:51 -05:00
Peter Eisentraut
b313bca0af DDL support for collations
- collowner field
- CREATE COLLATION
- ALTER COLLATION
- DROP COLLATION
- COMMENT ON COLLATION
- integration with extensions
- pg_dump support for the above
- dependency management
- psql tab completion
- psql \dO command
2011-02-12 15:55:18 +02:00
Robert Haas
d31e2a495b Teach ALTER TABLE .. SET DATA TYPE to avoid some table rewrites.
When the old type is binary coercible to the new type and the using
clause does not change the column contents, we can avoid a full table
rewrite, though any indexes on the affected columns will still need
to be rebuilt.  This applies, for example, when changing a varchar
column to be of type text.

The prior coding assumed that the set of operations that force a
rewrite is identical to the set of operations that must be propagated
to tables making use of the affected table's rowtype.  This is
no longer true: even though the tuples in those tables wouldn't
need to be modified, the data type change invalidate indexes built
using those composite type columns.  Indexes on the table we're
actually modifying can be invalidated too, of course, but the
existing machinery is sufficient to handle that case.

Along the way, add some debugging messages that make it possible
to understand what operations ALTER TABLE is actually performing
in these cases.

Noah Misch and Robert Haas
2011-02-12 08:27:55 -05:00
Tom Lane
24d1280c4d Clean up installation directory choices for extensions.
Arrange for the control files to be in $SHAREDIR/extension not
$SHAREDIR/contrib, since we're generally trying to deprecate the term
"contrib" and this is a once-in-many-moons opportunity to get rid of it in
install paths.  Fix PGXS to install the $EXTENSION file into that directory
no matter what MODULEDIR is set to; a nondefault MODULEDIR should only
affect the script and secondary extension files.  Fix the control file
directory parameter to be interpreted relative to $SHAREDIR, to avoid a
surprising disconnect between how you specify that and what you set
MODULEDIR to.

Per discussion with David Wheeler.
2011-02-11 22:53:43 -05:00
Tom Lane
1214749901 Add support for multiple versions of an extension and ALTER EXTENSION UPDATE.
This follows recent discussions, so it's quite a bit different from
Dimitri's original.  There will probably be more changes once we get a bit
of experience with it, but let's get it in and start playing with it.

This is still just core code.  I'll start converting contrib modules
shortly.

Dimitri Fontaine and Tom Lane
2011-02-11 21:25:57 -05:00
Alvaro Herrera
60141eefaf Fix comment recently obsoleted 2011-02-11 19:42:51 -03:00
Robert Haas
d309acf201 Typo fixes. receivedUpto should be capitalized consistently. 2011-02-11 11:55:12 -05:00
Robert Haas
2c20ba1fd2 Tweak find_composite_type_dependencies API a bit more.
Per discussion with Noah Misch, the previous coding, introduced by
my commit 65377e0b9c on 2011-02-06,
was really an abuse of RELKIND_COMPOSITE_TYPE, since the caller in
typecmds.c is actually passing the name of a domain.  So go back
having a type name argument, but make the first argument a Relation
rather than just a string so we can tell whether it's a table or
a foreign table and emit the proper error message.
2011-02-11 08:47:38 -05:00
Tom Lane
01467d3e4f Extend "ALTER EXTENSION ADD object" to permit "DROP object" as well.
Per discussion, this is something we should have sooner rather than later,
and it doesn't take much additional code to support it.
2011-02-10 17:37:22 -05:00
Bruce Momjian
135724ec35 Fix "variable not used" warnings when USE_WIDE_UPPER_LOWER is not
defined.
2011-02-10 16:58:02 -05:00
Peter Eisentraut
ff81aa3eda Update comment
It was still claiming that the keyword list is in keywords.c, when it
is now in kwlist.h.
2011-02-10 22:49:46 +02:00
Heikki Linnakangas
b186523fd9 Send status updates back from standby server to master, indicating how far
the standby has written, flushed, and applied the WAL. At the moment, this
is for informational purposes only, the values are only shown in
pg_stat_replication system view, but in the future they will also be needed
for synchronous replication.

Extracted from Simon riggs' synchronous replication patch by Robert Haas, with
some tweaking by me.
2011-02-10 21:04:02 +02:00
Magnus Hagander
4c468b37a2 Track last time for statistics reset on databases and bgwriter
Tracks one counter for each database, which is reset whenever
the statistics for any individual object inside the database is
reset, and one counter for the background writer.

Tomas Vondra, reviewed by Greg Smith
2011-02-10 15:14:04 +01:00
Heikki Linnakangas
cecb5901b8 Allocate all entries in the serializable xid hash up-front, so that you don't
run out of shared memory when you try to assign an xid to a transaction.

Kevin Grittner
2011-02-10 12:03:21 +02:00
Tom Lane
e617f0d7e4 Fix improper matching of resjunk column names for FOR UPDATE in subselect.
Flattening of subquery range tables during setrefs.c could lead to the
rangetable indexes in PlanRowMark nodes not matching up with the column
names previously assigned to the corresponding resjunk ctid (resp. tableoid
or wholerow) columns.  Typical symptom would be either a "cannot extract
system attribute from virtual tuple" error or an Assert failure.  This
wasn't a problem before 9.0 because we didn't support FOR UPDATE below the
top query level, and so the final flattening could never renumber an RTE
that was relevant to FOR UPDATE.  Fix by using a plan-tree-wide unique
number for each PlanRowMark to label the associated resjunk columns, so
that the number need not change during flattening.

Per report from David Johnston (though I'm darned if I can see how this got
past initial testing of the relevant code).  Back-patch to 9.0.
2011-02-09 23:27:42 -05:00
Tom Lane
caddcb8f4b Fix pg_upgrade to handle extensions.
This follows my proposal of yesterday, namely that we try to recreate the
previous state of the extension exactly, instead of allowing CREATE
EXTENSION to run a SQL script that might create some entirely-incompatible
on-disk state.  In --binary-upgrade mode, pg_dump won't issue CREATE
EXTENSION at all, but instead uses a kluge function provided by
pg_upgrade_support to recreate the pg_extension row (and extension-level
pg_depend entries) without creating any member objects.  The member objects
are then restored in the same way as if they weren't members, in particular
using pg_upgrade's normal hacks to preserve OIDs that need to be preserved.
Then, for each member object, ALTER EXTENSION ADD is issued to recreate the
pg_depend entry that marks it as an extension member.

In passing, fix breakage in pg_upgrade's enum-type support: somebody didn't
fix it when the noise word VALUE got added to ALTER TYPE ADD.  Also,
rationalize parsetree representation of COMMENT ON DOMAIN and fix
get_object_address() to allow OBJECT_DOMAIN.
2011-02-09 19:18:08 -05:00
Peter Eisentraut
2e2d56fea9 Information schema views for collation support
Add the views character_sets, collations, and
collation_character_set_applicability.
2011-02-09 23:26:48 +02:00
Tom Lane
5bc178b89f Implement "ALTER EXTENSION ADD object".
This is an essential component of making the extension feature usable;
first because it's needed in the process of converting an existing
installation containing "loose" objects of an old contrib module into
the extension-based world, and second because we'll have to use it
in pg_dump --binary-upgrade, as per recent discussion.

Loosely based on part of Dimitri Fontaine's ALTER EXTENSION UPGRADE
patch.
2011-02-09 11:56:37 -05:00
Heikki Linnakangas
036bb15872 Fix allocation of RW-conflict pool in the new predicate lock manager, and
also take the RW-conflict pool into account in the PredicateLockShmemSize()
estimate.
2011-02-09 12:23:07 +02:00
Magnus Hagander
3144c33a2f Implement NOWAIT option for BASE_BACKUP command
Specifying this option makes the server not wait for the
xlog to be archived, or emit a warning that it can't,
instead leaving the responsibility with the client.

This is useful when the log is being streamed using
the streaming protocol in parallel with the backup,
without having log archiving enabled.
2011-02-09 10:59:53 +01:00
Tom Lane
375e5b0a68 Suppress some compiler warnings in recent commits.
Older versions of gcc tend to throw "variable might be clobbered by
`longjmp' or `vfork'" warnings whenever a variable is assigned in more than
one place and then used after the end of a PG_TRY block.  That's reasonably
easy to work around in execute_extension_script, and the overhead of
unconditionally saving/restoring the GUC variables seems unlikely to be a
serious concern.

Also clean up logic in ATExecValidateConstraint to make it easier to read
and less likely to provoke "variable might be used uninitialized in this
function" warnings.
2011-02-08 18:12:17 -05:00
Tom Lane
d9572c4e3b Core support for "extensions", which are packages of SQL objects.
This patch adds the server infrastructure to support extensions.
There is still one significant loose end, namely how to make it play nice
with pg_upgrade, so I am not yet committing the changes that would make
all the contrib modules depend on this feature.

In passing, fix a disturbingly large amount of breakage in
AlterObjectNamespace() and callers.

Dimitri Fontaine, reviewed by Anssi Kääriäinen,
Itagaki Takahiro, Tom Lane, and numerous others
2011-02-08 16:13:22 -05:00
Peter Eisentraut
414c5a2ea6 Per-column collation support
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.

Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
2011-02-08 23:04:18 +02:00
Simon Riggs
c016ce7281 Named restore points in recovery. Users can record named points, then
new recovery.conf parameter recovery_target_name allows PITR to
specify named points as recovery targets.

Jaime Casanova, reviewed by Euler Taveira de Oliveira, plus minor edits
2011-02-08 19:39:08 +00:00
Simon Riggs
8c6e3adbf7 Basic Recovery Control functions for use in Hot Standby. Pause, Resume,
Status check functions only. Also, new recovery.conf parameter to
pause_at_recovery_target, default on.

Simon Riggs, reviewed by Fujii Masao
2011-02-08 18:30:22 +00:00
Simon Riggs
faa0550572 Remove rare corner case for data loss when triggering standby server.
If the standby was streaming when trigger file arrives, check also in the
archive for additional WAL files. This is a corner case since it is
unlikely that we would trigger a failover while the master is still
available and sending data to standby, while at the same time running in
archive mode and also while the streaming standby has fallen behind archive.
Someone would eventually be unlucky; we must plug all gaps however small.

Fujii Masao
2011-02-08 14:38:02 +00:00
Simon Riggs
722bf7017b Extend ALTER TABLE to allow Foreign Keys to be added without initial validation.
FK constraints that are marked NOT VALID may later be VALIDATED, which uses an
ShareUpdateExclusiveLock on constraint table and RowShareLock on referenced
table. Significantly reduces lock strength and duration when adding FKs.
New state visible from psql.

Simon Riggs, with reviews from Marko Tiikkaja and Robert Haas
2011-02-08 12:23:20 +00:00
Heikki Linnakangas
7202ad7b8d Fix copy-pasto in description of pg_serial, and silence compiler warning
about uninitialized field you get on some compilers.
2011-02-08 09:05:13 +02:00
Robert Haas
32896c40ca Avoid having autovacuum workers wait for relation locks.
Waiting for relation locks can lead to starvation - it pins down an
autovacuum worker for as long as the lock is held.  But if we're doing
an anti-wraparound vacuum, then we still wait; maintenance can no longer
be put off.

To assist with troubleshooting, if log_autovacuum_min_duration >= 0,
we log whenever an autovacuum or autoanalyze is skipped for this reason.

Per a gripe by Josh Berkus, and ensuing discussion.
2011-02-07 22:04:29 -05:00
Heikki Linnakangas
dafaa3efb7 Implement genuine serializable isolation level.
Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, ie. it sometimes aborts transactions even
though there is no anomaly.

To track reads we implement predicate locking, see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.

A predicate lock doesn't conflict with any regular locks or with another
predicate locks in the normal sense. They're only used by the predicate lock
manager to detect the danger of anomalies. Only serializable transactions
participate in predicate locking, so there should be no extra overhead for
for other transactions.

Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with it have completed. That means that
we need to remember an unbounded amount of predicate locks, so we apply a
lossy but conservative method of tracking locks for committed transactions.
If we run short of shared memory, we overflow to a new "pg_serial" SLRU
pool.

We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.

Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.

Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
2011-02-08 00:09:08 +02:00
Itagaki Takahiro
c18f51da17 Fix a comment for MergeAttributes.
We forgot to adjust it when we changed relistemp to relpersistence.
2011-02-07 16:53:05 +09:00
Itagaki Takahiro
fb7355e0ce Fix error messages for FreeFile in COPY command.
They are extracted from COPY API patch.

suggested by Noah Misch
2011-02-07 10:46:56 +09:00
Bruce Momjian
97116ca417 Rename macro DECIMAL to DECIMAL_T to help pgindent; this is already
done for a few other macros in that file, for other reasons.  I also
remove pgindent/README mention of the file.
2011-02-06 10:48:17 -05:00
Magnus Hagander
cedd6515ba IDENTIFY_SYSTEM now returns 3 fields, not 2 2011-02-06 07:46:14 +01:00
Robert Haas
65377e0b9c Tighten ALTER FOREIGN TABLE .. SET DATA TYPE checks.
If the foreign table's rowtype is being used as the type of a column in
another table, we can't just up and change its data type.  This was
already checked for composite types and ordinary tables, but we
previously failed to enforce it for foreign tables.
2011-02-06 00:26:27 -05:00
Bruce Momjian
51dbc87dff Add C comment about why older compilers complain about basebackup.c's
longjump.
2011-02-04 23:28:14 -05:00
Robert Haas
9e7e1172a5 Clarify comment in ATRewriteTable().
Make sure it's clear that the prohibition on adding a column with a default
when the rowtype is used elsewhere is intentional, and be a bit more
explicit about the other cases where we perform this check.
2011-02-04 16:14:54 -05:00
Robert Haas
b1e65c3216 Move pipe.c into the backend.
It's full of backend-specific error reporting, so it's neither possible
nor necessary for this to be used from frontend code.
2011-02-04 15:52:21 -05:00
Robert Haas
356f2cbbb4 Make handling of errcodes.h more consistent with other generated headers.
This fixes make distprep, and seems more robust in other ways as well.
Some special handling is required because errcodes.txt is needed by
some stuff in src/port, but just by src/backend as is the case for the
other generated headers.

While I'm at it, fix a few other things that were overlooked in the
original patch.
2011-02-04 09:29:10 -05:00
Robert Haas
dde9684d65 Unbreak the VPATH build.
My commit ddfe26f644 of 2010-02-03 broke it.

Per buildfarm.
2011-02-04 00:07:08 -05:00
Robert Haas
b8a0467e10 Preserve copyright notice from old errcodes.h file. 2011-02-03 22:38:02 -05:00
Robert Haas
ddfe26f644 Avoid maintaining three separate copies of the error codes list.
src/pl/plpgsql/src/plerrcodes.h, src/include/utils/errcodes.h, and a
big chunk of errcodes.sgml are now automatically generated from a single
file, src/backend/utils/errcodes.txt.

Jan Urbański, reviewed by Tom Lane.
2011-02-03 22:32:49 -05:00
Magnus Hagander
76129e7f14 Include more status information in walsender results
Add the current xlog insert location to the response of
IDENTIFY_SYSTEM, and adds result sets containing start
and stop location of backups to BASE_BACKUP responses.
2011-02-03 13:46:23 +01:00
Bruce Momjian
426227850b Rename function to first_path_var_separator() to clarify it works with
path variables, not directory paths.
2011-02-02 22:49:54 -05:00
Robert Haas
0af695fd43 Log restartpoints in the same fashion as checkpoints.
Prior to 9.0, restartpoints never created, deleted, or recycled WAL
files, but now they can.  This code makes log_checkpoints treat
checkpoints and restartpoints symmetrically.  It also adjusts up
the documentation of the parameter to mention restartpoints.

Fujii Masao.  Docs by me, as suggested by Itagaki Takahiro.
2011-02-02 21:08:53 -05:00
Simon Riggs
56b21b7ae3 Re-classify ERRCODE_DATABASE_DROPPED to 57P04 2011-02-01 08:44:01 +00:00
Itagaki Takahiro
0c707aa458 Fix wrong error reports in 'number of array dimensions exceeds the
maximum allowed' messages, that have reported one-less dimensions.

Alexey Klyukin
2011-02-01 15:21:32 +09:00
Simon Riggs
9e95c9ad55 Create new errcode for recovery conflict caused by db drop on master.
Previously reported as ERRCODE_ADMIN_SHUTDOWN, this case is now
reported as ERRCODE_T_R_DATABASE_DROPPED. No message text change.
Unlikely to happen on most servers, so low impact change to allow
session poolers to correctly handle this situation.

Tatsuo Ishii, edits by me, review by Robert Haas
2011-02-01 00:20:53 +00:00
Simon Riggs
8585ad3625 Fix error code for canceling statement due to conflict with recovery.
All retryable conflict errors now have an error code that indicates that
a retry is possible, correcting my incomplete fix of 2010/05/12

Tatsuo Ishii and Simon Riggs, input from Robert Haas and Florian Pflug
2011-01-31 19:20:23 +00:00
Heikki Linnakangas
997b48ed96 Support multiple concurrent pg_basebackup backups.
With this patch, pg_basebackup doesn't write a backup_label file in the
data directory, so it doesn't interfere with a pg_start/stop_backup() based
backup anymore. backup_label is still included in the backup, but it is
injected directly into the tar stream.

Heikki Linnakangas, reviewed by Fujii Masao and Magnus Hagander.
2011-01-31 18:25:39 +02:00
Tom Lane
9688c4e6f1 Make reduce_outer_joins() smarter about semijoins.
reduce_outer_joins() mistakenly treated a semijoin like a left join for
purposes of deciding whether not-null constraints created by the join's
quals could be passed down into the join's left-hand side (possibly
resulting in outer-join simplification there).  Actually, semijoin works
like inner join for this purpose, ie, we do not need to see any rows that
can't possibly satisfy the quals.  Hence, two-line fix to treat semi and
inner joins alike.  Per observation by Andres Freund about a performance
gripe from Yazan Suleiman.

Back-patch to 8.4, since this oversight has been there since the current
handling of semijoins was implemented.
2011-01-30 17:04:31 -05:00
Magnus Hagander
507069de6d Add option to include WAL in base backup
When included, this makes the base backup a complete working
"clone" of the initial database, ready to have a postmaster
started against it without the need to set up any log archiving
or similar.

Magnus Hagander, reviewed by Fujii Masao and Heikki Linnakangas
2011-01-30 21:30:09 +01:00
Robert Haas
7f242d880b Try to avoid running with a full fsync request queue.
When we need to insert a new entry and the queue is full, compact the
entire queue in the hopes of making room for the new entry.  Doing this
on every insertion might worsen contention on BgWriterCommLock, but
when the queue it's full, it's far better than allowing the backend to
perform its own fsync, per testing by Greg Smith as reported in
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02665.php

Original idea from Greg Smith.  Patch by me.  Review by Chris Browne
and Greg Smith
2011-01-29 08:08:41 -05:00
Tom Lane
0ac8c8df85 Don't include <asm/ia64regs.h> unnecessarily.
We only need that header when compiling with icc, since the gcc variant of
ia64_get_bsp() uses in-line assembly code.  Per report from Frank Brendel,
the header doesn't exist on all IA64 platforms; so don't include it unless
we need it.
2011-01-27 16:27:27 -05:00
Robert Haas
a40b1e0bf3 Restore ALTER TABLE .. ADD COLUMN w/DEFAULT restriction.
This reverts commit a06e41deeb of 2011-01-26.
Per discussion, this behavior is not wanted, as it would need to change if
we ever made composite types support DEFAULT.
2011-01-27 08:35:34 -05:00
Tom Lane
7ab6f2da23 Change inv_truncate() to not repeat its systable_getnext_ordered() scan.
In the case where the initial call of systable_getnext_ordered() returned
NULL, this function would nonetheless call it again.  That's undefined
behavior that only by chance failed to not give visibly incorrect results.
Put an if-test around the final loop to prevent that, and in passing
improve some comments.  No back-patch since there's no actual failure.

Per report from YAMAMOTO Takashi.
2011-01-26 19:33:50 -05:00
Robert Haas
5c2a7c6e97 Add a comment explaining why we force physical removal of OIDs.
Noah Misch, slightly revised.
2011-01-26 06:42:51 -05:00
Robert Haas
a06e41deeb Remove arbitrary ALTER TABLE .. ADD COLUMN restriction.
The previous coding prevented ALTER TABLE .. ADD COLUMN from being used
with a non-NULL default in situations where the table's rowtype was being
used elsewhere.  But this is a completely arbitrary restriction since
you could do the same operation in multiple steps (add the column, add
the default, update the table).

Inspired by a patch from Noah Misch, though I didn't use his code.
2011-01-26 06:37:08 -05:00
Tom Lane
bd1ad1b019 Replace pg_class.relhasexclusion with pg_index.indisexclusion.
There isn't any need to track this state on a table-wide basis, and trying
to do so introduces undesirable semantic fuzziness.  Move the flag to
pg_index, where it clearly describes just a single index and can be
immutable after index creation.
2011-01-25 17:51:59 -05:00
Tom Lane
88452d5ba6 Implement ALTER TABLE ADD UNIQUE/PRIMARY KEY USING INDEX.
This feature allows a unique or pkey constraint to be created using an
already-existing unique index.  While the constraint isn't very
functionally different from the bare index, it's nice to be able to do that
for documentation purposes.  The main advantage over just issuing a plain
ALTER TABLE ADD UNIQUE/PRIMARY KEY is that the index can be created with
CREATE INDEX CONCURRENTLY, so that there is not a long interval where the
table is locked against updates.

On the way, refactor some of the code in DefineIndex() and index_create()
so that we don't have to pass through those functions in order to create
the index constraint's catalog entries.  Also, in parse_utilcmd.c, pass
around the ParseState pointer in struct CreateStmtContext to save on
notation, and add error location pointers to some error reports that didn't
have one before.

Gurjeet Singh, reviewed by Steve Singer and Tom Lane
2011-01-25 15:43:05 -05:00
Magnus Hagander
966d4f52c2 Typo fix for MemSet size.
Fujii Masao
2011-01-25 10:50:04 +01:00
Magnus Hagander
e5487f65fd Make walsender options order-independent
While doing this, also move base backup options into
a struct instead of increasing the number of parameters
to multiple functions for each new option.
2011-01-23 23:39:18 +01:00
Tom Lane
dd5f0db96b Improve getObjectDescription's display of pg_amop and pg_amproc entries.
Include the lefttype/righttype columns explicitly (instead of assuming
the reader can deduce them from the operator or function description),
and move the operator or function description to the end of the string,
to make it clearer that it's a referenced object and not the amop or
amproc item itself.  Per extensive discussion of Andreas Karlsson's
original patch.

Andreas Karlsson, Tom Lane
2011-01-23 14:13:46 -05:00
Magnus Hagander
f88a638199 Only show pg_stat_replication details to superusers 2011-01-23 17:28:19 +01:00
Magnus Hagander
048d148fe6 Add pg_basebackup tool for streaming base backups
This tool makes it possible to do the pg_start_backup/
copy files/pg_stop_backup step in a single command.

There are still some steps to be done before this is a
complete backup solution, such as the ability to stream
the required WAL logs, but it's still usable, and
could do with some buildfarm coverage.

In passing, make the checkpoint request optionally
fast instead of hardcoding it.

Magnus Hagander, reviewed by Fujii Masao and Dimitri Fontaine
2011-01-23 12:21:23 +01:00
Robert Haas
6f59777c65 Code cleanup for assign_transaction_read_only.
As in commit fb4c5d2798 on 2011-01-21,
this avoids spurious debug messages and allows idempotent changes at
any time.  Along the way, make assign_XactIsoLevel allow idempotent
changes even when not within a subtransaction, to be consistent with
the new coding of assign_transaction_read_only and because there's
no compelling reason to do otherwise.

Kevin Grittner, with some adjustments.
2011-01-22 20:55:50 -05:00
Tom Lane
0f73aae13d Allow the wal_buffers setting to be auto-tuned to a reasonable value.
If wal_buffers is initially set to -1 (which is now the default), it's
replaced by 1/32nd of shared_buffers, with a minimum of 8 (the old default)
and a maximum of the XLOG segment size.  The allowed range for manual
settings is still from 4 up to whatever will fit in shared memory.

Greg Smith, with implementation correction by me.
2011-01-22 20:31:24 -05:00
Robert Haas
a0c75f5539 Avoid treating WAL senders as normal backends.
The previous coding treated anything that wasn't an autovacuum launcher
as a normal backend, which is wrong now that we also have WAL senders.

Fujii Masao, reviewed by Robert Haas, Alvaro Herrera, Tom Lane,
and Bernd Helmle.
2011-01-21 22:23:01 -05:00
Robert Haas
fb4c5d2798 Code cleanup for assign_XactIsoLevel.
The new coding avoids a spurious debug message when a transaction
that has changed the isolation level has been rolled back.  It also
allows the property to be freely changed to the current value within
a subtransaction.

Kevin Grittner, with one small change by me.
2011-01-21 21:49:19 -05:00
Heikki Linnakangas
8aea1373d8 Don't require usage privileges on the foreign data wrapper when creating a
foreign table. We check for usage privileges on the foreign server, that ought
to be enough.

Shigeru HANADA
2011-01-21 15:05:20 +02:00
Robert Haas
8ceb245680 Make ALTER TABLE revalidate uniqueness and exclusion constraints.
Failure to do so can lead to constraint violations.  This was broken by
commit 1ddc2703a9 on 2010-02-07, so
back-patch to 9.0.

Noah Misch.  Regression test by me.
2011-01-20 22:44:10 -05:00
Tom Lane
1b393f4e5d Avoid detoast in texteq/textne/byteaeq/byteane for unequal-length strings.
We can get the length of a compressed or out-of-line datum without actually
detoasting it.  If the lengths of two strings are unequal, we can then
conclude they are unequal without detoasting.  That saves considerable work
in an admittedly less-common case, without costing anything much when the
optimization doesn't apply.

Noah Misch
2011-01-18 14:11:54 -05:00
Magnus Hagander
6e1726d082 Log replication connections only when log_connections is on
Previously we'd always log replication connections, with no
way to turn them off.
2011-01-18 20:02:25 +01:00
Heikki Linnakangas
b1dc45c11d Fix thinko in comment. Spotted by Jim Nasby. 2011-01-18 10:46:13 +02:00
Tom Lane
bdd8ed973d Fix miscalculation of itemsafter in array_set_slice().
If the slice to be assigned to was before the existing array lower bound
(requiring at least one null element to spring into existence to fill the
gap), the code miscalculated how many entries needed to be copied from
the old array's null bitmap.  This could result in trashing the array's
data area (as seen in bug #5840 from Karsten Loesing), or worse.

This has been broken since we first allowed the behavior of assigning to
non-adjacent slices, in 8.2.  Back-patch to all affected versions.
2011-01-17 12:38:52 -05:00
Magnus Hagander
48075095ac Set fallback_application_name in walreceiver
Makes replication slaves identify themselves in the new
pg_stat_replication view.
2011-01-17 11:42:53 +01:00
Heikki Linnakangas
34ef02b4d4 Before exiting walreceiver, fsync() all the WAL received.
Otherwise WAL recovery will replay the un-flushed WAL after walreceiver has
exited, which can lead to a non-recoverable standby if the system crashes hard
at that point.
2011-01-17 12:27:35 +02:00
Tom Lane
36750dcef5 Add .gitignore to silence git complaints about parser/scanner output files. 2011-01-15 16:05:28 -05:00
Magnus Hagander
3866ff6149 Enumerate available tablespaces after starting the backup
This closes a race condition where if a tablespace was created
after the enumeration happened but before the do_pg_start_backup()
was called, the backup would be incomplete. Now that it's done
while we are in backup mode, WAL replay will recreate it during
restore.

Noted by Heikki.
2011-01-15 19:31:16 +01:00
Heikki Linnakangas
8f5d65e916 Treat a WAL sender process that hasn't started streaming yet as a regular
backend, as far as the postmaster shutdown logic is concerned. That means,
fast shutdown will wait for WAL sender processes to exit before signaling
bgwriter to finish. This avoids race conditions between a base backup stopping
or starting, and bgwriter writing the shutdown checkpoint WAL record. We don't
want e.g the end-of-backup WAL record to be written after the shutdown
checkpoint.
2011-01-15 16:38:21 +02:00
Magnus Hagander
fcd810c69a Use a lexer and grammar for parsing walsender commands
Makes it easier to parse mainly the BASE_BACKUP command
with it's options, and avoids having to manually deal
with quoted identifiers in the label (previously broken),
and makes it easier to add new commands and options in
the future.

In passing, refactor the case statement in the walsender
to put each command in it's own function.
2011-01-14 16:30:33 +01:00
Magnus Hagander
688423d004 Exit from base backups when shutdown is requested
When the exit waits until the whole backup completes, it may take
a very long time.

In passing, add back an error check in the main loop so we detect
clients that disconnect much earlier if the backup is large.
2011-01-14 12:36:45 +01:00
Tom Lane
52948169bc Code review for postmaster.pid contents changes.
Fix broken test for pre-existing postmaster, caused by wrong code for
appending lines to the lockfile; don't write a failed listen_address
setting into the lockfile; don't arbitrarily change the location of the
data directory in the lockfile compared to previous releases; provide more
consistent and useful definitions of the socket path and listen_address
entries; avoid assuming that pg_ctl has the same DEFAULT_PGSOCKET_DIR as
the postmaster; assorted code style improvements.
2011-01-13 19:01:28 -05:00
Tom Lane
f0f36045b2 Revert incorrect memory-conservation hack in inheritance_planner().
This reverts commit d1001a78ce of 2010-12-05,
which was broken as reported by Jeff Davis.  The problem is that the
individual planning steps may have side-effects on substructures of
PlannerGlobal, not only the current PlannerInfo root.  Arranging to keep
all such side effects in the main planning context is probably possible,
but it would change this from a quick local hack into a wide-ranging and
rather fragile endeavor.  Which it's not worth.
2011-01-13 14:33:19 -05:00
Magnus Hagander
9eacd427e8 Make sure walsender state is only read while holding the spinlock
Noted by Robert Haas.
2011-01-13 18:51:13 +01:00
Heikki Linnakangas
a5a02a7445 Fix the logic in libpqrcv_receive() to determine if there's any incoming data
that can be read without blocking. It used to conclude that there isn't, even
though there was data in the socket receive buffer. That lead walreceiver to
flush the WAL after every received chunk, potentially causing big performance
issues.

Backpatch to 9.0, because the performance impact can be very significant.
2011-01-13 18:26:39 +02:00
Peter Eisentraut
c667cc24e8 Workaround for recursive make breakage
Changing a file two directory levels deep under src/backend/ would not
cause the postgres binary to be rebuilt.  This change fixes it, but no
one knows why.
2011-01-13 09:32:06 +02:00
Tom Lane
d487afbb81 Fix PlanRowMark/ExecRowMark structures to handle inheritance correctly.
In an inherited UPDATE/DELETE, each target table has its own subplan,
because it might have a column set different from other targets.  This
means that the resjunk columns we add to support EvalPlanQual might be
at different physical column numbers in each subplan.  The EvalPlanQual
rewrite I did for 9.0 failed to account for this, resulting in possible
misbehavior or even crashes during concurrent updates to the same row,
as seen in a recent report from Gordon Shannon.  Revise the data structure
so that we track resjunk column numbers separately for each subplan.

I also chose to move responsibility for identifying the physical column
numbers back to executor startup, instead of assuming that numbers derived
during preprocess_targetlist would stay valid throughout subsequent
massaging of the plan.  That's a bit slower, so we might want to consider
undoing it someday; but it would complicate the patch considerably and
didn't seem justifiable in a bug fix that has to be back-patched to 9.0.
2011-01-12 20:47:02 -05:00
Robert Haas
7a32ff9732 Revert patch adding support for logging the current role.
This reverts commit a8a8867912, committed
by me earlier today (2011-01-12).  This isn't safe inside an aborted
transaction.

Noted by Tom Lane.
2011-01-12 11:59:21 -05:00
Robert Haas
a8a8867912 Add support for logging the current role.
Stephen Frost, with some editorialization by me.
2011-01-12 11:34:53 -05:00
Peter Eisentraut
e3094fd3a8 Re-add recursive coverage target in src/backend/
This was lost during the recent recursive make change.
2011-01-12 00:26:20 +02:00
Magnus Hagander
4c8e20f815 Track walsender state in shared memory and expose in pg_stat_replication 2011-01-11 21:25:28 +01:00
Magnus Hagander
47a5f3e9da Add missing function prototype, for consistency 2011-01-11 21:12:12 +01:00
Tom Lane
e6dce4e439 Adjust basebackup.c to suppress compiler warnings.
Some versions of gcc complain about "variable `tablespaces' might be
clobbered by `longjmp' or `vfork'" with the original coding.  Fix by
moving the PG_TRY block into a separate subroutine.
2011-01-11 13:41:13 -05:00
Tom Lane
9d1ac2f5fa Tweak create_index_paths()'s test for whether to consider a bitmap scan.
Per my note of a couple days ago, create_index_paths would refuse to
consider any path at all for GIN indexes if the selectivity estimate came
out as 1.0; not even if you tried to force it with enable_seqscan.  While
this isn't really a bad outcome in practice, it could be annoying for
testing purposes.  Adjust the test for "is this path only useful for
sorting" so that it doesn't fire on paths with nil pathkeys, which will
include all GIN paths.
2011-01-11 12:13:02 -05:00
Magnus Hagander
b7ebda9d8c Reset walsender ps title in the main loop
When in streaming mode we can never get out, so it will never
be required, but after a base backup (or other operations)
we can get back to the loop, so the title needs to be cleared.
2011-01-11 10:04:54 +01:00
Magnus Hagander
2e36343f82 Set process title to indicate base backup is running 2011-01-10 21:53:18 +01:00
Heikki Linnakangas
dc1305ce5f Leave temporary files out of streaming base backups. 2011-01-10 19:42:05 +02:00
Magnus Hagander
0eb59c4591 Backend support for streaming base backups
Add BASE_BACKUP command to walsender, allowing it to stream a
base backup to the client (in tar format). The syntax is still
far from ideal, that will be fixed in the switch to use a proper
grammar for walsender.

No client included yet, will come as a separate commit.

Magnus Hagander and Heikki Linnakangas
2011-01-10 14:04:19 +01:00
Magnus Hagander
4448917d51 Split pg_start_backup() and pg_stop_backup() into two pieces
Move the actual functionality into a separate function that's
easier to call internally, and change the SQL-callable function
to be a wrapper calling this.

Also create a pg_abort_backup() function, only callable internally,
that does only the most vital parts of pg_stop_backup(), making it
safe(r) to call from error handlers.
2011-01-09 21:00:28 +01:00
Heikki Linnakangas
ca63029eac Fix crash in the new GiST insertion code, when an update splits the root page.
This bug was exercised by contrib/intarray/bench, as noted by Tom Lane.
2011-01-09 21:36:22 +02:00
Tom Lane
52fd2d65a3 Fix up core tsquery GIN support for new extractQuery API.
No need for the empty-prefix-match kluge to force a full scan anymore.
2011-01-09 14:34:50 -05:00
Tom Lane
304845075c Use array_contains_nulls instead of ARR_HASNULL on user-supplied arrays.
This applies the fix for bug #5784 to remaining places where we wish
to reject nulls in user-supplied arrays.  In all these places, there's
no reason not to allow a null bitmap to be present, so long as none of
the current elements are actually null.

I did not change some other places where we are looking at system catalog
entries or aggregate transition values, as the presence of a null bitmap
in such an array would be suspicious.
2011-01-09 13:09:07 -05:00
Tom Lane
adf328c0e1 Add array_contains_nulls() function in arrayfuncs.c.
This will support fixing contrib/intarray (and probably other places)
so that they don't have to fail on arrays that contain a null bitmap
but no live null entries.
2011-01-08 20:26:14 -05:00
Tom Lane
4d1b76e49e Fix up gincostestimate for new extractQuery API.
The only reason this wasn't crashing while testing the core anyarray
operators was that it was disabled for those cases because of passing the
wrong type information to get_opfamily_proc :-(.  So fix that too, and make
it insist on finding the support proc --- in hindsight, silently doing
nothing is not as sane a coping mechanism as all that.
2011-01-08 20:26:13 -05:00
Tom Lane
7e2f906201 Remove pg_am.amindexnulls.
The only use we have had for amindexnulls is in determining whether an
index is safe to cluster on; but since the addition of the amclusterable
flag, that usage is pretty redundant.

In passing, clean up assorted sloppiness from the last patch that touched
pg_am.h: Natts_pg_am was wrong, and ambuildempty was not documented.
2011-01-08 16:08:05 -05:00
Tom Lane
56a57473a9 Refactor GIN's handling of duplicate search entries.
The original coding could combine duplicate entries only when they
originated from the same qual condition.  In particular it could not
combine cases where multiple qual conditions all give rise to full-index
scan requests, which is an expensive case well worth optimizing.  Refactor
so that duplicates are recognized across all the quals.
2011-01-08 14:48:08 -05:00
Bruce Momjian
d8d3d2a4f3 Fix pg_upgrade of large object permissions by preserving pg_auth.oid,
which is stored in pg_largeobject_metadata.

No backpatch to 9.0 because you can't migrate from 9.0 to 9.0 with the
same catversion (because of tablespace conflict), and a pre-9.0
migration to 9.0 has not large object permissions to migrate.
2011-01-07 21:59:29 -05:00
Bruce Momjian
2896c87ce4 Force pg_upgrade's to preserve pg_class.oid, not pg_class.relfilenode.
Toast tables have identical pg_class.oid and pg_class.relfilenode, but
for clarity it is good to preserve the pg_class.oid.

Update comments regarding what is preserved, and do some
variable/function renaming for clarity.
2011-01-07 21:26:13 -05:00
Tom Lane
73912e7fbd Fix GIN to support null keys, empty and null items, and full index scans.
Per my recent proposal(s).  Null key datums can now be returned by
extractValue and extractQuery functions, and will be stored in the index.
Also, placeholder entries are made for indexable items that are NULL or
contain no keys according to extractValue.  This means that the index is
now always complete, having at least one entry for every indexed heap TID,
and so we can get rid of the prohibition on full-index scans.  A full-index
scan is implemented much the same way as partial-match scans were already:
we build a bitmap representing all the TIDs found in the index, and then
drive the results off that.

Also, introduce a concept of a "search mode" that can be requested by
extractQuery when the operator requires matching to empty items (this is
just as cheap as matching to a single key) or requires a full index scan
(which is not so cheap, but it sure beats failing or giving wrong answers).
The behavior remains backward compatible for opclasses that don't return
any null keys or request a non-default search mode.

Using these features, we can now make the GIN index opclass for anyarray
behave in a way that matches the actual anyarray operators for &&, <@, @>,
and = ... which it failed to do before in assorted corner cases.

This commit fixes the core GIN code and ginarrayprocs.c, updates the
documentation, and adds some simple regression test cases for the new
behaviors using the array operators.  The tsearch and contrib GIN opclass
support functions still need to be looked over and probably fixed.

Another thing I intend to fix separately is that this is pretty inefficient
for cases where more than one scan condition needs a full-index search:
we'll run duplicate GinScanEntrys, each one of which builds a large bitmap.
There is some existing logic to merge duplicate GinScanEntrys but it needs
refactoring to make it work for entries belonging to different scan keys.

Note that most of gin.h has been split out into a new file gin_private.h,
so that gin.h doesn't export anything that's not supposed to be used by GIN
opclasses or the rest of the backend.  I did quite a bit of other code
beautification work as well, mostly fixing comments and choosing more
appropriate names for things.
2011-01-07 19:16:24 -05:00
Robert Haas
a9f72b4083 Improve recovery.conf.sample comments.
Jehan-Guillaume de Rorthais, with some additional wordsmithing by me.
2011-01-07 11:01:25 -05:00
Itagaki Takahiro
a755ea33ae New system view pg_stat_replication displays activity of wal sender processes.
Itagaki Takahiro and Simon Riggs.
2011-01-07 20:35:38 +09:00
Bruce Momjian
46d28820b6 Improve C comments about backend variables set by pg_upgrade_support
functions.
2011-01-06 22:45:36 -05:00
Magnus Hagander
66a8a0428d Give superusers REPLIACTION permission by default
This can be overriden by using NOREPLICATION on the CREATE ROLE
statement, but by default they will have it, making it backwards
compatible and "less surprising" (given that superusers normally
override all checks).
2011-01-05 14:24:17 +01:00
Robert Haas
7f60be72b0 Fix crash in ALTER OPERATOR CLASS/FAMILY .. SET SCHEMA.
In the previous coding, the parser emitted a List containing a C string,
which is no good, because copyObject() can't handle it.

Dimitri Fontaine
2011-01-03 22:08:55 -05:00
Robert Haas
dc8a14311a Update comments in RecordTransactionCommit() to mention unlogged tables. 2011-01-03 10:29:22 -05:00
Magnus Hagander
40d9e94bd7 Add views and functions to monitor hot standby query conflicts
Add the view pg_stat_database_conflicts and a column to pg_stat_database,
and the underlying functions to provide the information.
2011-01-03 12:46:03 +01:00
Peter Eisentraut
39b8843296 Implement remaining fields of information_schema.sequences view
Add new function pg_sequence_parameters that returns a sequence's start,
minimum, maximum, increment, and cycle values, and use that in the view.
(bug #5662; design suggestion by Tom Lane)

Also slightly adjust the view's column order and permissions after review of
SQL standard.
2011-01-02 15:15:21 +02:00
Robert Haas
e657b55e66 Fix typo.
Noted by Magnus Hagander.
2011-01-02 07:26:10 -05:00
Robert Haas
0d692a0dc9 Basic foreign table support.
Foreign tables are a core component of SQL/MED.  This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried.  Support for foreign table scans will need to
be added in a future patch.  However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.

Shigeru Hanada, heavily revised by Robert Haas
2011-01-01 23:48:11 -05:00
Peter Eisentraut
6a208aa404 Allow casting a table's row type to the table's supertype if it's a typed table
This is analogous to the existing facility that allows casting a row type to a
supertable's row type.
2011-01-01 23:04:14 +02:00
Bruce Momjian
5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Bruce Momjian
30aeda4394 Include the first valid listen address in pg_ctl to improve server start
"wait" detection and add postmaster start time to help determine if the
postmaster is actually using the specified data directory.
2010-12-31 17:25:02 -05:00
Tom Lane
39c8dd6620 Invert and rename flag variable to improve code readability.
No change in functionality.  Per discussion with Robert.
2010-12-31 11:59:38 -05:00
Tom Lane
7b46401557 Move symbols for ExecMergeJoin's state machine into nodeMergejoin.c.
There's no reason for these values to be known anywhere else.  After
doing this, executor/execdefs.h is vestigial and can be removed.
2010-12-30 22:12:40 -05:00
Tom Lane
f4e4b32743 Support RIGHT and FULL OUTER JOIN in hash joins.
This is advantageous first because it allows us to hash the smaller table
regardless of the outer-join type, and second because hash join can be more
flexible than merge join in dealing with arbitrary join quals in a FULL
join.  For merge join all the join quals have to be mergejoinable, but hash
join will work so long as there's at least one hashjoinable qual --- the
others can be any condition.  (This is true essentially because we don't
keep per-inner-tuple match flags in merge join, while hash join can do so.)

To do this, we need a has-it-been-matched flag for each tuple in the
hashtable, not just one for the current outer tuple.  The key idea that
makes this practical is that we can store the match flag in the tuple's
infomask, since there are lots of bits there that are of no interest for a
MinimalTuple.  So we aren't increasing the size of the hashtable at all for
the feature.

To write this without turning the hash code into even more of a pile of
spaghetti than it already was, I rewrote ExecHashJoin in a state-machine
style, similar to ExecMergeJoin.  Other than that decision, it was pretty
straightforward.
2010-12-30 20:26:08 -05:00
Alvaro Herrera
55573990ca Avoid unnecessary public struct declaration in slru.h
Instead, declare a public wrapper of the sole function using it for
external callers, so that they don't have to always pass a NULL
argument.

Author: Kevin Grittner
2010-12-30 12:09:17 -03:00
Robert Haas
53dbc27c62 Support unlogged tables.
The contents of an unlogged table are WAL-logged; thus, they are not
available on standby servers and are truncated whenever the database
system enters recovery.  Indexes on unlogged tables are also unlogged.
Unlogged GiST indexes are not currently supported.
2010-12-29 06:48:53 -05:00
Magnus Hagander
9b8aff8c19 Add REPLICATION privilege for ROLEs
This privilege is required to do Streaming Replication, instead of
superuser, making it possible to set up a SR slave that doesn't
have write permissions on the master.

Superuser privileges do NOT override this check, so in order to
use the default superuser account for replication it must be
explicitly granted the REPLICATION permissions. This is backwards
incompatible change, in the interest of higher default security.
2010-12-29 11:05:03 +01:00
Tom Lane
f2ba1e994c Avoid unexpected conversion overflow in planner for distant date values.
The "date" type supports a wider range of dates than int64 timestamps do.
However, there is pre-int64-timestamp code in the planner that assumes that
all date values can be converted to timestamp with impunity.  Fortunately,
what we really need out of the conversion is always a double (float8)
value; so even when the date is out of timestamp's range it's possible to
produce a sane answer.  All we need is a code path that doesn't try to
force the result into int64.  Per trouble report from David Rericha.

Back-patch to all supported versions.  Although this is surely a corner
case, there's not much point in advertising a date range wider than
timestamp's if we will choke on such values in unexpected places.
2010-12-28 22:49:57 -05:00
Bruce Momjian
b4d3792daa Another fix for larger postmaster.pid files. 2010-12-28 09:34:46 -05:00
Bruce Momjian
bada44a2a2 Fix code to properly pull out shared memory key now that the
postmaster.pid file is larger than in previous major versions.
This is a bug introduced when I added lines to the file recently.
2010-12-27 23:11:33 -05:00
Tom Lane
84fc571395 Rename the C functions bitand(), bitor() to bit_and(), bit_or().
This is to avoid use of the C++ keywords "bitand" and "bitor" in
the header file utils/varbit.h.  Note the functions' SQL-level
names are not changed, only their C-level names.

In passing, make some comments in varbit.c conform to project-standard
layout.
2010-12-27 14:57:41 -05:00
Tom Lane
275411912d Fix ill-chosen use of "private" as an argument and struct field name.
"private" is a keyword in C++, so this breaks the poorly-enforced policy
that header files should be include-able in C++ code.  Per report from
Craig Ringer and some investigation with cpluspluscheck.
2010-12-27 11:26:19 -05:00
Andrew Dunstan
a534728afb Only build in crashdump support on Windows if there's a working dbghelp.h. 2010-12-26 10:34:47 -05:00
Bruce Momjian
5000472112 Remove quotes from boolean recovery.conf.sample parameters, now that the
quotes are not required.  This now matches postgresql.conf's
specification of booleans.
2010-12-24 11:51:51 -05:00
Bruce Momjian
075354ad1b Improve "pg_ctl -w start" server detection by writing the postmaster
port and socket directory into postmaster.pid, and have pg_ctl read from
that file, for use by PQping().
2010-12-24 09:45:52 -05:00
Heikki Linnakangas
9de3aa65f0 Rewrite the GiST insertion logic so that we don't need the post-recovery
cleanup stage to finish incomplete inserts or splits anymore. There was two
reasons for the cleanup step:

1. When a new tuple was inserted to a leaf page, the downlink in the parent
needed to be updated to contain (ie. to be consistent with) the new key.
Updating the parent in turn might require recursively updating the parent of
the parent. We now handle that by updating the parent while traversing down
the tree, so that when we insert the leaf tuple, all the parents are already
consistent with the new key, and the tree is consistent at every step.

2. When a page is split, we need to insert the downlink for the new right
page(s), and update the downlink for the original page to not include keys
that moved to the right page(s). We now handle that by setting a new flag,
F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is
set, scans always follow the rightlink, regardless of the NSN mechanism used
to detect concurrent page splits. That way the tree is consistent right after
split, even though the downlink is still missing. This is very similar to the
way B-tree splits are handled. When the downlink is inserted in the parent,
the flag is cleared. To keep the insertion algorithm simple, when an
insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it
finishes the split before doing anything else.

These changes allow removing the whole "invalid tuple" mechanism, but I
retained the scan code to still follow invalid tuples correctly. While we
don't create any such tuples anymore, we want to handle them gracefully in
case you pg_upgrade a GiST index that has them. If we encounter any on an
insert, though, we just throw an error saying that you need to REINDEX.

The issue that got me into doing this is that if you did a checkpoint while
an insert or split was in progress, and the checkpoint finishes quickly so
that there is no WAL record related to the insert between RedoRecPtr and the
checkpoint record, recovery from that checkpoint would not know to finish
the incomplete insert. IOW, we have the same issue we solved with the
rm_safe_restartpoint mechanism during normal operation too. It's highly
unlikely to happen in practice, and this fix is far too large to backpatch,
so we're just going to live with in previous versions, but this refactoring
fixes it going forward.

With this patch, you don't get the annoying
'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices
anymore if you crash at an unfortunate moment.
2010-12-23 16:21:47 +02:00
Robert Haas
32ba2b5160 Use memcmp() rather than strncmp() when shorter string length is known.
It appears that this will be faster for all but the shortest strings;
at least one some platforms, memcmp() can use word-at-a-time comparisons.

Noah Misch, somewhat pared down.
2010-12-21 22:11:40 -05:00
Robert Haas
c5160b7eec Fix typos.
Andreas Karlsson
2010-12-21 17:58:53 -05:00
Robert Haas
24ecde7742 Work around unfortunate getppid() behavior on BSD-ish systems.
On MacOS X, and apparently also on other BSD-derived systems, attaching
a debugger causes getppid() to return the pid of the debugging process
rather than the actual parent PID.  As a result, debugging the
autovacuum launcher, startup process, or WAL sender on such systems
causes it to exit, because the previous coding of PostmasterIsAlive()
detects postmaster death by testing whether getppid() == PostmasterPid.

Work around that behavior by checking the return value of getppid()
more carefully.  If it's PostmasterPid, the postmaster must be alive;
if it's 1, assume the postmaster is dead.  If it's any other value,
assume we've been debugged and fall through to the less-reliable
kill() test.

Review by Tom Lane.
2010-12-21 06:30:32 -05:00
Robert Haas
f6a0863e3c Allow transactions that don't write WAL to commit asynchronously.
This case can arise if a transaction has written data, but only to
temporary tables.  Loss of the commit record in case of a crash won't
matter, because the temporary tables will be lost anyway.

Reviewed by Heikki Linnakangas and Simon Riggs.
2010-12-20 12:59:33 -05:00
Magnus Hagander
d382828f6e Remove thread dumping constant that requires newer Platform SDK
Since we're not multithreaded it only provides marginally useful
information, and it does require a newer version of the Platform SDK
than we target. We may want to reconsider this in the future along
with a fix for MinGW.
2010-12-19 21:32:58 +01:00
Tom Lane
1b19e2c0ba Fix up handling of simple-form CASE with constant test expression.
eval_const_expressions() can replace CaseTestExprs with constants when
the surrounding CASE's test expression is a constant.  This confuses
ruleutils.c's heuristic for deparsing simple-form CASEs, leading to
Assert failures or "unexpected CASE WHEN clause" errors.  I had put in
a hack solution for that years ago (see commit
514ce7a331 of 2006-10-01), but bug #5794
from Peter Speck shows that that solution failed to cover all cases.

Fortunately, there's a much better way, which came to me upon reflecting
that Peter's "CASE TRUE WHEN" seemed pretty redundant: we can "simplify"
the simple-form CASE to the general form of CASE, by simply omitting the
constant test expression from the rebuilt CASE construct.  This is
intuitively valid because there is no need for the executor to evaluate
the test expression at runtime; it will never be referenced, because any
CaseTestExprs that would have referenced it are now replaced by constants.
This won't save a whole lot of cycles, since evaluating a Const is pretty
cheap, but a cycle saved is a cycle earned.  In any case it beats kluging
ruleutils.c still further.  So this patch improves const-simplification
and reverts the previous change in ruleutils.c.

Back-patch to all supported branches.  The bug exists in 8.1 too, but it's
out of warranty.
2010-12-19 15:30:44 -05:00