Commit Graph

13672 Commits

Author SHA1 Message Date
Heikki Linnakangas
5cb0e33597 Speed up operations on numeric, mostly by avoiding palloc() overhead.
In many functions, a NumericVar was initialized from an input Numeric, to be
passed as input to a calculation function. When the NumericVar is not
modified, the digits array of the NumericVar can point directly to the digits
array in the original Numeric, and we can avoid a palloc() and memcpy(). Add
init_var_from_num() function to initialize a var like that.

Remove dscale argument from get_str_from_var(), as all the callers just
passed the dscale of the variable. That means that the rounding it used to
do was not actually necessary, and get_str_from_var() no longer scribbles on
its input. That makes it safer in general, and allows us to use the new
init_var_from_num() function in e.g numeric_out().

Also modified numericvar_to_int8() to no scribble on its input either. It
creates a temporary copy to avoid that. To compensate, the callers no longer
need to create a temporary copy, so the net # of pallocs is the same, but this
is nicer.

In the passing, use a constant for the number 10 in get_str_from_var_sci(),
when calculating 10^exponent. Saves a palloc() and some cycles to convert
integer 10 to numeric.

Original patch by Kyotaro HORIGUCHI, with further changes by me. Reviewed
by Pavel Stehule.
2012-11-21 15:53:35 +02:00
Tom Lane
1f7cb5c309 Improve handling of INT_MIN / -1 and related cases.
Some platforms throw an exception for this division, rather than returning
a necessarily-overflowed result.  Since we were testing for overflow after
the fact, an exception isn't nice.  We can avoid the problem by treating
division by -1 as negation.

Add some regression tests so that we'll find out if any compilers try to
optimize away the overflow check conditions.

This ought to be back-patched, but I'm going to see what the buildfarm
reports about the regression tests first.

Per discussion with Xi Wang, though this is different from the patch he
submitted.
2012-11-19 12:24:25 -05:00
Heikki Linnakangas
644a0a6379 Fix archive_cleanup_command.
When I moved ExecuteRecoveryCommand() from xlog.c to xlogarchive.c, I didn't
realize that it's called from the checkpoint process, not the startup
process. I tried to use InRedo variable to decide whether or not to attempt
cleaning up the archive (must not do so before we have read the initial
checkpoint record), but that variable is only valid within the startup
process.

Instead, let ExecuteRecoveryCommand() always clean up the archive, and add
an explicit argument to RestoreArchivedFile() to say whether that's allowed
or not. The caller knows better.

Reported by Erik Rijkers, diagnosis by Fujii Masao. Only 9.3devel is
affected.
2012-11-19 10:14:20 +02:00
Tom Lane
b6e3798f3a Limit values of archive_timeout, post_auth_delay, auth_delay.milliseconds.
The previous definitions of these GUC variables allowed them to range
up to INT_MAX, but in point of fact the underlying code would suffer
overflows or other errors with large values.  Reduce the maximum values
to something that won't misbehave.  There's no apparent value in working
harder than this, since very large delays aren't sensible for any of
these.  (Note: the risk with archive_timeout is that if we're late
checking the state, the timestamp difference it's being compared to
might overflow.  So we need some amount of slop; the choice of INT_MAX/2
is arbitrary.)

Per followup investigation of bug #7670.  Although this isn't a very
significant fix, might as well back-patch.
2012-11-18 17:15:06 -05:00
Tom Lane
d038966ddb Fix syslogger to not fail when log_rotation_age exceeds 2^31 milliseconds.
We need to avoid calling WaitLatch with timeouts exceeding INT_MAX.
Fortunately a simple clamp will do the trick, since no harm is done if
the wait times out before it's really time to rotate the log file.
Per bug #7670 (probably bug #7545 is the same thing, too).

In passing, fix bogus definition of log_rotation_age's maximum value in
guc.c --- it was numerically right, but only because MINS_PER_HOUR and
SECS_PER_MINUTE have the same value.

Back-patch to 9.2.  Before that, syslogger wasn't using WaitLatch.
2012-11-18 16:16:39 -05:00
Tom Lane
14ddff44c2 Assert that WaitLatch's timeout is not more than INT_MAX milliseconds.
The behavior with larger values is unspecified by the Single Unix Spec.
It appears that BSD-derived kernels report EINVAL, although Linux does not.
If waiting for longer intervals is desired, the calling code has to do
something to limit the delay; we can't portably fix it here since "long"
may not be any wider than "int" in the first place.

Part of response to bug #7670, though this change doesn't fix that
(in fact, it converts the problem from an ERROR into an Assert failure).
No back-patch since it's just an assertion addition.
2012-11-18 15:39:51 -05:00
Tom Lane
1746ba9256 Improve check_partial_indexes() to consider join clauses in proof attempts.
Traditionally check_partial_indexes() has only looked at restriction
clauses while trying to prove partial indexes usable in queries.  However,
join clauses can also be used in some cases; mainly, that a strict operator
on "x" proves an "x IS NOT NULL" index predicate, even if the operator is
in a join clause rather than a restriction clause.  Adding this code fixes
a regression in 9.2, because previously we would take join clauses into
account when considering whether a partial index could be used in a
nestloop inner indexscan path.  9.2 doesn't handle nestloop inner
indexscans in the same way, and this consideration was overlooked in the
rewrite.  Moving the work to check_partial_indexes() is a better solution
anyway, since the proof applies whether or not we actually use the index
in that particular way, and we don't have to do it over again for each
possible outer relation.  Per report from Dave Cramer.
2012-11-15 19:29:05 -05:00
Tom Lane
a235b85a0b Fix the int8 and int2 cases of (minimum possible integer) % (-1).
The correct answer for this (or any other case with arg2 = -1) is zero,
but some machines throw a floating-point exception instead of behaving
sanely.  Commit f9ac414c35 dealt with this
in int4mod, but overlooked the fact that it also happens in int8mod
(at least on my Linux x86_64 machine).  Protect int2mod as well; it's
not clear whether any machines fail there (mine does not) but since the
test is so cheap it seems better safe than sorry.  While at it, simplify
the original guard in int4mod: we need only check for arg2 == -1, we
don't need to check arg1 explicitly.

Xi Wang, with some editing by me.
2012-11-14 17:30:00 -05:00
Tom Lane
273986bf0d Fix memory leaks in record_out() and record_send().
record_out() leaks memory: it fails to free the strings returned by the
per-column output functions, and also is careless about detoasted values.
This results in a query-lifespan memory leakage when returning composite
values to the client, because printtup() runs the output functions in the
query-lifespan memory context.  Fix it to handle these issues the same way
printtup() does.  Also fix a similar leakage in record_send().

(At some point we might want to try to run output functions in
shorter-lived memory contexts, so that we don't need a zero-leakage policy
for them.  But that would be a significantly more invasive patch, which
doesn't seem like material for back-patching.)

In passing, use appendStringInfoCharMacro instead of appendStringInfoChar
in the innermost data-copying loop of record_out, to try to shave a few
cycles from this function's runtime.

Per trouble report from Carlos Henrique Reimer.  Back-patch to all
supported versions.
2012-11-13 14:45:26 -05:00
Simon Riggs
d9fad1076d Skip searching for subxact locks at commit.
At commit all standby locks are released
for the top-level transaction, so searching
for locks for each subtransaction is both
pointless and costly (N^2) in the presence
of many AccessExclusiveLocks.
2012-11-13 16:00:19 -03:00
Simon Riggs
68f7fe140b Clarify docs on hot standby lock release
Andres Freund and Simon Riggs
2012-11-13 15:54:01 -03:00
Tom Lane
3bbf668de9 Fix multiple problems in WAL replay.
Most of the replay functions for WAL record types that modify more than
one page failed to ensure that those pages were locked correctly to ensure
that concurrent queries could not see inconsistent page states.  This is
a hangover from coding decisions made long before Hot Standby was added,
when it was hardly necessary to acquire buffer locks during WAL replay
at all, let alone hold them for carefully-chosen periods.

The key problem was that RestoreBkpBlocks was written to hold lock on each
page restored from a full-page image for only as long as it took to update
that page.  This was guaranteed to break any WAL replay function in which
there was any update-ordering constraint between pages, because even if the
nominal order of the pages is the right one, any mixture of full-page and
non-full-page updates in the same record would result in out-of-order
updates.  Moreover, it wouldn't work for situations where there's a
requirement to maintain lock on one page while updating another.  Failure
to honor an update ordering constraint in this way is thought to be the
cause of bug #7648 from Daniel Farina: what seems to have happened there
is that a btree page being split was rewritten from a full-page image
before the new right sibling page was written, and because lock on the
original page was not maintained it was possible for hot standby queries to
try to traverse the page's right-link to the not-yet-existing sibling page.

To fix, get rid of RestoreBkpBlocks as such, and instead create a new
function RestoreBackupBlock that restores just one full-page image at a
time.  This function can be invoked by WAL replay functions at the points
where they would otherwise perform non-full-page updates; in this way, the
physical order of page updates remains the same no matter which pages are
replaced by full-page images.  We can then further adjust the logic in
individual replay functions if it is necessary to hold buffer locks
for overlapping periods.  A side benefit is that we can simplify the
handling of concurrency conflict resolution by moving that code into the
record-type-specfic functions; there's no more need to contort the code
layout to keep conflict resolution in front of the RestoreBkpBlocks call.

In connection with that, standardize on zero-based numbering rather than
one-based numbering for referencing the full-page images.  In HEAD, I
removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4.  They are
still there in the header files in previous branches, but are no longer
used by the code.

In addition, fix some other bugs identified in the course of making these
changes:

spgRedoAddNode could fail to update the parent downlink at all, if the
parent tuple is in the same page as either the old or new split tuple and
we're not doing a full-page image: it would get fooled by the LSN having
been advanced already.  This would result in permanent index corruption,
not just transient failure of concurrent queries.

Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old
tail page as a candidate for a full-page image; in the worst case this
could result in torn-page corruption.

heap_xlog_freeze() was inconsistent about using a cleanup lock or plain
exclusive lock: it did the former in the normal path but the latter for a
full-page image.  A plain exclusive lock seems sufficient, so change to
that.

Also, remove gistRedoPageDeleteRecord(), which has been dead code since
VACUUM FULL was rewritten.

Back-patch to 9.0, where hot standby was introduced.  Note however that 9.0
had a significantly different WAL-logging scheme for GIST index updates,
and it doesn't appear possible to make that scheme safe for concurrent hot
standby queries, because it can leave inconsistent states in the index even
between WAL records.  Given the lack of complaints from the field, we won't
work too hard on fixing that branch.
2012-11-12 22:05:53 -05:00
Heikki Linnakangas
dbdf9679d7 Use correct text domain for translating errcontext() messages.
errcontext() is typically used in an error context callback function, not
within an ereport() invocation like e.g errmsg and errdetail are. That means
that the message domain that the TEXTDOMAIN magic in ereport() determines
is not the right one for the errcontext() calls. The message domain needs to
be determined by the C file containing the errcontext() call, not the file
containing the ereport() call.

Fix by turning errcontext() into a macro that passes the TEXTDOMAIN to use
for the errcontext message. "errcontext" was used in a few places as a
variable or struct field name, I had to rename those out of the way, now
that errcontext is a macro.

We've had this problem all along, but this isn't doesn't seem worth
backporting. It's a fairly minor issue, and turning errcontext from a
function to a macro requires at least a recompile of any external code that
calls errcontext().
2012-11-12 17:07:29 +02:00
Tom Lane
34f3b396a6 Check for stack overflow in transformSetOperationTree().
Since transformSetOperationTree() recurses, it can be driven to stack
overflow with enough UNION/INTERSECT/EXCEPT clauses in a query.  Add a
check to ensure it fails cleanly instead of crashing.  Per report from
Matthew Gerber (though it's not clear whether this is the only thing
going wrong for him).

Historical note: I think the reasoning behind not putting a check here in
the beginning was that the check in transformExpr() ought to be sufficient
to guard the whole parser.  However, because transformSetOperationTree()
recurses all the way to the bottom of the set-operation tree before doing
any analysis of the statement's expressions, that check doesn't save it.
2012-11-11 19:56:10 -05:00
Alvaro Herrera
fa12cb7f02 Remove leftover LWLockRelease() call
This code was refactored in d5497b95 but an extra LWLockRelease call was
left behind.

Per report from Erik Rijkers
2012-11-09 10:19:34 -03:00
Tom Lane
3e7fdcffd6 Fix WaitLatch() to return promptly when the requested timeout expires.
If the sleep is interrupted by a signal, we must recompute the remaining
time to wait; otherwise, a steady stream of non-wait-terminating interrupts
could delay return from WaitLatch indefinitely.  This has been shown to be
a problem for the autovacuum launcher, and there may well be other places
now or in the future with similar issues.  So we'd better make the function
robust, even though this'll add at least one gettimeofday call per wait.

Back-patch to 9.2.  We might eventually need to fix 9.1 as well, but the
code is quite different there, and the usage of WaitLatch in 9.1 is so
limited that it's not clearly important to do so.

Reported and diagnosed by Jeff Janes, though I rewrote his patch rather
heavily.
2012-11-08 20:04:48 -05:00
Tom Lane
dcc55dd21a Rename ResolveNew() to ReplaceVarsFromTargetList(), and tweak its API.
This function currently lacks the option to throw error if the provided
targetlist doesn't have any matching entry for a Var to be replaced.
Two of the four existing call sites would be better off with an error,
as would the usage in the pending auto-updatable-views patch, so it seems
past time to extend the API to support that.  To do so, replace the "event"
parameter (historically of type CmdType, though it was declared plain int)
with a special-purpose enum type.

It's unclear whether this function might be called by third-party code.
Since many C compilers wouldn't warn about a call site continuing to use
the old calling convention, rename the function to forcibly break any
such code that hasn't been updated.  The old name was none too well chosen
anyhow.
2012-11-08 16:52:49 -05:00
Tom Lane
75af5ae9c0 Don't trash input list structure in does_not_exist_skipping().
The trigger and rule cases need to split up the input name list, but
they mustn't corrupt the passed-in data structure, since it could be part
of a cached utility-statement parsetree.  Per bug #7641.
2012-11-08 11:34:32 -05:00
Alvaro Herrera
4ee5c40b06 Don't try to use a unopened relation
Commit 4c9d0901 mistakenly introduced a call to
TransferPredicateLocksToHeapRelation() on an index relation that had
been closed a few lines above.  Moving up an index_open() call that's
below is enough to fix the problem.

Discovered by me while testing an unrelated patch.
2012-11-07 16:23:39 -03:00
Heikki Linnakangas
add6c3179a Make the streaming replication protocol messages architecture-independent.
We used to send structs wrapped in CopyData messages, which works as long as
the client and server agree on things like endianess, timestamp format and
alignment. That's good enough for running a standby server, which has to run
on the same platform anyway, but it's useful for tools like pg_receivexlog
to work across platforms.

This breaks protocol compatibility of streaming replication, but we never
promised that to be compatible across versions, anyway.
2012-11-07 19:09:13 +02:00
Tom Lane
5ed6546cf7 Fix handling of inherited check constraints in ALTER COLUMN TYPE.
This case got broken in 8.4 by the addition of an error check that
complains if ALTER TABLE ONLY is used on a table that has children.
We do use ONLY for this situation, but it's okay because the necessary
recursion occurs at a higher level.  So we need to have a separate
flag to suppress recursion without making the error check.

Reported and patched by Pavan Deolasee, with some editorial adjustments by
me.  Back-patch to 8.4, since this is a regression of functionality that
worked in earlier branches.
2012-11-05 13:36:16 -05:00
Tom Lane
19e36477b3 Limit the number of rel sets considered in consider_index_join_outer_rels.
In bug #7626, Brian Dunavant exposes a performance problem created by
commit 3b8968f252: that commit attempted to
consider *all* possible combinations of indexable join clauses, but if said
clauses join to enough different relations, there's an exponential increase
in the number of outer-relation sets considered.

In Brian's example, all the clauses come from the same equivalence class,
which means it's redundant to use more than one of them in an indexscan
anyway.  So we can prevent the problem in this class of cases (which is
probably the majority of real examples) by rejecting combinations that
would only serve to add a known-redundant clause.

But that still leaves us exposed to exponential growth of planning time
when the query has a lot of non-equivalence join clauses that are usable
with the same index.  I chose to prevent such cases by setting an upper
limit on the number of relation sets considered, equal to ten times the
number of index clauses considered so far.  (This sliding limit still
allows new relsets to be added on as we move to additional index columns,
which is probably more important than considering even more combinations of
clauses for the previous column.)  This should keep the amount of work done
roughly linear rather than exponential in the apparent query complexity.
This part of the fix is pretty ad-hoc; but without a clearer idea of
real-world cases for which this would result in markedly inferior plans,
it's hard to see how to do better.
2012-11-01 14:08:42 -04:00
Alvaro Herrera
2f1692d213 Fix erroneous choice of timeline variable, too 2012-10-31 17:05:55 -03:00
Alvaro Herrera
9b8dd7e8aa Fix erroneous choices of segNo variables
Commit dfda6eba (which changed segment numbers to use a single 64 bit
variable instead of log/seg) introduced a couple of bogus choices of
exactly which log segment number variable to use in each case.

This is currently pretty harmless; in one place, the bogus number was
only being used in an error message for a pretty unlikely condition
(failure to fsync a WAL segment file).  In the other, it was using a
global variable instead of the local variable; but all callsites were
passing the value of the global variable anyway.

No need to backpatch because that commit is not on earlier branches.
2012-10-31 11:05:28 -03:00
Alvaro Herrera
04f28bdb84 Fix ALTER EXTENSION / SET SCHEMA
In its original conception, it was leaving some objects into the old
schema, but without their proper pg_depend entries; this meant that the
old schema could be dropped, causing future pg_dump calls to fail on the
affected database.  This was originally reported by Jeff Frost as #6704;
there have been other complaints elsewhere that can probably be traced
to this bug.

To fix, be more consistent about altering a table's subsidiary objects
along the table itself; this requires some restructuring in how tables
are relocated when altering an extension -- hence the new
AlterTableNamespaceInternal routine which encapsulates it for both the
ALTER TABLE and the ALTER EXTENSION cases.

There was another bug lurking here, which was unmasked after fixing the
previous one: certain objects would be reached twice via the dependency
graph, and the second attempt to move them would cause the entire
operation to fail.  Per discussion, it seems the best fix for this is to
do more careful tracking of objects already moved: we now maintain a
list of moved objects, to avoid attempting to do it twice for the same
object.

Authors: Alvaro Herrera, Dimitri Fontaine
Reviewed by Tom Lane
2012-10-31 10:52:55 -03:00
Kevin Grittner
6868ed7491 Throw error if expiring tuple is again updated or deleted.
This prevents surprising behavior when a FOR EACH ROW trigger
BEFORE UPDATE or BEFORE DELETE directly or indirectly updates or
deletes the the old row.  Prior to this patch the requested action
on the row could be silently ignored while all triggered actions
based on the occurence of the requested action could be committed.
One example of how this could happen is if the BEFORE DELETE
trigger for a "parent" row deleted "children" which had trigger
functions to update summary or status data on the parent.

This also prevents similar surprising problems if the query has a
volatile function which updates a target row while it is already
being updated.

There are related issues present in FOR UPDATE cursors and READ
COMMITTED queries which are not handled by this patch.  These
issues need further evalution to determine what change, if any, is
needed.

Where the new error messages are generated, in most cases the best
fix will be to move code from the BEFORE trigger to an AFTER
trigger.  Where this is not feasible, the trigger can avoid the
error by re-issuing the triggering statement and returning NULL.

Documentation changes will be submitted in a separate patch.

Kevin Grittner and Tom Lane with input from Florian Pflug and
Robert Haas, based on problems encountered during conversion of
Wisconsin Circuit Court trigger logic to plpgsql triggers.
2012-10-26 14:55:36 -05:00
Tom Lane
17804fa71b Prefer actual constants to pseudo-constants in equivalence class machinery.
generate_base_implied_equalities_const() should prefer plain Consts over
other em_is_const eclass members when choosing the "pivot" value that
all the other members will be equated to.  This makes it more likely that
the generated equalities will be useful in constraint-exclusion proofs.
Per report from Rushabh Lathia.
2012-10-26 14:19:34 -04:00
Tom Lane
bf01e34b55 Tweak genericcostestimate's fudge factor for index size.
To provide some bias against using a large index when a small one would do
as well, genericcostestimate adds a "fudge factor", which for a long time
was random_page_cost * index_pages/10000.  However, this can grow to be the
dominant term in indexscan cost estimates when the index involved is large
enough, a behavior that was never intended.  Change to a ln(1 + n/10000)
formulation, which has nearly the same behavior up to a few hundred pages
but tails off significantly thereafter.  (A log curve seems correct on
first principles, since what we're trying to account for here is index
descent costs, which are typically logarithmic.)  Per bug #7619 from Niko
Kiirala.

Possibly this change should get back-patched, but I'm hesitant to mess with
cost estimates in stable branches.
2012-10-24 16:25:40 -04:00
Tom Lane
a4e8680a6c When converting a table to a view, remove its system columns.
Views should not have any pg_attribute entries for system columns.
However, we forgot to remove such entries when converting a table to a
view.  This could lead to crashes later on, if someone attempted to
reference such a column, as reported by Kohei KaiGai.

Patch in HEAD only.  This bug has been there forever, but in the back
branches we will have to defend against existing mis-converted views,
so it doesn't seem worthwhile to change the conversion code too.
2012-10-24 13:39:37 -04:00
Alvaro Herrera
f4c4335a4a Add context info to OAT_POST_CREATE security hook
... and have sepgsql use it to determine whether to check permissions
during certain operations.  Indexes that are being created as a result
of REINDEX, for instance, do not need to have their permissions checked;
they were already checked when the index was created.

Author: KaiGai Kohei, slightly revised by me
2012-10-23 18:24:24 -03:00
Kevin Grittner
4c9d0901f1 Correct predicate locking for DROP INDEX CONCURRENTLY.
For the non-concurrent case there is an AccessExclusiveLock lock
on both the index and the heap at a time during which no other
process is using either, before which the index is maintained and
used for scans, and after which the index is no longer used or
maintained.  Predicate locks can safely be moved from the index to
the related heap relation under the protection of these locks.
This was done prior to the introductin of DROP INDEX CONCURRENTLY
and continues to be done for non-concurrent index drops.

For concurrent index drops, the predicate locks must be moved when
there are no index scans in progress on that index and no more can
subsequently start, and before heap inserts stop maintaining the
index.  As long as these conditions are guaranteed when the
TransferPredicateLocksToHeapRelation() function is called,
stronger locks are not needed for correctness.

Kevin Grittner based on questions by Tom Lane in reviewing the
DROP INDEX CONCURRENTLY patch and in cooperation with Andres
Freund and Simon Riggs.
2012-10-21 16:35:42 -05:00
Tom Lane
5d1abe64e6 Fix UtilityContainsQuery() to handle CREATE TABLE AS EXECUTE correctly.
The code seems to have been written to handle the pre-parse-analysis
representation, where an ExecuteStmt would appear directly under
CreateTableAsStmt.  But in reality the function is only run on
already-parse-analyzed statements, so there will be a Query node in
between.  We'd not noticed the bug because the function is generally
not used at all except in extended query protocol.

Per report from Robert Haas and Rushabh Lathia.
2012-10-19 18:33:45 -04:00
Tom Lane
4e32f8cd14 Fix hash_search to avoid corruption of the hash table on out-of-memory.
An out-of-memory error during expand_table() on a palloc-based hash table
would leave a partially-initialized entry in the table.  This would not be
harmful for transient hash tables, since they'd get thrown away anyway at
transaction abort.  But for long-lived hash tables, such as the relcache
hash, this would effectively corrupt the table, leading to crash or other
misbehavior later.

To fix, rearrange the order of operations so that table enlargement is
attempted before we insert a new entry, rather than after adding it
to the hash table.

Problem discovered by Hitoshi Harada, though this is a bit different
from his proposed patch.
2012-10-19 15:24:03 -04:00
Tom Lane
0d6895051a Fix ruleutils to print "INSERT INTO foo DEFAULT VALUES" correctly.
Per bug #7615 from Marko Tiikkaja.  Apparently nobody ever tried this
case before ...
2012-10-19 13:39:51 -04:00
Simon Riggs
da85727565 Fix orphan on cancel of drop index concurrently.
Canceling DROP INDEX CONCURRENTLY during
wait could allow an orphaned index to be
left behind which could not be dropped.

Backpatch to 9.2

Andres Freund, tested by Abhijit Menon-Sen
2012-10-19 09:56:29 +01:00
Tom Lane
002191a1a3 Further cleanup of catcache.c ilist changes.
Remove useless duplicate initialization of bucket headers, don't use a
dlist_mutable_iter in a performance-critical path that doesn't need it,
make some other cosmetic changes for consistency's sake.
2012-10-18 19:30:43 -04:00
Tom Lane
dc5aeca168 Remove unnecessary "head" arguments from some dlist/slist functions.
dlist_delete, dlist_insert_after, dlist_insert_before, slist_insert_after
do not need access to the list header, and indeed insisting on that negates
one of the main advantages of a doubly-linked list.

In consequence, revert addition of "cache_bucket" field to CatCTup.
2012-10-18 19:04:20 -04:00
Tom Lane
8f8d746478 Code review for inline-list patch.
Make foreach macros less syntactically dangerous, and fix some typos in
evidently-never-tested ones.  Add missing slist_next_node and
slist_head_node functions.  Fix broken dlist_check code.  Assorted comment
improvements.
2012-10-18 16:47:07 -04:00
Simon Riggs
2f0e480d02 Re-think guts of DROP INDEX CONCURRENTLY.
Concurrent behaviour was flawed when using
a two-step process, so add an additional
phase of processing to ensure concurrency
for both SELECTs and INSERT/UPDATE/DELETEs.

Backpatch to 9.2

Andres Freund, tweaked by me
2012-10-18 18:58:30 +01:00
Tom Lane
72a4231f0c Fix planning of non-strict equivalence clauses above outer joins.
If a potential equivalence clause references a variable from the nullable
side of an outer join, the planner needs to take care that derived clauses
are not pushed to below the outer join; else they may use the wrong value
for the variable.  (The problem arises only with non-strict clauses, since
if an upper clause can be proven strict then the outer join will get
simplified to a plain join.)  The planner attempted to prevent this type
of error by checking that potential equivalence clauses aren't
outerjoin-delayed as a whole, but actually we have to check each side
separately, since the two sides of the clause will get moved around
separately if it's treated as an equivalence.  Bugs of this type can be
demonstrated as far back as 7.4, even though releases before 8.3 had only
a very ad-hoc notion of equivalence clauses.

In addition, we neglected to account for the possibility that such clauses
might have nonempty nullable_relids even when not outerjoin-delayed; so the
equivalence-class machinery lacked logic to compute correct nullable_relids
values for clauses it constructs.  This oversight was harmless before 9.2
because we were only using RestrictInfo.nullable_relids for OR clauses;
but as of 9.2 it could result in pushing constructed equivalence clauses
to incorrect places.  (This accounts for bug #7604 from Bill MacArthur.)

Fix the first problem by adding a new test check_equivalence_delay() in
distribute_qual_to_rels, and fix the second one by adding code in
equivclass.c and called functions to set correct nullable_relids for
generated clauses.  Although I believe the second part of this is not
currently necessary before 9.2, I chose to back-patch it anyway, partly to
keep the logic similar across branches and partly because it seems possible
we might find other reasons why we need valid values of nullable_relids in
the older branches.

Add regression tests illustrating these problems.  In 9.0 and up, also
add test cases checking that we can push constants through outer joins,
since we've broken that optimization before and I nearly broke it again
with an overly simplistic patch for this problem.
2012-10-18 12:30:10 -04:00
Tom Lane
ff3f9c8de5 Close un-owned SMgrRelations at transaction end.
If an SMgrRelation is not "owned" by a relcache entry, don't allow it to
live past transaction end.  This design allows the same SMgrRelation to be
used for blind writes of multiple blocks during a transaction, but ensures
that we don't hold onto such an SMgrRelation indefinitely.  Because an
SMgrRelation typically corresponds to open file descriptors at the fd.c
level, leaving it open when there's no corresponding relcache entry can
mean that we prevent the kernel from reclaiming deleted disk space.
(While CacheInvalidateSmgr messages usually fix that, there are cases
where they're not issued, such as DROP DATABASE.  We might want to add
some more sinval messaging for that, but I'd be inclined to keep this
type of logic anyway, since allowing VFDs to accumulate indefinitely
for blind-written relations doesn't seem like a good idea.)

This code replaces a previous attempt towards the same goal that proved
to be unreliable.  Back-patch to 9.1 where the previous patch was added.
2012-10-17 12:38:21 -04:00
Tom Lane
9bacf0e373 Revert "Use "transient" files for blind writes, take 2".
This reverts commit fba105b109.
That approach had problems with the smgr-level state not tracking what
we really want to happen, and with the VFD-level state not tracking the
smgr-level state very well either.  In consequence, it was still possible
to hold kernel file descriptors open for long-gone tables (as in recent
report from Tore Halset), and yet there were also cases of FDs being closed
undesirably soon.  A replacement implementation will follow.
2012-10-17 12:37:08 -04:00
Alvaro Herrera
a66ee69add Embedded list interface
Provide a common implementation of embedded singly-linked and
doubly-linked lists.  "Embedded" in the sense that the nodes'
next/previous pointers exist within some larger struct; this design
choice reduces memory allocation overhead.

Most of the implementation uses inlineable functions (where supported),
for performance.

Some existing uses of both types of lists have been converted to the new
code, for demonstration purposes.  Other uses can (and probably will) be
converted in the future.  Since dllist.c is unused after this conversion,
it has been removed.

Author: Andres Freund
Some tweaks by me
Reviewed by Tom Lane, Peter Geoghegan
2012-10-17 11:31:20 -03:00
Bruce Momjian
22cc3b35f4 When outputting the session id in log_line_prefix (%c) or in CSV log
output mode, cause the hex digits after the period to always be at least
four hex digits, with zero-padding.
2012-10-16 12:37:59 -04:00
Heikki Linnakangas
7d3ed5ae78 Fix typo in comment.
Fujii Masao
2012-10-15 13:01:31 +03:00
Tom Lane
e81e8f9342 Split up process latch initialization for more-fail-soft behavior.
In the previous coding, new backend processes would attempt to create their
self-pipe during the OwnLatch call in InitProcess.  However, pipe creation
could fail if the kernel is short of resources; and the system does not
recover gracefully from a FATAL error right there, since we have armed the
dead-man switch for this process and not yet set up the on_shmem_exit
callback that would disarm it.  The postmaster then forces an unnecessary
database-wide crash and restart, as reported by Sean Chittenden.

There are various ways we could rearrange the code to fix this, but the
simplest and sanest seems to be to split out creation of the self-pipe into
a new function InitializeLatchSupport, which must be called from a place
where failure is allowed.  For most processes that gets called in
InitProcess or InitAuxiliaryProcess, but processes that don't call either
but still use latches need their own calls.

Back-patch to 9.1, which has only a part of the latch logic that 9.2 and
HEAD have, but nonetheless includes this bug.
2012-10-14 22:59:56 -04:00
Tom Lane
8b728e5c6e Fix oversight in new code for printing rangetable aliases.
In commit 11e131854f, I missed the case of
a CTE RTE that doesn't have a user-defined alias, but does have an
alias assigned by set_rtable_names().  Per report from Peter Eisentraut.

While at it, refactor slightly to reduce code duplication.
2012-10-12 16:14:43 -04:00
Bruce Momjian
49ec613201 In our source code, make a copy of getopt's 'optarg' string arguments,
rather than just storing a pointer.
2012-10-12 13:35:43 -04:00
Tom Lane
a29f7ed554 Get rid of COERCE_DONTCARE.
We don't need this hack any more.
2012-10-12 13:35:00 -04:00
Tom Lane
71e58dcfb9 Make equal() ignore CoercionForm fields for better planning with casts.
This change ensures that the planner will see implicit and explicit casts
as equivalent for all purposes, except in the minority of cases where
there's actually a semantic difference (as reflected by having a 3-argument
cast function).  In particular, this fixes cases where the EquivalenceClass
machinery failed to consider two references to a varchar column as
equivalent if one was implicitly cast to text but the other was explicitly
cast to text, as seen in bug #7598 from Vaclav Juza.  We have had similar
bugs before in other parts of the planner, so I think it's time to fix this
problem at the core instead of continuing to band-aid around it.

Remove set_coercionform_dontcare(), which represents the band-aid
previously in use for allowing matching of index and constraint expressions
with inconsistent cast labeling.  (We can probably get rid of
COERCE_DONTCARE altogether, but I don't think removing that enum value in
back branches would be wise; it's possible there's third party code
referring to it.)

Back-patch to 9.2.  We could go back further, and might want to once this
has been tested more; but for the moment I won't risk destabilizing plan
choices in long-since-stable branches.
2012-10-12 12:11:22 -04:00
Tom Lane
4816d2ea32 Fix cross-type case in partial row matching for hashed subplans.
When hashing a subplan like "WHERE (a, b) NOT IN (SELECT x, y FROM ...)",
findPartialMatch() attempted to match rows using the hashtable's internal
equality operators, which of course are for x and y's datatypes.  What we
need to use are the potentially cross-type operators for a=x, b=y, etc.
Failure to do that leads to wrong answers or even crashes.  The scope for
problems is limited to cases where we have different types with compatible
hash functions (else we'd not be using a hashed subplan), but for example
int4 vs int8 can cause the problem.

Per bug #7597 from Bo Jensen.  This has been wrong since the hashed-subplan
code was written, so patch all the way back.
2012-10-11 12:22:13 -04:00
Heikki Linnakangas
6f60fdd701 Improve replication connection timeouts.
Rename replication_timeout to wal_sender_timeout, and add a new setting
called wal_receiver_timeout that does the same at the walreceiver side.
There was previously no timeout in walreceiver, so if the network went down,
for example, the walreceiver could take a long time to notice that the
connection was lost. Now with the two settings, both sides of a replication
connection will detect a broken connection similarly.

It is no longer necessary to manually set wal_receiver_status_interval to
a value smaller than the timeout. Both wal sender and receiver now
automatically send a "ping" message if more than 1/2 of the configured
timeout has elapsed, and it hasn't received any messages from the other end.

Amit Kapila, heavily edited by me.
2012-10-11 17:48:08 +03:00
Peter Eisentraut
8521d13194 Refactor flex and bison make rules
Numerous flex and bison make rules have appeared in the source tree
over time, and they are all virtually identical, so we can replace
them by pattern rules with some variables for customization.

Users of pgxs will also be able to benefit from this.
2012-10-11 06:57:04 -04:00
Tom Lane
864db11683 Update obsolete comment.
We no longer use GetNewOidWithIndex on pg_largeobject; rather,
pg_largeobject_metadata's regular OID column is considered the repository
of OIDs for large objects.  The special functionality is still needed for
TOAST tables however.
2012-10-10 17:04:37 -04:00
Tom Lane
3f88fa971a Fix PGXS support for building loadable modules on AIX.
Building a shlib on AIX requires use of the mkldexport.sh script, but we
failed to install that, preventing its use from non-source-tree contexts.
Also, Makefile.aix had the wrong idea about where to find the installed
copy of the postgres.imp symbol file used by AIX.

Per report from John Pierce.  Patch all the way back, since this has been
broken since the beginning of PGXS.
2012-10-09 21:04:06 -04:00
Tom Lane
7e0cce0265 Remove unnecessary overhead in backend's large-object operations.
Do read/write permissions checks at most once per large object descriptor,
not once per lo_read or lo_write call as before.  The repeated tests were
quite useless in the read case since the snapshot-based tests were
guaranteed to produce the same answer every time.  In the write case,
the extra tests could in principle detect revocation of write privileges
after a series of writes has started --- but there's a race condition there
anyway, since we'd check privileges before performing and certainly before
committing the write.  So there's no real advantage to checking every
single time, and we might as well redefine it as "only check the first
time".

On the same reasoning, remove the LargeObjectExists checks in inv_write
and inv_truncate.  We already checked existence when the descriptor was
opened, and checking again doesn't provide any real increment of safety
that would justify the cost.
2012-10-09 16:38:00 -04:00
Heikki Linnakangas
2d8c81ac86 Fix silly bug in previous refactoring.
I extracted the refactoring patch from a larger patch that contained other
changes too, but missed one unintentional change and didn't test enough...
2012-10-09 19:33:12 +03:00
Heikki Linnakangas
ff8f160bf4 Put the logic to wait for WAL in standby mode to a separate function.
This is just refactoring with no user-visible effect, to make the code more
readable.
2012-10-09 19:20:17 +03:00
Heikki Linnakangas
0b77aebabf Remove stray newline in comment. 2012-10-09 13:06:48 +03:00
Peter Eisentraut
b6d4522296 Remove generation of repl_gram.h
It was apparently never necessary.
2012-10-08 20:36:46 -04:00
Tom Lane
26fe56481c Code review for 64-bit-large-object patch.
Fix broken-on-bigendian-machines byte-swapping functions, add missed update
of alternate regression expected file, improve error reporting, remove some
unnecessary code, sync testlo64.c with current testlo.c (it seems to have
been cloned from a very old copy of that), assorted cosmetic improvements.
2012-10-08 18:24:32 -04:00
Alvaro Herrera
878daf2e72 Fix thinko in previous commit
Since postgres.h includes palloc.h, definitions that affect the latter
must be present before the former is included.

Per buildfarm results
2012-10-08 18:33:08 -03:00
Alvaro Herrera
976fa10d20 Add support for easily declaring static inline functions
We already had those, but they forced modules to spell out the function
bodies twice.  Eliminate some duplicates we had already grown.

Extracted from a somewhat larger patch from Andres Freund.
2012-10-08 16:28:01 -03:00
Heikki Linnakangas
b28cc92d7d Say ANALYZE, not VACUUM, in error message on analyze in hot standby.
Tomonaru Katsumata
2012-10-08 14:17:27 +03:00
Heikki Linnakangas
9c0e2b9182 Fix walsender handling of postmaster shutdown, to not go into endless loop.
This bug was introduced by my patch to use the regular die/quickdie signal
handlers in walsender processes. I tried to make walsender exit at next
CHECK_FOR_INTERRUPTS() by setting ProcDiePending, but that's not enough, you
need to set InterruptPending too. On second thoght, it was not a very good
way to make walsender exit anyway, so use proc_exit(0) instead.

Also, send a CommandComplete message before exiting; that's what we did
before, and you get a nicer error message in the standby that way.

Reported by Thom Brown.
2012-10-08 13:32:14 +03:00
Andrew Dunstan
ea72bb8ae5 Fix typo in previous MSC commit. 2012-10-07 19:56:26 -04:00
Andrew Dunstan
33a7101281 Quiet a few MSC compiler warnings. 2012-10-07 17:31:10 -04:00
Tatsuo Ishii
461ef73f09 Add API for 64-bit large object access. Now users can access up to
4TB large objects (standard 8KB BLCKSZ case).  For this purpose new
libpq API lo_lseek64, lo_tell64 and lo_truncate64 are added.  Also
corresponding new backend functions lo_lseek64, lo_tell64 and
lo_truncate64 are added. inv_api.c is changed to handle 64-bit
offsets.

Patch contributed by Nozomi Anzai (backend side) and Yugo Nagata
(frontend side, docs, regression tests and example program). Reviewed
by Kohei Kaigai. Committed by Tatsuo Ishii with minor editings.
2012-10-07 08:36:48 +09:00
Heikki Linnakangas
fd5942c18f Use the regular main processing loop also in walsenders.
The regular backend's main loop handles signal handling and error recovery
better than the current WAL sender command loop does. For example, if the
client hangs and a SIGTERM is received before starting streaming, the
walsender will now terminate immediately, rather than hang until the
connection times out.
2012-10-05 17:21:12 +03:00
Tom Lane
1997f34db4 getnameinfo_unix has to be taught not to insist on NI_NUMERIC flags, too.
Per testing of previous patch.
2012-10-04 22:54:18 -04:00
Peter Eisentraut
c424d0d105 Remove redundant code for getnameinfo() replacement
Our getnameinfo() replacement implementation in getaddrinfo.c failed
unless NI_NUMERICHOST and NI_NUMERICSERV were given as flags, because
it doesn't resolve host names, only numeric IPs.  But per standard,
when those flags are not given, an implementation can still degrade to
not returning host names, so this restriction is unnecessary.  When we
remove it, we can eliminate some code in postmaster.c that apparently
tried to work around that.
2012-10-04 21:45:14 -04:00
Tom Lane
e1e60694b4 Make CREATE AGGREGATE complain if the initcond is invalid for the datatype.
The initial transition value is stored as a text string and not fed to the
transition type's input function until runtime (so that values such as
"now" don't get frozen at creation time).  Previously, CREATE AGGREGATE
didn't do anything with it but that, which meant that even erroneous values
would be accepted and not complained of until the aggregate is used.  This
seems unhelpful, and it's confused at least one user, as in Rhys Stewart's
recent report.  It seems worth taking a few more cycles to invoke the input
function and verify that the value is acceptable.  We can't do this if the
transition type is polymorphic, but in normal aggregates we know the actual
transition type so we can call the right input function.
2012-10-04 17:54:53 -04:00
Tom Lane
707263542e Fix parse location tracking for lists that can be empty.
The previous coding of the YYLLOC_DEFAULT macro behaved strangely for empty
productions, assigning the previous nonterminal's location as the parse
location of the result.  The usefulness of that was (at best) debatable
already, but the real problem is that in list-generating nonterminals like
	OptFooList: /* EMPTY */ { ... } | OptFooList Foo { ... } ;
the initially-identified location would get copied up, so that even a
nonempty list would be given a bogus parse location.  Document how to work
around that, and do so for OptSchemaEltList, so that the error condition
just added for CREATE SCHEMA IF NOT EXISTS produces a sane error cursor.
So far as I can tell, there are currently no other cases where the
situation arises, so we don't need other instances of this coding yet.
2012-10-04 17:15:29 -04:00
Heikki Linnakangas
1a956481ba Fix typo in comment, and reword it slightly while we're at it. 2012-10-04 10:35:48 +03:00
Tom Lane
fb34e94d21 Support CREATE SCHEMA IF NOT EXISTS.
Per discussion, schema-element subcommands are not allowed together with
this option, since it's not very obvious what should happen to the element
objects.

Fabrízio de Royes Mello
2012-10-03 19:47:11 -04:00
Alvaro Herrera
994c36e01d refactor ALTER some-obj SET OWNER implementation
Remove duplicate implementation of catalog munging and miscellaneous
privilege and consistency checks.  Instead rely on already existing data
in objectaddress.c to do the work.

Author: KaiGai Kohei
Tweaked by me
Reviewed by Robert Haas
2012-10-03 18:07:46 -03:00
Tom Lane
1f91c8ca1d Avoid planner crash/Assert failure with joins to unflattened subqueries.
examine_simple_variable supposed that any RTE_SUBQUERY rel it gets pointed
at must have been planned already.  However, this isn't a safe assumption
because we must do selectivity estimation while generating indexscan paths,
and that code might look at join clauses involving a rel that the loop in
set_base_rel_sizes() hasn't reached yet.  The simplest fix is to play dumb
in such a situation, that is give up trying to extract any stats for the
Var.  This could possibly be improved by making a separate pass over the
RTE list to plan each unflattened subquery before we start the main
planning work --- but that would be pretty invasive and it doesn't seem
worth it, for now at least.  (We couldn't just break set_base_rel_sizes()
into two loops: the prescan would need to handle all subquery rels in the
query, not only those in the current join subproblem.)

This bug was introduced in commit 1cb108efb0,
although I think that subsequent changes may have exposed it more than it
was originally.  Per bug #7580 from Maxim Boguk.
2012-10-03 13:37:53 -04:00
Alvaro Herrera
fe3b5eb08a REASSIGN OWNED: consider grants on tablespaces, too
Apparently this was considered in the original code (see commit
cec3b0a9) but I failed to notice that such entries would always be
skipped by the database check at the start of the loop.

Per bugs #7578 by Nikolay, #6116 by tushar.qa@gmail.com.
2012-10-03 12:30:00 -03:00
Heikki Linnakangas
7ae1815961 Return the number of rows processed when COPY is executed through SPI.
You can now get the number of rows processed by a COPY statement in a
PL/pgSQL function with "GET DIAGNOSTICS x = ROW_COUNT".

Pavel Stehule, reviewed by Amit Kapila, with some editing by me.
2012-10-03 14:38:22 +03:00
Heikki Linnakangas
bc1229c832 Fix two bugs introduced in the xlog.c split.
The comment explaining the naming of timeline history files was wrong, and
the history file was not being arhived.

Pointed out by Fujii Masao.
2012-10-03 09:15:38 +03:00
Peter Eisentraut
6bd176095b Improve some LDAP authentication error messages 2012-10-02 23:25:05 -04:00
Tom Lane
09ac603c36 Work around unportable behavior of malloc(0) and realloc(NULL, 0).
On some platforms these functions return NULL, rather than the more common
practice of returning a pointer to a zero-sized block of memory.  Hack our
various wrapper functions to hide the difference by substituting a size
request of 1.  This is probably not so important for the callers, who
should never touch the block anyway if they asked for size 0 --- but it's
important for the wrapper functions themselves, which mistakenly treated
the NULL result as an out-of-memory failure.  This broke at least pg_dump
for the case of no user-defined aggregates, as per report from
Matthew Carrington.

Back-patch to 9.2 to fix the pg_dump issue.  Given the lack of previous
complaints, it seems likely that there is no live bug in previous releases,
even though some of these functions were in place before that.
2012-10-02 17:32:42 -04:00
Alvaro Herrera
2164f9a125 Refactor "ALTER some-obj SET SCHEMA" implementation
Instead of having each object type implement the catalog munging
independently, centralize knowledge about how to do it and expand the
existing table in objectaddress.c with enough data about each object
type to support this operation.

Author: KaiGai Kohei
Tweaks by me
Reviewed by Robert Haas
2012-10-02 18:13:54 -03:00
Heikki Linnakangas
93b6d78cf0 Add #includes needed on some platforms in the new files.
Hopefully this makes the *BSD buildfarm animals happy.
2012-10-02 17:19:52 +03:00
Heikki Linnakangas
d5497b95f3 Split off functions related to timeline history files and XLOG archiving.
This is just refactoring, to make the functions accessible outside xlog.c.
A followup patch will make use of that, to allow fetching timeline history
files over streaming replication.
2012-10-02 13:37:19 +03:00
Heikki Linnakangas
0899556e92 Fix access past end of string in date parsing.
This affects date_in(), and a couple of other funcions that use DecodeDate().

Hitoshi Harada
2012-10-02 10:43:48 +03:00
Bruce Momjian
dbdb2172a0 Add C comment that IsBackendPid() is called by external modules, so we
don't accidentally remove it.
2012-10-01 10:14:35 -04:00
Tom Lane
05b555d12b Fix tar files emitted by pg_dump and pg_basebackup to be POSIX conformant.
Both programs got the "magic" string wrong, causing standard-conforming tar
implementations to believe the output was just legacy tar format without
any POSIX extensions.  This doesn't actually matter that much, especially
since pg_dump failed to fill the POSIX fields anyway, but still there is
little point in emitting tar format if we can't be compliant with the
standard.  In addition, pg_dump failed to write the EOF marker correctly
(there should be 2 blocks of zeroes not just one), pg_basebackup put the
numeric group ID in the wrong place, and both programs had a pretty
brain-dead idea of how to compute the checksum.  Fix all that and improve
the comments a bit.

pg_restore is modified to accept either the correct POSIX-compliant "magic"
string or the previous value.  This part of the change will need to be
back-patched to avoid an unnecessary compatibility break when a previous
version tries to read tar-format output from 9.3 pg_dump.

Brian Weaver and Tom Lane
2012-09-28 15:19:15 -04:00
Peter Eisentraut
edc9109c42 Produce textual error messages for LDAP issues instead of numeric codes 2012-09-27 20:22:50 -04:00
Tom Lane
70bc583319 Fix btmarkpos/btrestrpos to handle array keys.
This fixes another error in commit 9e8da0f757.
I neglected to make the mark/restore functionality save and restore the
current set of array key values, which led to strange behavior if an
IndexScan with ScalarArrayOpExpr quals was used as the inner side of a
mergejoin.  Per bug #7570 from Melese Tesfaye.
2012-09-27 17:01:02 -04:00
Alvaro Herrera
ae90ffada4 Have pg_terminate/cancel_backend not ERROR on non-existent processes
This worked fine for superusers, but not for ordinary users trying to
cancel their own processes.  Tweak the order the checks are done in so
that we correctly return SIGNAL_BACKEND_ERROR (which current callers
know to ignore without erroring out) so that an ordinary user can loop
through a resultset without fearing that a process might exit in the
middle of said looping -- causing the remaining processes to go
unsignalled.

Incidentally, the last in-core caller of IsBackendPid() is now gone.
However, the function is exported and must remain in place, because
there are plenty of callers in external modules.

Author: Josh Kupershmidt

Reviewed by Noah Misch
2012-09-27 12:29:51 -03:00
Tom Lane
55c1687a97 Run check_keywords.pl anytime gram.c is rebuilt.
This script is a bit slow, but still it only takes a fraction of the time
the bison run does, so the overhead doesn't seem intolerable.  And we
definitely need some mechanical aid here, because people keep missing
the need to add new keywords to the appropriate keyword-list production.

While at it, I moved check_keywords.pl from src/tools into
src/backend/parser where it's actually used, and did some very minor
cleanup on the script.
2012-09-26 23:12:39 -04:00
Tom Lane
fc68ac86b1 Add new EVENT keyword to unreserved_keyword production.
Once again, somebody who ought to know better forgot this.  We really
need some automated cross-check on the keyword-list productions, I think.
Per report from Brian Weaver.
2012-09-26 20:07:36 -04:00
Heikki Linnakangas
2a0c81a12c Add support for include_dir in config file.
This allows easily splitting configuration into many files, deployed in a
directory.

Magnus Hagander, Greg Smith, Selena Deckelmann, reviewed by Noah Misch.
2012-09-24 18:07:53 +03:00
Tom Lane
31510194cc Minor corrections for ALTER TYPE ADD VALUE IF NOT EXISTS patch.
Produce a NOTICE when the label already exists, for consistency with other
CREATE IF NOT EXISTS commands.  Also, fix the code so it produces something
more user-friendly than an index violation when the label already exists.
This not incidentally enables making a regression test that the previous
patch didn't make for fear of exposing an unpredictable OID in the results.
Also some wordsmithing on the documentation.
2012-09-22 18:35:22 -04:00
Andrew Dunstan
6d12b68cd7 Allow IF NOT EXISTS when add a new enum label.
If the label is already in the enum the statement becomes a no-op.
This will reduce the pain that comes from our not allowing this
operation inside a transaction block.

Andrew Dunstan, reviewed by Tom Lane and Magnus Hagander.
2012-09-22 12:53:31 -04:00
Tom Lane
11e131854f Improve ruleutils.c's heuristics for dealing with rangetable aliases.
The previous scheme had bugs in some corner cases involving tables that had
been renamed since a view was made.  This could result in dumped views that
failed to reload or reloaded incorrectly, as seen in bug #7553 from Lloyd
Albin, as well as in some pgsql-hackers discussion back in January.  Also,
its behavior for printing EXPLAIN plans was sometimes confusing because of
willingness to use the same alias for multiple RTEs (it was Ashutosh
Bapat's complaint about that aspect that started the January thread).

To fix, ensure that each RTE in the query has a unique unqualified alias,
by modifying the alias if necessary (we add "_" and digits as needed to
create a non-conflicting name).  Then we can just print its variables with
that alias, avoiding the confusing and bug-prone scheme of sometimes
schema-qualifying variable names.  In EXPLAIN, it proves to be expedient to
take the further step of only assigning such aliases to RTEs that are
actually referenced in the query, since the planner has a habit of
generating extra RTEs with the same alias in situations such as
inheritance-tree expansion.

Although this fixes a bug of very long standing, I'm hesitant to back-patch
such a noticeable behavioral change.  My experiments while creating a
regression test convinced me that actually incorrect output (as opposed to
confusing output) occurs only in very narrow cases, which is backed up by
the lack of previous complaints from the field.  So we may be better off
living with it in released branches; and in any case it'd be smart to let
this ripen awhile in HEAD before we consider back-patching it.
2012-09-21 19:03:10 -04:00
Heikki Linnakangas
7c45e3a3c6 Parse pg_ident.conf when it's loaded, keeping it in memory in parsed format.
Similar changes were done to pg_hba.conf earlier already, this commit makes
pg_ident.conf to behave the same as pg_hba.conf.

This has two user-visible effects. First, if pg_ident.conf contains multiple
errors, the whole file is parsed at postmaster startup time and all the
errors are immediately reported. Before this patch, the file was parsed and
the errors were reported only when someone tries to connect using an
authentication method that uses the file, and the parsing stopped on first
error. Second, if you SIGHUP to reload the config files, and the new
pg_ident.conf file contains an error, the error is logged but the old file
stays in effect.

Also, regular expressions in pg_ident.conf are now compiled only once when
the file is loaded, rather than every time the a user is authenticated. That
should speed up authentication if you have a lot of regexps in the file.

Amit Kapila
2012-09-21 17:54:39 +03:00
Heikki Linnakangas
9d5e9730e5 Fix obsolete comment.
load_hba and load_ident load stuff in a separate memory context nowadays,
not in the current memory context.
2012-09-21 15:22:56 +03:00
Alvaro Herrera
22c734fcdb Remove execdesc.h inclusion from tcopprot.h 2012-09-20 11:07:59 -03:00
Tom Lane
96cc18eef6 Put back AcceptInvalidationMessages calls in heap_openrv(_extended).
These calls were removed in commit 4240e429d0
as part of a general refactoring and improvement of DDL locking.  However,
there's a problem not solved by the rewrite, which is that GRANT/REVOKE
update pg_class.relacl without taking any particular lock on the target
table as such.  If another backend fails to do AcceptInvalidationMessages,
it won't notice a recently-committed change in ACLs.  Bug #7557 from Piotr
Czachur demonstrates that there's at least one code path in 9.2.0 in which
a command fails to do any AcceptInvalidationMessages calls at all, if the
current transaction already holds all the locks it will need.

Since we're hard up against the release deadline for 9.2.1, fix this by
putting back the AcceptInvalidationMessages calls in heap_openrv and
heap_openrv_extended, thereby restoring the historical behavior in this
area.  We ought to look for a more elegant and perhaps more bulletproof
solution, but there's no time for that right now.
2012-09-19 17:10:37 -04:00
Tom Lane
807a40c551 Fix planning of btree index scans using ScalarArrayOpExpr quals.
In commit 9e8da0f757, I improved btree
to handle ScalarArrayOpExpr quals natively, so that constructs like
"indexedcol IN (list)" could be supported by index-only scans.  Using
such a qual results in multiple scans of the index, under-the-hood.
I went to some lengths to ensure that this still produces rows in index
order ... but I failed to recognize that if a higher-order index column
is lacking an equality constraint, rescans can produce out-of-order
data from that column.  Tweak the planner to not expect sorted output
in that case.  Per trouble report from Robert McGehee.
2012-09-18 12:20:34 -04:00
Tom Lane
3f828fae62 Fix array_typanalyze to work for domains over arrays.
Not sure how we missed this case, but we did.  Per bug #7551 from
Diego de Lima.
2012-09-18 00:31:40 -04:00
Tom Lane
3b8968f252 Rethink heuristics for choosing index quals for parameterized paths.
Some experimentation with examples similar to bug #7539 has convinced me
that indxpath.c's original implementation of parameterized-path generation
was several bricks shy of a load.  In general, if we are relying on a
particular outer rel or set of outer rels for a parameterized path, the
path should use every indexable join clause that's available from that rel
or rels.  Any join clauses that get left out of the indexqual will end up
getting applied as plain filter quals (qpquals), and that's generally a
significant loser compared to having the index AM enforce them.  (This is
particularly true with btree, which can skip the index scan entirely if
it can see that the given indexquals are mutually contradictory.)  The
original heuristics failed to ensure this, though, and were overly
complicated anyway.  Rewrite to make the code explicitly identify each
useful set of outer rels and then select all applicable join clauses for
each one.  The one plan that changes in the regression tests is in fact
for the better according to the planner's cost estimates.

(Note: this is not a correctness issue but just a matter of plan quality.
I don't yet know what is going on in bug #7539, but I don't expect this
change to fix that.)
2012-09-16 17:58:09 -04:00
Simon Riggs
64e196b6ef Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown checkpoint.
Recovery code documents clearly that a shutdown checkpoint is executed at
end of recovery - a shutdown checkpoint WAL record is written but the buffer
manager had been altered to treat end of recovery as a normal checkpoint.
This bug exacerbates the bufmgr relpersistence bug.

Bug spotted by Andres Freund, patch by me.
2012-09-16 19:53:34 +01:00
Robert Haas
beb850e1d8 Properly set relpersistence for fake relcache entries.
This can result in buffers failing to be properly flushed at
checkpoint time, leading to data loss.

Report, diagnosis, and patch by Jeff Davis.
2012-09-14 09:35:07 -04:00
Tom Lane
a20993608a Fix case of window function + aggregate + GROUP BY expression.
In commit 1bc16a9460 I added a minor
optimization to drop the component variables of a GROUP BY expression from
the target list computed at the aggregation level of a query, if those Vars
weren't referenced elsewhere in the tlist.  However, I overlooked that the
window-function planning code would deconstruct such expressions and thus
need to have access to their component variables.  Fix it to not do that.

While at it, I removed the distinction between volatile and nonvolatile
window partition/order expressions: the code now computes all of them
at the aggregation level.  This saves a relatively expensive check for
volatility, and it's unclear that the resulting plan isn't better anyway.

Per bug #7535 from Louis-David Mitterrand.  Back-patch to 9.2.
2012-09-13 11:32:25 -04:00
Tom Lane
9a93e71008 Fix a couple other leftover uses of 'conisonly' terminology. 2012-09-12 15:12:24 -04:00
Tom Lane
1faf866ace Fix logical errors in tsquery selectivity estimation for prefix queries.
I made multiple errors in commit 97532f7c29,
stemming mostly from failure to think about the available frequency data
as being element frequencies not value frequencies (so that occurrences of
different elements are not mutually exclusive).  This led to sillinesses
such as estimating that "word" would match more rows than "word:*".

The choice to clamp to a minimum estimate of DEFAULT_TS_MATCH_SEL also
seems pretty ill-considered in hindsight, as it would frequently result in
an estimate much larger than the available data suggests.  We do need some
sort of clamp, since a pattern not matching any of the MCELEMs probably
still needs a selectivity estimate of more than zero.  I chose instead to
clamp to at least what a non-MCELEM word would be estimated as, preserving
the property that "word:*" doesn't get an estimate less than plain "word",
whether or not the word appears in MCELEM.

Per investigation of a gripe from Bill Martin, though I suspect that his
example case actually isn't even reaching the erroneous code.

Back-patch to 9.1 where this code was introduced.
2012-09-11 21:23:20 -04:00
Tom Lane
d2286a98ef Allow embedded spaces without quoting in unix_socket_directories entries.
This fix removes an unnecessary incompatibility with the old behavior of
the unix_socket_directory parameter.  Since pathnames with embedded spaces
are fairly popular on some platforms, the incompatibility could be
significant in practice.  We'll still strip unquoted leading/trailing
spaces, however.

No docs update since the documentation already implied that it worked
like this.

Per bug #7514 from Murray Cumming.
2012-09-06 11:43:51 -04:00
Heikki Linnakangas
ab9a14e903 Fix WAL file replacement during cascading replication on Windows.
When the startup process restores a WAL file from the archive, it deletes
any old file with the same name and renames the new file in its place. On
Windows, however, when a file is deleted, it still lingers as long as a
process holds a file handle open on it. With cascading replication, a
walsender process can hold the old file open, so the rename() in the startup
process would fail. To fix that, rename the old file to a temporary name, to
make the original file name available for reuse, before deleting the old
file.
2012-09-05 18:52:12 -07:00
Tom Lane
2e0cc1f031 Fix inappropriate error messages for Hot Standby misconfiguration errors.
Give the correct name of the GUC parameter being complained of.
Also, emit a more suitable SQLSTATE (INVALID_PARAMETER_VALUE,
not the default INTERNAL_ERROR).

Gurjeet Singh, errcode adjustment by me
2012-09-05 21:49:08 -04:00
Tom Lane
46c508fbcf Fix PARAM_EXEC assignment mechanism to be safe in the presence of WITH.
The planner previously assumed that parameter Vars having the same absolute
query level, varno, and varattno could safely be assigned the same runtime
PARAM_EXEC slot, even though they might be different Vars appearing in
different subqueries.  This was (probably) safe before the introduction of
CTEs, but the lazy-evalution mechanism used for CTEs means that a CTE can
be executed during execution of some other subquery, causing the lifespan
of Params at the same syntactic nesting level as the CTE to overlap with
use of the same slots inside the CTE.  In 9.1 we created additional hazards
by using the same parameter-assignment technology for nestloop inner scan
parameters, but it was broken before that, as illustrated by the added
regression test.

To fix, restructure the planner's management of PlannerParamItems so that
items having different semantic lifespans are kept rigorously separated.
This will probably result in complex queries using more runtime PARAM_EXEC
slots than before, but the slots are cheap enough that this hardly matters.
Also, stop generating PlannerParamItems containing Params for subquery
outputs: all we really need to do is reserve the PARAM_EXEC slot number,
and that now only takes incrementing a counter.  The planning code is
simpler and probably faster than before, as well as being more correct.

Per report from Vik Reykja.

These changes will mostly also need to be made in the back branches, but
I'm going to hold off on that until after 9.2.0 wraps.
2012-09-05 12:55:01 -04:00
Alvaro Herrera
e20a90e188 Trim spgist_private.h inclusion
It doesn't really need rel.h; relcache.h is enough.
2012-09-05 11:06:51 -03:00
Heikki Linnakangas
358ff99d70 Fix compiler warnings about unused variables, caused by my previous commit.
Reported by Peter Eisentraut.
2012-09-04 22:07:35 -07:00
Heikki Linnakangas
c4c227477b Fix bugs in cascading replication with recovery_target_timeline='latest'
The cascading replication code assumed that the current RecoveryTargetTLI
never changes, but that's not true with recovery_target_timeline='latest'.
The obvious upshot of that is that RecoveryTargetTLI in shared memory needs
to be protected by a lock. A less obvious consequence is that when a
cascading standby is connected, and the standby switches to a new target
timeline after scanning the archive, it will continue to stream WAL to the
cascading standby, but from a wrong file, ie. the file of the previous
timeline. For example, if the standby is currently streaming from the middle
of file 000000010000000000000005, and the timeline changes, the standby
will continue to stream from that file. However, the WAL on the new
timeline is in file 000000020000000000000005, so the standby sends garbage
from 000000010000000000000005 to the cascading standby, instead of the
correct WAL from file 000000020000000000000005.

This also fixes a related bug where a partial WAL segment is restored from
the archive and streamed to a cascading standby. The code assumed that when
a WAL segment is copied from the archive, it can immediately be fully
streamed to a cascading standby. However, if the segment is only partially
filled, ie. has the right size, but only N first bytes contain valid WAL,
that's not safe. That can happen if a partial WAL segment is manually copied
to the archive, or if a partial WAL segment is archived because a server is
started up on a new timeline within that segment. The cascading standby will
get confused if the WAL it received is not valid, and will get stuck until
it's restarted. This patch fixes that problem by not allowing WAL restored
from the archive to be streamed to a cascading standby until it's been
replayed, and thus validated.
2012-09-04 19:33:21 -07:00
Kevin Grittner
cdf91edba9 Fix serializable mode with index-only scans.
Serializable Snapshot Isolation used for serializable transactions
depends on acquiring SIRead locks on all heap relation tuples which
are used to generate the query result, so that a later delete or
update of any of the tuples can flag a read-write conflict between
transactions.  This is normally handled in heapam.c, with tuple level
locking.  Since an index-only scan avoids heap access in many cases,
building the result from the index tuple, the necessary predicate
locks were not being acquired for all tuples in an index-only scan.

To prevent problems with tuple IDs which are vacuumed and re-used
while the transaction still matters, the xmin of the tuple is part of
the tag for the tuple lock.  Since xmin is not available to the
index-only scan for result rows generated from the index tuples, it
is not possible to acquire a tuple-level predicate lock in such
cases, in spite of having the tid.  If we went to the heap to get the
xmin value, it would no longer be an index-only scan.  Rather than
prohibit index-only scans under serializable transaction isolation,
we acquire an SIRead lock on the page containing the tuple, when it
was not necessary to visit the heap for other reasons.

Backpatch to 9.2.

Kevin Grittner and Tom Lane
2012-09-04 21:13:11 -05:00
Magnus Hagander
bd46b52199 Remove some useless trailing whitespace
Michael Paquier
2012-09-04 09:17:14 +02:00
Bruce Momjian
015722fb36 Fix to_date() and to_timestamp() to allow specification of the day of
the week via ISO or Gregorian designations.  The fix is to store the
day-of-week consistently as 1-7, Sunday = 1.

Fixes bug reported by Marc Munro
2012-09-03 22:52:44 -04:00
Tom Lane
2a2352e07d Replace memcpy() calls in xlog.c critical sections with struct assignments.
This gets rid of a dangerous-looking use of the not-volatile XLogCtl
pointer in a couple of spinlock-protected sections, where the normal
coding rule is that you should only access shared memory through a
pointer-to-volatile.  I think the risk is only hypothetical not actual,
since for there to be a bug the compiler would have to move the spinlock
acquire or release across the memcpy() call, which one sincerely hopes
it will not.  Still, it looks cleaner this way.

Per comment from Daniel Farina and subsequent discussion.
2012-09-03 15:39:15 -04:00
Tom Lane
6d2c8c0e2a Drop cheap-startup-cost paths during add_path() if we don't need them.
We can detect whether the planner top level is going to care at all about
cheap startup cost (it will only do so if query_planner's tuple_fraction
argument is greater than zero).  If it isn't, we might as well discard
paths immediately whose only advantage over others is cheap startup cost.
This turns out to get rid of quite a lot of paths in complex queries ---
I saw planner runtime reduction of more than a third on one large query.

Since add_path isn't currently passed the PlannerInfo "root", the easiest
way to tell it whether to do this was to add a bool flag to RelOptInfo.
That's a bit redundant, since all relations in a given query level will
have the same setting.  But in the future it's possible that we'd refine
the control decision to work on a per-relation basis, so this seems like
a good arrangement anyway.

Per my suggestion of a few months ago.
2012-09-01 18:16:24 -04:00
Tom Lane
4da6439bd8 Fix mark_placeholder_maybe_needed to handle LATERAL references.
If a PlaceHolderVar contains a pulled-up LATERAL reference, its minimum
possible evaluation level might be higher in the join tree than its
original syntactic location.  That in turn affects the ph_needed level for
any contained PlaceHolderVars (that is, those PHVs had better propagate up
the join tree at least to the evaluation level of the outer PHV).  We got
this mostly right, but mark_placeholder_maybe_needed() failed to account
for the effect, and in consequence could leave the inner PHVs with
ph_may_need less than what their ultimate ph_needed value will be.  That's
bad because it could lead to failure to select a join order that will allow
evaluation of the inner PHV at a valid location.  Fix that, and add an
Assert that checks that we don't ever set ph_needed to more than
ph_may_need.
2012-09-01 13:56:46 -04:00
Tom Lane
c97a547a4a Partially restore qual scope checks in distribute_qual_to_rels().
The LATERAL implementation is now basically complete, and I still don't
see a cost-effective way to make an exact qual scope cross-check in the
presence of LATERAL.  However, I did add a PlannerInfo.hasLateralRTEs flag
along the way, so it's easy to make the check only when not hasLateralRTEs.
That seems to still be useful, and it beats having no check at all.
2012-08-31 18:57:12 -04:00
Tom Lane
da3df99870 Fix LATERAL references to join alias variables.
I had thought this case worked already, but perhaps I didn't re-test it
after adding extract_lateral_references() ...
2012-08-31 17:44:31 -04:00
Tom Lane
58a031f920 Make configure probe for mbstowcs_l as well as wcstombs_l.
We previously supposed that any given platform would supply both or neither
of these functions, so that one configure test would be sufficient.  It now
appears that at least on AIX this is not the case ... which is likely an
AIX bug, but nonetheless we need to cope with it.  So use separate tests.
Per bug #6758; thanks to Andrew Hastie for doing the followup testing
needed to confirm what was happening.

Backpatch to 9.1, where we began using these functions.
2012-08-31 14:17:56 -04:00
Heikki Linnakangas
fe811ae810 Fix typos in README. 2012-08-31 11:30:11 +03:00
Tom Lane
e5db11c558 Improve coding of gistchoose and gistRelocateBuildBuffersOnSplit.
This is mostly cosmetic, but it does eliminate a speculative portability
issue.  The previous coding ignored the fact that sum_grow could easily
overflow (in fact, it could be summing multiple IEEE float infinities).
On a platform where that didn't guarantee to produce a positive result,
the code would misbehave.  In any case, it was less than readable.
2012-08-30 22:53:17 -04:00
Alvaro Herrera
c219d9b0a5 Split tuple struct defs from htup.h to htup_details.h
This reduces unnecessary exposure of other headers through htup.h, which
is very widely included by many files.

I have chosen to move the function prototypes to the new file as well,
because that means htup.h no longer needs to include tupdesc.h.  In
itself this doesn't have much effect in indirect inclusion of tupdesc.h
throughout the tree, because it's also required by execnodes.h; but it's
something to explore in the future, and it seemed best to do the htup.h
change now while I'm busy with it.
2012-08-30 16:52:35 -04:00
Bruce Momjian
381a9ed66d Remove configure flag --disable-shared, as it is no longer used by any
port.  The last use was QNX, per Peter Eisentraut.
2012-08-30 16:26:53 -04:00
Tom Lane
77387f0ac8 Suppress creation of backwardly-indexed paths for LATERAL join clauses.
Given a query such as

SELECT * FROM foo JOIN LATERAL (SELECT foo.var1) ss(x) ON ss.x = foo.var2

the existence of the join clause "ss.x = foo.var2" encourages indxpath.c to
build a parameterized path for foo using any index available for foo.var2.
This is completely useless activity, though, since foo has got to be on the
outside not the inside of any nestloop join with ss.  It's reasonably
inexpensive to add tests that prevent creation of such paths, so let's do
that.
2012-08-30 14:33:00 -04:00
Heikki Linnakangas
3e6eb0dd0a Fix division by zero in the new range type histogram creation.
Report and analysis by Matthias.
2012-08-30 20:29:11 +03:00
Robert Haas
a66fca3f0c Add missing period to detail message.
Per note from Peter Eisentraut.
2012-08-30 13:26:45 -04:00
Robert Haas
c8ba697a4b Fix logic bug in gistchoose and gistRelocateBuildBuffersOnSplit.
Every time the best-tuple-found-so-far changes, we need to reset all
the penalty values in which_grow[] to the penalties for the new best
tuple.  The old code failed to do this, resulting in inferior index
quality.

The original patch from Alexander Korotkov was just two lines; I took
the liberty of fleshing that out by adding a bunch of comments that I
hope will make this logic easier for others to understand than it was
for me.
2012-08-30 13:09:07 -04:00
Tom Lane
d1a4db8d25 Improve EXPLAIN's ability to cope with LATERAL references in plans.
push_child_plan/pop_child_plan didn't bother to adjust the "ancestors"
list of parent plan nodes when descending to a child plan node.  I think
this was okay when it was written, but it's not okay in the presence of
LATERAL references, since a subplan node could easily be returning a
LATERAL value back up to the same nestloop node that provides the value.
Per changed regression test results, the omission led to failure to
interpret Param nodes that have perfectly good interpretations.
2012-08-30 12:56:50 -04:00
Robert Haas
e1a6375d8f Comment fixes.
Jeff Davis, somewhat edited by me
2012-08-30 10:42:28 -04:00
Tom Lane
e83bb10d6d Adjust definition of cheapest_total_path to work better with LATERAL.
In the initial cut at LATERAL, I kept the rule that cheapest_total_path
was always unparameterized, which meant it had to be NULL if the relation
has no unparameterized paths.  It turns out to work much more nicely if
we always have *some* path nominated as cheapest-total for each relation.
In particular, let's still say it's the cheapest unparameterized path if
there is one; if not, take the cheapest-total-cost path among those of
the minimum available parameterization.  (The first rule is actually
a special case of the second.)

This allows reversion of some temporary lobotomizations I'd put in place.
In particular, the planner can now consider hash and merge joins for
joins below a parameter-supplying nestloop, even if there aren't any
unparameterized paths available.  This should bring planning of
LATERAL-containing queries to the same level as queries not using that
feature.

Along the way, simplify management of parameterized paths in add_path()
and friends.  In the original coding for parameterized paths in 9.2,
I tried to minimize the logic changes in add_path(), so it just treated
parameterization as yet another dimension of comparison for paths.
We later made it ignore pathkeys (sort ordering) of parameterized paths,
on the grounds that ordering isn't a useful property for the path on the
inside of a nestloop, so we might as well get rid of useless parameterized
paths as quickly as possible.  But we didn't take that reasoning as far as
we should have.  Startup cost isn't a useful property inside a nestloop
either, so add_path() ought to discount startup cost of parameterized paths
as well.  Having done that, the secondary sorting I'd implemented (in
add_parameterized_path) is no longer needed --- any parameterized path that
survives add_path() at all is worth considering at higher levels.  So this
should be a bit faster as well as simpler.
2012-08-29 22:06:07 -04:00
Bruce Momjian
3825963e7f Report postmaster.pid file as empty if it is empty, rather than
reporting in contains invalid data.
2012-08-29 17:05:22 -04:00
Heikki Linnakangas
c82dedb7a8 Optimize SP-GiST insertions.
This includes two micro-optimizations to the tight inner loop in descending
the SP-GiST tree: 1. avoid an extra function call to index_getprocinfo when
calling user-defined choose function, and 2. avoid a useless palloc+pfree
when node labels are not used.
2012-08-29 09:21:20 +03:00
Alvaro Herrera
21c09e99dc Split heapam_xlog.h from heapam.h
The heapam XLog functions are used by other modules, not all of which
are interested in the rest of the heapam API.  With this, we let them
get just the XLog stuff in which they are interested and not pollute
them with unrelated includes.

Also, since heapam.h no longer requires xlog.h, many files that do
include heapam.h no longer get xlog.h automatically, including a few
headers.  This is useful because heapam.h is getting pulled in by
execnodes.h, which is in turn included by a lot of files.
2012-08-28 19:02:00 -04:00
Alvaro Herrera
fda0594fc2 remove catcache.h from syscache.h
Instead, place a forward struct declaration for struct catclist in
syscache.h.  This reduces header proliferation somewhat.
2012-08-28 18:36:39 -04:00
Alvaro Herrera
45326c5a11 Split resowner.h
This lets files that are mere users of ResourceOwner not automatically
include the headers for stuff that is managed by the resowner mechanism.
2012-08-28 18:02:07 -04:00
Tom Lane
e323c55301 Fix DROP INDEX CONCURRENTLY IF EXISTS.
This threw ERROR, not the expected NOTICE, if the index didn't exist.
The bug was actually visible in not-as-expected regression test output,
so somebody wasn't paying too close attention in commit
8cb53654db.
Per report from Brendan Byrd.
2012-08-27 12:45:43 -04:00
Heikki Linnakangas
918eee0c49 Collect and use histograms of lower and upper bounds for range types.
This enables selectivity estimation of the <<, >>, &<, &> and && operators,
as well as the normal inequality operators: <, <=, >=, >. "range @> element"
is also supported, but the range-variant @> and <@ operators are not,
because they cannot be sensibly estimated with lower and upper bound
histograms alone. We would need to make some assumption about the lengths of
the ranges for that. Alexander's patch included a separate histogram of
lengths for that, but I left that out of the patch for simplicity. Hopefully
that will be added as a followup patch.

The fraction of empty ranges is also calculated and used in estimation.

Alexander Korotkov, heavily modified by me.
2012-08-27 15:58:46 +03:00
Tom Lane
9ff79b9d4e Fix up planner infrastructure to support LATERAL properly.
This patch takes care of a number of problems having to do with failure
to choose valid join orders and incorrect handling of lateral references
pulled up from subqueries.  Notable changes:

* Add a LateralJoinInfo data structure similar to SpecialJoinInfo, to
represent join ordering constraints created by lateral references.
(I first considered extending the SpecialJoinInfo structure, but the
semantics are different enough that a separate data structure seems
better.)  Extend join_is_legal() and related functions to prevent trying
to form unworkable joins, and to ensure that we will consider joins that
satisfy lateral references even if the joins would be clauseless.

* Fill in the infrastructure needed for the last few types of relation scan
paths to support parameterization.  We'd have wanted this eventually
anyway, but it is necessary now because a relation that gets pulled up out
of a UNION ALL subquery may acquire a reltargetlist containing lateral
references, meaning that its paths *have* to be parameterized whether or
not we have any code that can push join quals down into the scan.

* Compute data about lateral references early in query_planner(), and save
in RelOptInfo nodes, to avoid repetitive calculations later.

* Assorted corner-case bug fixes.

There's probably still some bugs left, but this is a lot closer to being
real than it was before.
2012-08-26 22:50:23 -04:00
Bruce Momjian
3e1a373e2b Allow text timezone designations, e.g. "America/Chicago", when using the
ISO "T" timestamptz format.
2012-08-25 17:44:53 -04:00
Tom Lane
7abaa6b9d3 Fix issues with checks for unsupported transaction states in Hot Standby.
The GUC check hooks for transaction_read_only and transaction_isolation
tried to check RecoveryInProgress(), so as to disallow setting read/write
mode or serializable isolation level (respectively) in hot standby
sessions.  However, GUC check hooks can be called in many situations where
we're not connected to shared memory at all, resulting in a crash in
RecoveryInProgress().  Among other cases, this results in EXEC_BACKEND
builds crashing during child process start if default_transaction_isolation
is serializable, as reported by Heikki Linnakangas.  Protect those calls
by silently allowing any setting when not inside a transaction; which is
okay anyway since these GUCs are always reset at start of transaction.

Also, add a check to GetSerializableTransactionSnapshot() to complain
if we are in hot standby.  We need that check despite the one in
check_XactIsoLevel() because default_transaction_isolation could be
serializable.  We don't want to complain any sooner than this in such
cases, since that would prevent running transactions at all in such a
state; but a transaction can be run, if SET TRANSACTION ISOLATION is done
before setting a snapshot.  Per report some months ago from Robert Haas.

Back-patch to 9.1, since these problems were introduced by the SSI patch.

Kevin Grittner and Tom Lane, with ideas from Heikki Linnakangas
2012-08-24 13:09:04 -04:00
Tom Lane
ec8a0135c3 Fix cascading privilege revoke to notice when privileges are still held.
If we revoke a grant option from some role X, but X still holds the option
via another grant, we should not recursively revoke the privilege from
role(s) Y that X had granted it to.  This was supposedly fixed as one
aspect of commit 4b2dafcc0b, but I must not
have tested it, because in fact that code never worked: it forgot to shift
the grant-option bits back over when masking the bits being revoked.

Per bug #6728 from Daniel German.  Back-patch to all active branches,
since this has been wrong since 8.0.
2012-08-23 17:25:10 -04:00
Tom Lane
10685ec082 Avoid somewhat-theoretical overflow risks in RecordIsValid().
This improves on commit 51fed14d73 by
eliminating the assumption that we can form <some pointer value> +
<some offset> without overflow.  The entire point of those tests is that
we don't trust the offset value, so coding them in a way that could wrap
around if the buffer happens to be near the top of memory doesn't seem
sound.  Instead, track the remaining space as a size_t variable and
compare offsets against that.

Also, improve comment about why we need the extra early check on
xl_tot_len.
2012-08-21 18:41:52 -04:00
Robert Haas
4b373e42d1 Improve C comments in GetSnapshotData.
Move discussion of why our algorithm for taking snapshots in recovery
to a more appropriate location in the function, and delete incorrect
mention of taking a lock.
2012-08-21 11:47:10 -04:00
Peter Eisentraut
ffdd5a0ee3 Remove external PID file on postmaster exit 2012-08-20 23:47:11 -04:00
Heikki Linnakangas
51fed14d73 Don't get confused if a WAL partial record header has xl_tot_len == 0.
If a WAL record header was split across pages, but xl_tot_len was 0, we
would get confused and conclude that we had already read the whole record,
and proceed to CRC check it. That can lead to a crash in RecordIsValid(),
which isn't careful to not read beyond end-of-record, as defined by
xl_tot_len.

Add an explicit sanity check for xl_tot_len <= SizeOfXlogRecord. Also,
make RecordIsValid() more robust by checking in each step that it doesn't
try to access memory beyond end of record, even if a length field in the
record's or a backup block's header is bogus.

Per report and analysis by Tom Lane.
2012-08-20 19:58:21 +03:00
Tom Lane
a91f885f11 Remove obsolete comment. 2012-08-19 15:25:43 -04:00
Tom Lane
092d7ded29 Allow OLD and NEW in multi-row VALUES within rules.
Now that we have LATERAL, it's fairly painless to allow this case, which
was left as a TODO in the original multi-row VALUES implementation.
2012-08-19 14:12:16 -04:00
Tom Lane
c246eb5aaf Make use of LATERAL in information_schema.sequences view.
It said "XXX: The following could be improved if we had LATERAL" ...
so let's do that.

No catversion bump since either version of the view works fine.
2012-08-18 16:14:57 -04:00
Tom Lane
084a29c94f Another round of planner fixes for LATERAL.
Formerly, subquery pullup had no need to examine other entries in the range
table, since they could not contain any references to the subquery being
pulled up.  That's no longer true with LATERAL, so now we need to be able
to visit rangetable subexpressions to replace Vars referencing the
pulled-up subquery.  Also, this means that extract_lateral_references must
be unsurprised at encountering lateral PlaceHolderVars, since such might be
created when pulling up a subquery that's underneath an outer join with
respect to the lateral reference.
2012-08-18 14:10:17 -04:00
Tom Lane
470d0b9789 Check LIBXML_VERSION instead of testing in configure script.
We had put a test for libxml2's xmlStructuredErrorContext variable in
configure, but of course that doesn't work on Windows builds.  The next
best alternative seems to be to test the LIBXML_VERSION symbol provided
by xmlversion.h.

Per report from Talha Bin Rizwan, though this fixes it in a different way
than his proposed patch.
2012-08-17 00:05:26 -04:00
Bruce Momjian
3325957656 Delete inaccurate C comment about FSM and adding pages, per Robert Haas. 2012-08-16 19:02:58 -04:00
Tom Lane
f5983923d8 Allow create_index_paths() to consider multiple join bitmapscan paths.
In the initial cut at the "parameterized paths" feature, I'd simplified
create_index_paths() to the point where it would only generate a single
parameterized bitmap path per relation.  Experimentation with an example
supplied by Josh Berkus convinces me that that's not good enough: we really
need to consider a bitmap path for each possible outer relation.  Otherwise
we have regressions relative to pre-9.2 versions, in which the planner
picks a plain indexscan where it should have used a bitmap scan in queries
involving three or more tables.  Indeed, after fixing this, several queries
in the regression tests show improved plans as a result of using bitmap not
plain indexscans.
2012-08-16 13:03:54 -04:00
Tom Lane
56ba337e6f Suppress possibly-uninitialized-variable warning. 2012-08-16 12:04:07 -04:00
Heikki Linnakangas
317dd55a9c Add SP-GiST support for range types.
The implementation is a quad-tree, largely copied from the quad-tree
implementation for points. The lower and upper bound of ranges are the 2d
coordinates, with some extra code to handle empty ranges.

I left out the support for adjacent operator, -|-, from the original patch.
Not because there was necessarily anything wrong with it, but it was more
complicated than the other operators, and I only have limited time for
reviewing. That will follow as a separate patch.

Alexander Korotkov, reviewed by Jeff Davis and me.
2012-08-16 14:30:45 +03:00
Heikki Linnakangas
89911b3ab8 Fix GiST buffering build bug, which caused "failed to re-find parent" errors.
We use a hash table to track the parents of inner pages, but when inserting
to a leaf page, the caller of gistbufferinginserttuples() must pass a
correct block number of the leaf's parent page. Before gistProcessItup()
descends to a child page, it checks if the downlink needs to be adjusted to
accommodate the new tuple, and updates the downlink if necessary. However,
updating the downlink might require splitting the page, which might move the
downlink to a page to the right. gistProcessItup() doesn't realize that, so
when it descends to the leaf page, it might pass an out-of-date parent block
number as a result. Fix that by returning the block a tuple was inserted to
from gistbufferinginserttuples().

This fixes the bug reported by Zdeněk Jílovec.
2012-08-16 12:56:24 +03:00
Bruce Momjian
41fa3dfb0a Update C comment to NOTICE to reflect previous commit changing the error
level, per report from Tom.
2012-08-15 19:09:37 -04:00
Tom Lane
4c5316931f Fix rescan logic in nodeCtescan.
The previous coding essentially assumed that nodes would be rescanned in
the same order they were initialized in; or at least that the "leader" of
a group of CTEscans would be rescanned before any others were required to
execute.  Unfortunately, that isn't even a little bit true.  It's possible
to devise queries in which the leader isn't rescanned until other CTEscans
on the same CTE have run to completion, or even in which the leader never
gets a rescan call at all.

The fix makes the leader specially responsible only for initial creation
and final destruction of the tuplestore; rescan resets are now a
symmetrically shared responsibility.  This means that we might reset the
tuplestore multiple times when restarting a plan subtree containing
multiple CTEscans; but resetting an already-empty tuplestore is cheap
enough that that doesn't seem like a problem.

Per report from Adam Mackler; the new regression test cases are based on
his example query.

Back-patch to 8.4 where CTE scans were introduced.
2012-08-15 19:02:33 -04:00
Bruce Momjian
083b9133aa On second thought, explain why date_trunc("week") on interval values is
not supported in the error message, rather than the docs.
2012-08-15 16:48:05 -04:00
Tom Lane
4d642b5941 Disallow extensions from owning the schema they are assigned to.
This situation creates a dependency loop that confuses pg_dump and probably
other things.  Moreover, since the mental model is that the extension
"contains" schemas it owns, but "is contained in" its extschema (even
though neither is strictly true), having both true at once is confusing for
people too.  So prevent the situation from being set up.

Reported and patched by Thom Brown.  Back-patch to 9.1 where extensions
were added.
2012-08-15 11:28:03 -04:00
Bruce Momjian
a973296598 Properly escape usernames in initdb, so names with single-quotes are
supported.  Also add assert to catch future breakage.

Also, improve documentation that "double"-quotes must be used in
pg_hba.conf (not single quotes).
2012-08-15 11:23:15 -04:00
Tom Lane
eb919e8fde Resurrect the "last ditch" code path in join_search_one_level().
This essentially reverts commit e54b10a62d,
in which I'd decided that the "last ditch" join logic was useless.  The
folly of that is now exposed by a report from Pavel Stehule: although the
function should always find at least one join in a self-contained join
problem, it can still fail to do so in a sub-problem created by artificial
from_collapse_limit or join_collapse_limit constraints.  Adjust the
comments to describe this, and simplify the code a bit to match the new
coding of the earlier loop in the function.

I'm not terribly happy about this: I still subscribe to the opinion stated
in the previous commit message that the "last ditch" code can obscure logic
bugs elsewhere.  But the alternative seems to be to complicate the earlier
tests for does-this-relation-have-a-join-clause to the point where they can
tell whether the join clauses link outside the current join sub-problem.
And that looks messy, slow, and possibly a source of bugs in itself.
In any case, now is not the time to be inserting experimental code into
9.2, so let's just go back to the time-tested solution.
2012-08-15 00:08:13 -04:00
Tom Lane
17351fce4e Prevent access to external files/URLs via XML entity references.
xml_parse() would attempt to fetch external files or URLs as needed to
resolve DTD and entity references in an XML value, thus allowing
unprivileged database users to attempt to fetch data with the privileges
of the database server.  While the external data wouldn't get returned
directly to the user, portions of it could be exposed in error messages
if the data didn't parse as valid XML; and in any case the mere ability
to check existence of a file might be useful to an attacker.

The ideal solution to this would still allow fetching of references that
are listed in the host system's XML catalogs, so that documents can be
validated according to installed DTDs.  However, doing that with the
available libxml2 APIs appears complex and error-prone, so we're not going
to risk it in a security patch that necessarily hasn't gotten wide review.
So this patch merely shuts off all access, causing any external fetch to
silently expand to an empty string.  A future patch may improve this.

In HEAD and 9.2, also suppress warnings about undefined entities, which
would otherwise occur as a result of not loading referenced DTDs.  Previous
branches don't show such warnings anyway, due to different error handling
arrangements.

Credit to Noah Misch for first reporting the problem, and for much work
towards a solution, though this simplistic approach was not his preference.
Also thanks to Daniel Veillard for consultation.

Security: CVE-2012-3489
2012-08-14 18:31:16 -04:00
Bruce Momjian
03bda4535e Revert "commit_delay" change; just add comment that we don't have
a microsecond specification.
2012-08-14 16:26:08 -04:00
Bruce Momjian
e74727440c Add pg_settings units display for "commit_delay" (ms).
Also remove unnecessary units designation in postgresql.conf.sample.
2012-08-14 16:16:45 -04:00
Tom Lane
c1774d2c81 More fixes for planner's handling of LATERAL.
Re-allow subquery pullup for LATERAL subqueries, except when the subquery
is below an outer join and contains lateral references to relations outside
that outer join.  If we pull up in such a case, we risk introducing lateral
cross-references into outer joins' ON quals, which is something the code is
entirely unprepared to cope with right now; and I'm not sure it'll ever be
worth coping with.

Support lateral refs in VALUES (this seems to be the only additional path
type that needs such support as a consequence of re-allowing subquery
pullup).

Put in a slightly hacky fix for joinpath.c's refusal to consider
parameterized join paths even when there cannot be any unparameterized
ones.  This was causing "could not devise a query plan for the given query"
failures in queries involving more than two FROM items.

Put in an even more hacky fix for distribute_qual_to_rels() being unhappy
with join quals that contain references to rels outside their syntactic
scope; which is to say, disable that test altogether.  Need to think about
how to preserve some sort of debugging cross-check here, while not
expending more cycles than befits a debugging cross-check.
2012-08-12 16:01:26 -04:00
Tom Lane
e76af54137 Fix some issues with LATERAL(SELECT UNION ALL SELECT).
The LATERAL marking has to be propagated down to the UNION leaf queries
when we pull them up.  Also, fix the formerly stubbed-off
set_append_rel_pathlist().  It does already have enough smarts to cope with
making a parameterized Append path at need; it just has to not assume that
there *must* be an unparameterized path.
2012-08-11 18:42:56 -04:00
Tom Lane
b53800355f Fix dependencies generated during ALTER TABLE ADD CONSTRAINT USING INDEX.
This command generated new pg_depend entries linking the index to the
constraint and the constraint to the table, which match the entries made
when a unique or primary key constraint is built de novo.  However, it did
not bother to get rid of the entries linking the index directly to the
table.  We had considered the issue when the ADD CONSTRAINT USING INDEX
patch was written, and concluded that we didn't need to get rid of the
extra entries.  But this is wrong: ALTER COLUMN TYPE wasn't expecting such
redundant dependencies to exist, as reported by Hubert Depesz Lubaczewski.
On reflection it seems rather likely to break other things as well, since
there are many bits of code that crawl pg_depend for one purpose or
another, and most of them are pretty naive about what relationships they're
expecting to find.  Fortunately it's not that hard to get rid of the extra
dependency entries, so let's do that.

Back-patch to 9.1, where ALTER TABLE ADD CONSTRAINT USING INDEX was added.
2012-08-11 12:51:24 -04:00
Tom Lane
a67d6d9a78 Update overlooked comment. 2012-08-10 17:36:54 -04:00
Tom Lane
c9b0cbe98b Support having multiple Unix-domain sockets per postmaster.
Replace unix_socket_directory with unix_socket_directories, which is a list
of socket directories, and adjust postmaster's code to allow zero or more
Unix-domain sockets to be created.

This is mostly a straightforward change, but since the Unix sockets ought
to be created after the TCP/IP sockets for safety reasons (better chance
of detecting a port number conflict), AddToDataDirLockFile needs to be
fixed to support out-of-order updates of data directory lockfile lines.
That's a change that had been foreseen to be necessary someday anyway.

Honza Horak, reviewed and revised by Tom Lane
2012-08-10 17:27:15 -04:00
Tom Lane
eaccfded98 Centralize the logic for detecting misplaced aggregates, window funcs, etc.
Formerly we relied on checking after-the-fact to see if an expression
contained aggregates, window functions, or sub-selects when it shouldn't.
This is grotty, easily forgotten (indeed, we had forgotten to teach
DefineIndex about rejecting window functions), and none too efficient
since it requires extra traversals of the parse tree.  To improve matters,
define an enum type that classifies all SQL sub-expressions, store it in
ParseState to show what kind of expression we are currently parsing, and
make transformAggregateCall, transformWindowFuncCall, and transformSubLink
check the expression type and throw error if the type indicates the
construct is disallowed.  This allows removal of a large number of ad-hoc
checks scattered around the code base.  The enum type is sufficiently
fine-grained that we can still produce error messages of at least the
same specificity as before.

Bringing these error checks together revealed that we'd been none too
consistent about phrasing of the error messages, so standardize the wording
a bit.

Also, rewrite checking of aggregate arguments so that it requires only one
traversal of the arguments, rather than up to three as before.

In passing, clean up some more comments left over from add_missing_from
support, and annotate some tests that I think are dead code now that that's
gone.  (I didn't risk actually removing said dead code, though.)
2012-08-10 11:36:15 -04:00
Magnus Hagander
b3055ab4fb Fix upper limit of superuser_reserved_connections, add limit for wal_senders
Should be limited to the maximum number of connections excluding
autovacuum workers, not including.

Add similar check for max_wal_senders, which should never be higher than
max_connections.
2012-08-10 14:50:45 +02:00
Simon Riggs
da4efa13d8 Turn off WalSender keepalives by default, users can enable if desired 2012-08-09 17:07:03 +01:00
Simon Riggs
87d8bd7c9f Ensure all replication message info is available and correct via WalRcv 2012-08-09 17:03:59 +01:00
Alvaro Herrera
92ec0370eb Fix typo in comment 2012-08-08 17:42:38 -04:00
Tom Lane
f630157496 Merge parser's p_relnamespace and p_varnamespace lists into a single list.
Now that we are storing structs in these lists, the distinction between
the two lists can be represented with a couple of extra flags while using
only a single list.  This simplifies the code and should save a little
bit of palloc traffic, since the majority of RTEs are represented in both
lists anyway.
2012-08-08 16:41:31 -04:00
Simon Riggs
8143a56854 Fix minor bug in XLogFileRead() that accidentally worked.
Cascading replication copied the incoming file into pg_xlog but
didn't set path correctly, so the first attempt to open file failed
causing it to loop around and look for file in pg_xlog. So the
earlier coding worked, but accidentally rather than by design.

Spotted by Fujii Masao, fix by Fujii Masao and Simon Riggs
2012-08-08 21:25:23 +01:00
Robert Haas
21786db81f Fix cache flush hazard in event trigger cache.
Bug spotted by Jeff Davis using -DCLOBBER_CACHE_ALWAYS.
2012-08-08 16:38:37 -04:00
Bruce Momjian
2751740ab5 Add additional C comments for to_date/to_char() fixes. 2012-08-08 13:27:01 -04:00
Tom Lane
db108349bf Fix TwoPhaseGetDummyBackendId().
This was broken in commit ed0b409d22,
which revised the GlobalTransactionData struct to not include the
associated PGPROC as its first member, but overlooked one place where
a cast was used in reliance on that equivalence.

The most effective way of fixing this seems to be to create a new function
that looks up the GlobalTransactionData struct given the XID, and make
both TwoPhaseGetDummyBackendId and TwoPhaseGetDummyProc rely on that.

Per report from Robert Ross.
2012-08-08 11:52:02 -04:00
Tom Lane
5ebaaa4944 Implement SQL-standard LATERAL subqueries.
This patch implements the standard syntax of LATERAL attached to a
sub-SELECT in FROM, and also allows LATERAL attached to a function in FROM,
since set-returning function calls are expected to be one of the principal
use-cases.

The main change here is a rewrite of the mechanism for keeping track of
which relations are visible for column references while the FROM clause is
being scanned.  The parser "namespace" lists are no longer lists of bare
RTEs, but are lists of ParseNamespaceItem structs, which carry an RTE
pointer as well as some visibility-controlling flags.  Aside from
supporting LATERAL correctly, this lets us get rid of the ancient hacks
that required rechecking subqueries and JOIN/ON and function-in-FROM
expressions for invalid references after they were initially parsed.
Invalid column references are now always correctly detected on sight.

In passing, remove assorted parser error checks that are now dead code by
virtue of our having gotten rid of add_missing_from, as well as some
comments that are obsolete for the same reason.  (It was mainly
add_missing_from that caused so much fudging here in the first place.)

The planner support for this feature is very minimal, and will be improved
in future patches.  It works well enough for testing purposes, though.

catversion bump forced due to new field in RangeTblEntry.
2012-08-07 19:02:54 -04:00
Robert Haas
eea65943c6 Fix memory leaks in event trigger code.
Spotted by Jeff Davis.
2012-08-07 17:00:16 -04:00
Bruce Momjian
ac78c4178b Fix to_char(), to_date(), and to_timestamp() to handle negative/BC
century specifications just like positive/AD centuries.  Previously the
behavior was either wrong or inconsistent with positive/AD handling.

Centuries without years now always assume the first year of the century,
which is now documented.
2012-08-07 13:34:44 -04:00
Alvaro Herrera
3a42a3ffd8 Fix redundant wording 2012-08-07 11:43:51 -04:00
Simon Riggs
0f04fc67f7 fsync backup_label after pg_start_backup()
Dave Kerr
2012-08-07 16:19:13 +01:00
Alvaro Herrera
f5f8e7169f Make strings identical 2012-08-06 12:45:08 -04:00
Tom Lane
3152bf722f Fix bugs with parsing signed hh:mm and hh:mm:ss fields in interval input.
DecodeInterval() failed to honor the "range" parameter (the special SQL
syntax for indicating which fields appear in the literal string) if the
time was signed.  This seems inappropriate, so make it work like the
not-signed case.  The inconsistency was introduced in my commit
f867339c01, which as noted in its log message
was only really focused on making SQL-compliant literals work per spec.
Including a sign here is not per spec, but if we're going to allow it
then it's reasonable to expect it to work like the not-signed case.

Also, remove bogus setting of tmask, which caused subsequent processing to
think that what had been given was a timezone and not an hh:mm(:ss) field,
thus confusing checks for redundant fields.  This seems to be an aboriginal
mistake in Lockhart's commit 2cf1642461.

Add regression test cases to illustrate the changed behaviors.

Back-patch as far as 8.4, where support for spec-compliant interval
literals was added.

Range problem reported and diagnosed by Amit Kapila, tmask problem by me.
2012-08-03 17:40:43 -04:00
Tom Lane
f786e91a75 Improve underdocumented btree_xlog_delete_get_latestRemovedXid() code.
As noted by Noah Misch, btree_xlog_delete_get_latestRemovedXid is
critically dependent on the assumption that it's examining a consistent
state of the database.  This was undocumented though, so the
seemingly-unrelated check for no active HS sessions might be thought to be
merely an optional optimization.  Improve comments, and add an explicit
check of reachedConsistency just to be sure.

This function returns InvalidTransactionId (thereby killing all HS
transactions) in several cases that are not nearly unlikely enough for my
taste.  This commit doesn't attempt to fix those deficiencies, just
document them.

Back-patch to 9.2, not from any real functional need but just to keep the
branches more closely synced to simplify possible future back-patching.
2012-08-03 15:41:18 -04:00
Tom Lane
c1793f2e0c In SPGiST replay, do conflict resolution before modifying the page.
In yesterday's commit 962e0cc71e, I added the
ResolveRecoveryConflictWithSnapshot call in the wrong place.  I correctly
put it before spgRedoVacuumRedirect itself would modify the index page ---
but not before RestoreBkpBlocks, so replay of a record with a full-page
image would modify the page before kicking off any conflicting HS
transactions.  Oops.
2012-08-03 15:23:14 -04:00
Tom Lane
962e0cc71e Fix race conditions associated with SPGiST redirection tuples.
The correct test for whether a redirection tuple is removable is whether
tuple's xid < RecentGlobalXmin, not OldestXmin; the previous coding
failed to protect index searches being done in concurrent transactions that
have no XID.  This mirrors the recent fix in btree's page recycling logic
made in commit d3abbbebe5.

Also, WAL-log the newest XID of any removed redirection tuple on an index
page, and apply ResolveRecoveryConflictWithSnapshot during InHotStandby WAL
replay.  This protects against concurrent Hot Standby transactions possibly
needing to see the redirection tuple(s).

Per my query of 2012-03-12 and subsequent discussion.
2012-08-02 15:34:14 -04:00
Tom Lane
f6ce81f55a Fix WITH attached to a nested set operation (UNION/INTERSECT/EXCEPT).
Parse analysis neglected to cover the case of a WITH clause attached to an
intermediate-level set operation; it only handled WITH at the top level
or WITH attached to a leaf-level SELECT.  Per report from Adam Mackler.

In HEAD, I rearranged the order of SelectStmt's fields to put withClause
with the other fields that can appear on non-leaf SelectStmts.  In back
branches, leave it alone to avoid a possible ABI break for third-party
code.

Back-patch to 8.4 where WITH support was added.
2012-07-31 17:56:21 -04:00
Tom Lane
b76356ac22 Fix syslogger so that log_truncate_on_rotation works in the first rotation.
In the original coding of the log rotation stuff, we did not bother to make
the truncation logic work for the very first rotation after postmaster
start (or after a syslogger crash and restart).  It just always appended
in that case.  It did not seem terribly important at the time, but we've
recently had two separate complaints from people who expected it to work
unsurprisingly.  (Both users tend to restart the postmaster about as often
as a log rotation is configured to happen, which is maybe not typical use,
but still...)  Since the initial log file is opened in the postmaster,
fixing this requires passing down some more state to the syslogger child
process.

It's always been like this, so back-patch to all supported branches.
2012-07-31 14:36:54 -04:00
Tom Lane
26b438694c Only allow autovacuum to be auto-canceled by a directly blocked process.
In the original coding of the autovacuum cancel feature, commit
acac68b2bc, an autovacuum process was
considered a target for cancellation if it was found to hard-block any
process examined in the deadlock search.  This patch tightens the test so
that the autovacuum must directly hard-block the current process.  This
should make the behavior more predictable in general, and in particular
it ensures that an autovacuum will not be canceled with less than
deadlock_timeout grace period.  In the old coding, it was possible for an
autovacuum to be canceled almost instantly, given unfortunate timing of two
or more other processes' lock attempts.

This also justifies the logging methodology in the recent commit
d7318d43d891bd63e82dcfc27948113ed7b1db80; without this restriction, that
patch isn't providing enough information to see the connection of the
canceling process to the autovacuum.  Like that one, patch all the way
back.
2012-07-26 14:29:22 -04:00
Robert Haas
d7318d43d8 Log a better message when canceling autovacuum.
The old message was at DEBUG2, so typically it didn't show up in the
log at all.  As a result, in most cases where autovacuum was canceled,
the only information that was logged was the table being vacuumed,
with no indication as to what problem caused the cancel.  Crank up
the level to LOG and add some more details to assist with debugging.

Back-patch all the way, per discussion on pgsql-hackers.
2012-07-26 09:19:03 -04:00
Tom Lane
af026b5d9b Fix longstanding crash-safety bug with newly-created-or-reset sequences.
If a crash occurred immediately after the first nextval() call for a serial
column, WAL replay would restore the sequence to a state in which it
appeared that no nextval() had been done, thus allowing the first sequence
value to be returned again by the next nextval() call; as reported in
bug #6748 from Xiangming Mei.

More generally, the problem would occur if an ALTER SEQUENCE was executed
on a freshly created or reset sequence.  (The manifestation with serial
columns was introduced in 8.2 when we added an ALTER SEQUENCE OWNED BY step
to serial column creation.)  The cause is that sequence creation attempted
to save one WAL entry by writing out a WAL record that made it appear that
the first nextval() had already happened (viz, with is_called = true),
while marking the sequence's in-database state with log_cnt = 1 to show
that the first nextval() need not emit a WAL record.  However, ALTER
SEQUENCE would emit a new WAL entry reflecting the actual in-database state
(with is_called = false).  Then, nextval would allocate the first sequence
value and set is_called = true, but it would trust the log_cnt value and
not emit any WAL record.  A crash at this point would thus restore the
sequence to its post-ALTER state, causing the next nextval() call to return
the first sequence value again.

To fix, get rid of the idea of logging an is_called status different from
reality.  This means that the first nextval-driven WAL record will happen
at the first nextval call not the second, but the marginal cost of that is
pretty negligible.  In addition, make sure that ALTER SEQUENCE resets
log_cnt to zero in any case where it touches sequence parameters that
affect future nextval results.  This will result in some user-visible
changes in the contents of a sequence's log_cnt column, as reflected in the
patch's regression test changes; but no application should be depending on
that anyway, since it was already true that log_cnt changes rather
unpredictably depending on checkpoint timing.

In addition, make some basically-cosmetic improvements to get rid of
sequence.c's undesirable intimacy with page layout details.  It was always
really trying to WAL-log the contents of the sequence tuple, so we should
have it do that directly using a HeapTuple's t_data and t_len, rather than
backing into it with some magic assumptions about where the tuple would be
on the sequence's page.

Back-patch to all supported branches.
2012-07-25 17:42:23 -04:00
Alvaro Herrera
d7b47e5155 Change syntax of new CHECK NO INHERIT constraints
The initially implemented syntax, "CHECK NO INHERIT (expr)" was not
deemed very good, so switch to "CHECK (expr) NO INHERIT" instead.  This
way it looks similar to SQL-standards compliant constraint attribute.

Backport to 9.2 where the new syntax and feature was introduced.

Per discussion.
2012-07-24 16:01:32 -04:00
Peter Eisentraut
d61d9aa750 Update information schema to SQL:2011
This is just a section renumbering for now.  Some details might be
filled in later.
2012-07-23 22:32:56 +03:00
Tom Lane
2d46a57ddc Improve copydir() code for the case that fsync is off.
We should avoid calling sync_file_range or posix_fadvise in this case,
since (a) we don't really care if the data gets synced, and might as
well save the kernel calls; (b) at least on Linux we know that the
kernel might block us until it's scheduled the write.

Also, avoid making a useless second traversal of the directory tree
if we're not actually going to call fsync(2) after all.
2012-07-21 20:10:29 -04:00
Tom Lane
31c7c642b6 Account for SRFs in targetlists in planner rowcount estimates.
We made use of the ROWS estimate for set-returning functions used in FROM,
but not for those used in SELECT targetlists; which is a bit of an
oversight considering there are common usages that require the latter
approach.  Improve that.  (I had initially thought it might be worth
folding this into cost_qual_eval, but after investigation concluded that
that wouldn't be very helpful, so just do it separately.)  Per complaint
from David Johnston.

Back-patch to 9.2, but not further, for fear of destabilizing plan choices
in existing releases.
2012-07-21 17:45:07 -04:00
Alvaro Herrera
f5bcd398ad connoinherit may be true only for CHECK constraints
The code was setting it true for other constraints, which is
bogus.  Doing so caused bogus catalog entries for such constraints, and
in particular caused an error to be raised when trying to drop a
constraint of types other than CHECK from a table that has children,
such as reported in bug #6712.

In 9.2, additionally ignore connoinherit=true for other constraint
types, to avoid having to force initdb; existing databases might already
contain bogus catalog entries.

Includes a catversion bump (in HEAD only).

Bug report from Miroslav Šulc
Analysis from Amit Kapila and Noah Misch; Amit also contributed the patch.
2012-07-20 14:08:07 -04:00
Tom Lane
8e617e29aa Fix whole-row Var evaluation to cope with resjunk columns (again).
When a whole-row Var is reading the result of a subquery, we need it to
ignore any "resjunk" columns that the subquery might have evaluated for
GROUP BY or ORDER BY purposes.  We've hacked this area before, in commit
68e40998d0, but that fix only covered
whole-row Vars of named composite types, not those of RECORD type; and it
was mighty klugy anyway, since it just assumed without checking that any
extra columns in the result must be resjunk.  A proper fix requires getting
hold of the subquery's targetlist so we can actually see which columns are
resjunk (whereupon we can use a JunkFilter to get rid of them).  So bite
the bullet and add some infrastructure to make that possible.

Per report from Andrew Dunstan and additional testing by Merlin Moncure.
Back-patch to all supported branches.  In 8.3, also back-patch commit
292176a118, which for some reason I had
not done at the time, but it's a prerequisite for this change.
2012-07-20 13:10:58 -04:00
Robert Haas
3a0e4d36eb Make new event trigger facility actually do something.
Commit 3855968f32 added syntax, pg_dump,
psql support, and documentation, but the triggers didn't actually fire.
With this commit, they now do.  This is still a pretty basic facility
overall because event triggers do not get a whole lot of information
about what the user is trying to do unless you write them in C; and
there's still no option to fire them anywhere except at the very
beginning of the execution sequence, but it's better than nothing,
and a good building block for future work.

Along the way, add a regression test for ALTER LARGE OBJECT, since
testing of event triggers reveals that we haven't got one.

Dimitri Fontaine and Robert Haas
2012-07-20 11:39:01 -04:00
Tom Lane
be86e3dd5b Rethink checkpointer's fsync-request table representation.
Instead of having one hash table entry per relation/fork/segment, just have
one per relation, and use bitmapsets to represent which specific segments
need to be fsync'd.  This eliminates the need to scan the whole hash table
to implement FORGET_RELATION_FSYNC, which fixes the O(N^2) behavior
recently demonstrated by Jeff Janes for cases involving lots of TRUNCATE or
DROP TABLE operations during a single checkpoint cycle.  Per an idea from
Robert Haas.

(FORGET_DATABASE_FSYNC still sucks, but since dropping a database is a
pretty expensive operation anyway, we'll live with that.)

In passing, improve the delayed-unlink code: remove the pass over the list
in mdpreckpt, since it wasn't doing anything for us except supporting a
useless Assert in mdpostckpt, and fix mdpostckpt so that it will absorb
fsync requests every so often when clearing a large backlog of deletion
requests.
2012-07-19 19:28:22 -04:00
Tom Lane
3072b7bade Send only one FORGET_RELATION_FSYNC request when dropping a relation.
We were sending one per fork, but a little bit of refactoring allows us
to send just one request with forknum == InvalidForkNumber.  This not only
reduces pressure on the shared-memory request queue, but saves repeated
traversals of the checkpointer's hash table.
2012-07-19 13:07:33 -04:00
Heikki Linnakangas
a7a4add6c4 Refactor the way code is shared between some range type functions.
Functions like range_eq, range_before etc. are exposed at the SQL-level, but
they're also used internally by the GiST consistent support function. The
code sharing was done by a hack, TrickFunctionCall2, which relied on the
knowledge that all the functions used fn_extra the same way. This commit
splits the functions into internal versions that take a TypeCacheEntry as
argument, and thin wrappers to expose the functions at the SQL-level. The
internal versions can then be called directly and in a less hacky way from
the GiST consistent function.

This is just cosmetic, but backpatch to 9.2 anyway, to avoid having a
different version of this code in the 9.2 branch. That would make
backpatching fixes in this area more difficult.

Alexander Korotkov
2012-07-18 23:14:56 +03:00
Tom Lane
80e373c3a8 Fix statistics breakage from bgwriter/checkpointer process split.
ForwardFsyncRequest() supposed that it could only be called in regular
backends, which used to be true; but since the splitup of bgwriter and
checkpointer, it is also called in the bgwriter.  We do not want to count
such calls in pg_stat_bgwriter.buffers_backend statistics, so fix things
so that they aren't.

(It's worth noting here that this implies an alarmingly large increase in
the expected amount of cross-process fsync request traffic, which may well
mean that the process splitup was not such a hot idea.)
2012-07-18 15:40:31 -04:00
Tom Lane
4a9c30a8a1 Fix management of pendingOpsTable in auxiliary processes.
mdinit() was misusing IsBootstrapProcessingMode() to decide whether to
create an fsync pending-operations table in the current process.  This led
to creating a table not only in the startup and checkpointer processes as
intended, but also in the bgwriter process, not to mention other auxiliary
processes such as walwriter and walreceiver.  Creation of the table in the
bgwriter is fatal, because it absorbs fsync requests that should have gone
to the checkpointer; instead they just sit in bgwriter local memory and are
never acted on.  So writes performed by the bgwriter were not being fsync'd
which could result in data loss after an OS crash.  I think there is no
live bug with respect to walwriter and walreceiver because those never
perform any writes of shared buffers; but the potential is there for
future breakage in those processes too.

To fix, make AuxiliaryProcessMain() export the current process's
AuxProcType as a global variable, and then make mdinit() test directly for
the types of aux process that should have a pendingOpsTable.  Having done
that, we might as well also get rid of the random bool flags such as
am_walreceiver that some of the aux processes had grown.  (Note that we
could not have fixed the bug by examining those variables in mdinit(),
because it's called from BaseInit() which is run by AuxiliaryProcessMain()
before entering any of the process-type-specific code.)

Back-patch to 9.2, where the problem was introduced by the split-up of
bgwriter and checkpointer processes.  The bogus pendingOpsTable exists
in walwriter and walreceiver processes in earlier branches, but absent
any evidence that it causes actual problems there, I'll leave the older
branches alone.
2012-07-18 15:28:10 -04:00
Robert Haas
3855968f32 Syntax support and documentation for event triggers.
They don't actually do anything yet; that will get fixed in a
follow-on commit.  But this gets the basic infrastructure in place,
including CREATE/ALTER/DROP EVENT TRIGGER; support for COMMENT,
SECURITY LABEL, and ALTER EXTENSION .. ADD/DROP EVENT TRIGGER;
pg_dump and psql support; and documentation for the anticipated
initial feature set.

Dimitri Fontaine, with review and a bunch of additional hacking by me.
Thom Brown extensively reviewed earlier versions of this patch set,
but there's not a whole lot of that code left in this commit, as it
turns out.
2012-07-18 10:16:16 -04:00
Tom Lane
73b796a52c Improve coding around the fsync request queue.
In all branches back to 8.3, this patch fixes a questionable assumption in
CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there are
no uninitialized pad bytes in the request queue structs.  This would only
cause trouble if (a) there were such pad bytes, which could happen in 8.4
and up if the compiler makes enum ForkNumber narrower than 32 bits, but
otherwise would require not-currently-planned changes in the widths of
other typedefs; and (b) the kernel has not uniformly initialized the
contents of shared memory to zeroes.  Still, it seems a tad risky, and we
can easily remove any risk by pre-zeroing the request array for ourselves.
In addition to that, we need to establish a coding rule that struct
RelFileNode can't contain any padding bytes, since such structs are copied
into the request array verbatim.  (There are other places that are assuming
this anyway, it turns out.)

In 9.1 and up, the risk was a bit larger because we were also effectively
assuming that struct RelFileNodeBackend contained no pad bytes, and with
fields of different types in there, that would be much easier to break.
However, there is no good reason to ever transmit fsync or delete requests
for temp files to the bgwriter/checkpointer, so we can revert the request
structs to plain RelFileNode, getting rid of the padding risk and saving
some marginal number of bytes and cycles in fsync queue manipulation while
we are at it.  The savings might be more than marginal during deletion of
a temp relation, because the old code transmitted an entirely useless but
nonetheless expensive-to-process ForgetRelationFsync request to the
background process, and also had the background process perform the file
deletion even though that can safely be done immediately.

In addition, make some cleanup of nearby comments and small improvements to
the code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.
2012-07-17 16:56:54 -04:00
Tom Lane
57b9bdda39 Put back storage/proc.h in postmaster.c.
I took this out thinking it wasn't needed anymore, but the EXEC_BACKEND
code still needs it.  Per buildfarm.
2012-07-17 10:14:06 -04:00
Alvaro Herrera
f34c68f096 Introduce timeout handling framework
Management of timeouts was getting a little cumbersome; what we
originally had was more than enough back when we were only concerned
about deadlocks and query cancel; however, when we added timeouts for
standby processes, the code got considerably messier.  Since there are
plans to add more complex timeouts, this seems a good time to introduce
a central timeout handling module.

External modules register their timeout handlers during process
initialization, and later enable and disable them as they see fit using
a simple API; timeout.c is in charge of keeping track of which timeouts
are in effect at any time, installing a common SIGALRM signal handler,
and calling setitimer() as appropriate to ensure timely firing of
external handlers.

timeout.c additionally supports pluggable modules to add their own
timeouts, though this capability isn't exercised anywhere yet.

Additionally, as of this commit, walsender processes are aware of
timeouts; we had a preexisting bug there that made those ignore SIGALRM,
thus being subject to unhandled deadlocks, particularly during the
authentication phase.  This has already been fixed in back branches in
commit 0bf8eb2a, which see for more details.

Main author: Zoltán Böszörményi
Some review and cleanup by Álvaro Herrera
Extensive reworking by Tom Lane
2012-07-16 22:55:33 -04:00
Peter Eisentraut
dd16f9480a Remove unreachable code
The Solaris Studio compiler warns about these instances, unlike more
mainstream compilers such as gcc.  But manual inspection showed that
the code is clearly not reachable, and we hope no worthy compiler will
complain about removing this code.
2012-07-16 22:15:03 +03:00
Tom Lane
c92be3c059 Avoid pre-determining index names during CREATE TABLE LIKE parsing.
Formerly, when trying to copy both indexes and comments, CREATE TABLE LIKE
had to pre-assign names to indexes that had comments, because it made up an
explicit CommentStmt command to apply the comment and so it had to know the
name for the index.  This creates bad interactions with other indexes, as
shown in bug #6734 from Daniele Varrazzo: the preassignment logic couldn't
take any other indexes into account so it could choose a conflicting name.

To fix, add a field to IndexStmt that allows it to carry a comment to be
assigned to the new index.  (This isn't a user-exposed feature of CREATE
INDEX, only an internal option.)  Now we don't need preassignment of index
names in any situation.

I also took the opportunity to refactor DefineIndex to accept the IndexStmt
as such, rather than passing all its fields individually in a mile-long
parameter list.

Back-patch to 9.2, but no further, because it seems too dangerous to change
IndexStmt or DefineIndex's API in released branches.  The bug exists back
to 9.0 where CREATE TABLE LIKE grew the ability to copy comments, but given
the lack of prior complaints we'll just let it go unfixed before 9.2.
2012-07-16 13:25:18 -04:00
Tom Lane
54fd196ffc Prevent corner-case core dump in rfree().
rfree() failed to cope with the case that pg_regcomp() had initialized the
regex_t struct but then failed to allocate any memory for re->re_guts (ie,
the first malloc call in pg_regcomp() failed).  It would try to touch the
guts struct anyway, and thus dump core.  This is a sufficiently narrow
corner case that it's not surprising it's never been seen in the field;
but still a bug is a bug, so patch all active branches.

Noted while investigating whether we need to call pg_regfree after a
failure return from pg_regcomp.  Other than this bug, it turns out we
don't, so adjust comments appropriately.
2012-07-15 13:27:54 -04:00
Heikki Linnakangas
2686da9db2 Don't initialize TLI variable to -1, as TimeLineID is unsigned.
This was causing a compiler warning with Solaris compiler. Use 0 instead.
The variable is initialized just for the sake of tidyness  and/or debugging,
it's not used for anything before setting it to a real value.

Per report and suggestion from Peter Eisentraut.
2012-07-14 21:04:53 +03:00
Tom Lane
b966dd6c42 Add fsync capability to initdb, and use sync_file_range() if available.
Historically we have not worried about fsync'ing anything during initdb
(in fact, initdb intentionally passes -F to each backend launch to prevent
it from fsync'ing).  But with filesystems getting more aggressive about
caching data, that's not such a good plan anymore.  Make initdb do a pass
over the finished data directory tree to fsync everything.  For testing
purposes, the -N/--nosync flag can be used to restore the old behavior.

Also, testing shows that on Linux, sync_file_range() is much faster than
posix_fadvise() for hinting to the kernel that an fsync is coming,
apparently because the latter blocks on a rather small request queue while
the former doesn't.  So use this function if available in initdb, and also
in the backend's pg_flush_data() (where it currently will affect only the
speed of CREATE DATABASE's cloning step).

We will later make pg_regress invoke initdb with the --nosync flag
to avoid slowing down cases such as "make check" in contrib.  But
let's not do so until we've shaken out any portability issues in this
patch.

Jeff Davis, reviewed by Andres Freund
2012-07-13 17:16:58 -04:00
Tom Lane
1a9405d265 Cosmetic cleanup of ginInsertValue().
Make it clearer that the passed stack mustn't be empty, and that we
are not supposed to fall off the end of the stack in the main loop.
Tighten the loop that extracts the root block number, too.

Markus Wanner and Tom Lane
2012-07-13 11:37:39 -04:00
Peter Eisentraut
a84bf4922e Avoid extra newlines in XML mapping in table forest mode
found by P. Broennimann
2012-07-12 23:52:50 +03:00
Tom Lane
a36088bcfa Skip text->binary conversion of unnecessary columns in contrib/file_fdw.
When reading from a text- or CSV-format file in file_fdw, the datatype
input routines can consume a significant fraction of the runtime.
Often, the query does not need all the columns, so we can get a useful
speed boost by skipping I/O conversion for unnecessary columns.

To support this, add a "convert_selectively" option to the core COPY code.
This is undocumented and not accessible from SQL (for now, anyway).

Etsuro Fujita, reviewed by KaiGai Kohei
2012-07-12 16:26:59 -04:00
Tom Lane
84a42560c8 Add array_remove() and array_replace() functions.
These functions support removing or replacing array element value(s)
matching a given search value.  Although intended mainly to support a
future array-foreign-key feature, they seem useful in their own right.

Marco Nenciarini and Gabriele Bartolini, reviewed by Alex Hunsaker
2012-07-11 13:59:35 -04:00
Tom Lane
60e9c224a1 Fix ASCII case in pg_wchar2mule_with_len.
Also some cosmetic improvements for wchar-to-mblen patch.
2012-07-10 15:59:39 -04:00
Tom Lane
628cbb50ba Re-implement extraction of fixed prefixes from regular expressions.
To generate btree-indexable conditions from regex WHERE conditions (such as
WHERE indexed_col ~ '^foo'), we need to be able to identify any fixed
prefix that a regex might have; that is, find any string that must be a
prefix of all strings satisfying the regex.  We used to do that with
entirely ad-hoc code that looked at the source text of the regex.  It
didn't know very much about regex syntax, which mostly meant that it would
fail to identify some optimizable cases; but Viktor Rosenfeld reported that
it would produce actively wrong answers for quantified parenthesized
subexpressions, such as '^(foo)?bar'.  Rather than trying to extend the
ad-hoc code to cover this, let's get rid of it altogether in favor of
identifying prefixes by examining the compiled form of a regex.

To do this, I've added a new entry point "pg_regprefix" to the regex library;
hopefully it is defined in a sufficiently general fashion that it can remain
in the library when/if that code gets split out as a standalone project.

Since this bug has been there for a very long time, this fix needs to get
back-patched.  However it depends on some other recent commits (particularly
the addition of wchar-to-database-encoding conversion), so I'll commit this
separately and then go to work on back-porting the necessary fixes.
2012-07-10 14:54:37 -04:00
Tom Lane
00dac6000d Refactor pattern_fixed_prefix() to avoid dealing in incomplete patterns.
Previously, pattern_fixed_prefix() was defined to return whatever fixed
prefix it could extract from the pattern, plus the "rest" of the pattern.
That definition was sensible for LIKE patterns, but not so much for
regexes, where reconstituting a valid pattern minus the prefix could be
quite tricky (certainly the existing code wasn't doing that correctly).
Since the only thing that callers ever did with the "rest" of the pattern
was to pass it to like_selectivity() or regex_selectivity(), let's cut out
the middle-man and just have pattern_fixed_prefix's subroutines do this
directly.  Then pattern_fixed_prefix can return a simple selectivity
number, and the question of how to cope with partial patterns is removed
from its API specification.

While at it, adjust the API spec so that callers who don't actually care
about the pattern's selectivity (which is a lot of them) can pass NULL for
the selectivity pointer to skip doing the work of computing a selectivity
estimate.

This patch is only an API refactoring that doesn't actually change any
processing, other than allowing a little bit of useless work to be skipped.
However, it's necessary infrastructure for my upcoming fix to regex prefix
extraction, because after that change there won't be any simple way to
identify the "rest" of the regex, not even to the low level of fidelity
needed by regex_selectivity.  We can cope with that if regex_fixed_prefix
and regex_selectivity communicate directly, but not if we have to work
within the old API.  Hence, back-patch to all active branches.
2012-07-09 23:22:55 -04:00
Tom Lane
e7ef6d7e24 Fix planner to pass correct collation to operator selectivity estimators.
We can do this without creating an API break for estimation functions
by passing the collation using the existing fmgr functionality for
passing an input collation as a hidden parameter.

The need for this was foreseen at the outset, but we didn't get around to
making it happen in 9.1 because of the decision to sort all pg_statistic
histograms according to the database's default collation.  That meant that
selectivity estimators generally need to use the default collation too,
even if they're estimating for an operator that will do something
different.  The reason it's suddenly become more interesting is that
regexp interpretation also uses a collation (for its LC_TYPE not LC_COLLATE
property), and we no longer want to use the wrong collation when examining
regexps during planning.  It's not that the selectivity estimate is likely
to change much from this; rather that we are thinking of caching compiled
regexps during planner estimation, and we won't get the intended benefit
if we cache them with a different collation than the executor will use.

Back-patch to 9.1, both because the regexp change is likely to get
back-patched and because we might as well get this right in all
collation-supporting branches, in case any third-party code wants to
rely on getting the collation.  The patch turns out to be minuscule
now that I've done it ...
2012-07-08 23:51:08 -04:00
Tom Lane
c6aae3042b Simplify and document regex library's compact-NFA representation.
The previous coding abused the first element of a cNFA state's arcs list
to hold a per-state flag bit, which was confusing, undocumented, and not
even particularly efficient.  Get rid of that in favor of a separate
"stflags" vector.  Since there's only one bit in use, I chose to allocate a
char per state; we could possibly replace this with a bitmap at some point,
but that would make accesses a little slower.  It's already about 8X
smaller than before, so let's not get overly tense.

Also document the representation better than it was before, which is to say
not at all.

This patch is a byproduct of investigations towards extracting a "fixed
prefix" string from the compact-NFA representation of regex patterns.
Might need to back-patch it if we decide to back-patch that fix, but for
now it's just code cleanup so I'll just put it in HEAD.
2012-07-07 17:39:50 -04:00
Bruce Momjian
3c9b406420 Run updated copyright.pl on HEAD and 9.2 trees, updating the psql
\copyright output to 2012.

Backpatch to 9.2.
2012-07-06 12:28:18 -04:00
Robert Haas
f6a05fd973 Fix failure of new wchar->mb functions to advance from pointer.
Bug spotted by Tom Lane.
2012-07-05 23:47:53 -04:00
Tom Lane
fc548b2296 Remove support for using wait3() in place of waitpid().
All Unix-oid platforms that we currently support should have waitpid(),
since it's in V2 of the Single Unix Spec.  Our git history shows that
the wait3 code was added to support NextStep, which we officially dropped
support for as of 9.2.  So get rid of the configure test, and simplify the
macro spaghetti in reaper().  Per suggestion from Fujii Masao.
2012-07-05 14:00:40 -04:00
Bruce Momjian
042d9ffc28 Run newly-configured perltidy script on Perl files.
Run on HEAD and 9.2.
2012-07-04 21:47:49 -04:00
Robert Haas
d7c734841b Reduce messages about implicit indexes and sequences to DEBUG1.
Per recent discussion on pgsql-hackers, these messages are too
chatty for most users.
2012-07-04 20:35:29 -04:00
Robert Haas
72dd6291f2 Add wchar -> mb conversion routines.
This is infrastructure for Alexander Korotkov's work on indexing regular
expression searches.

Alexander Korotkov, with a bit of further hackery on the MULE conversion
by me
2012-07-04 17:10:10 -04:00
Magnus Hagander
10e0dd8f91 Remove duplicate, unnecessary, variable declaration 2012-07-04 16:17:30 +02:00
Magnus Hagander
0c4b468692 Always treat a standby returning an an invalid flush location as async
This ensures that a standby such as pg_receivexlog will not be selected
as sync standby - which would cause the master to block waiting for
a location that could never happen.

Fujii Masao
2012-07-04 15:14:42 +02:00
Tom Lane
09022de1f5 Improve documentation about MULE encoding.
This commit improves the comments in pg_wchar.h and creates #define symbols
for some formerly hard-coded values.  No substantive code changes.

Tatsuo Ishii and Tom Lane
2012-07-04 00:29:57 -04:00
Alvaro Herrera
47a2adc83c Forgot an #include in the previous patch :-( 2012-07-03 16:40:15 -04:00
Alvaro Herrera
0c7b9dc7d0 Have REASSIGN OWNED work on extensions, too
Per bug #6593, REASSIGN OWNED fails when the affected role has created
an extension.  Even though the user related to the extension is not
nominally the owner, its OID appears on pg_shdepend and thus causes
problems when the user is to be dropped.

This commit adds code to change the "ownership" of the extension itself,
not of the contained objects.  This is fine because it's currently only
called from REASSIGN OWNED, which would also modify the ownership of the
contained objects.  However, this is not sufficient for a working ALTER
OWNER implementation extension.

Back-patch to 9.1, where extensions were introduced.

Bug #6593 reported by Emiliano Leporati.
2012-07-03 15:09:59 -04:00
Robert Haas
6a77bff086 Remove misleading hints about reducing the System V request size.
Since the request size will now be ~48 bytes regardless of how
shared_buffers et. al. are set, much of this advice is no longer
relevant.
2012-07-03 10:07:47 -04:00
Robert Haas
3cf39e6ddb Fix a stupid bug I introduced into XLogFlush().
Commit f11e8be3e8 broke this; it was right
in Peter's original patch, but I messed it up before committing.
2012-07-02 15:33:59 -04:00
Robert Haas
3bb592bb20 Fix position of WalSndWakeupRequest call.
This avoids discriminating against wal_sync_method = open_sync or
open_datasync.

Fujii Masao, reviewed by Andres Freund
2012-07-02 14:44:10 -04:00
Peter Eisentraut
2b44306315 Assorted message style improvements 2012-07-02 21:12:46 +03:00
Tom Lane
41f4a0ab78 Fix to_date's handling of year 519.
A thinko in commit 029dfdf115 caused the year
519 to be handled differently from either adjacent year, which was not the
intention AFAICS.  Report and diagnosis by Marc Cousin.

In passing, remove redundant re-tests of year value.
2012-07-02 11:35:35 -04:00
Robert Haas
82cdd2df75 Work a little harder on comments for walsender wakeup patch.
Per gripe from Tom Lane.
2012-07-02 11:28:53 -04:00
Robert Haas
f11e8be3e8 Make commit_delay much smarter.
Instead of letting every backend participating in a group commit wait
independently, have the first one that becomes ready to flush WAL wait
for the configured delay, and let all the others wait just long enough
for that first process to complete its flush.  This greatly increases
the chances of being able to configure a commit_delay setting that
actually improves performance.

As a side consequence of this change, commit_delay now affects all WAL
flushes, rather than just commits.  There was some discussion on
pgsql-hackers about whether to rename the GUC to, say, wal_flush_delay,
but in the absence of consensus I am leaving it alone for now.

Peter Geoghegan, with some changes, mostly to the documentation, by me.
2012-07-02 10:26:31 -04:00
Robert Haas
f83b59997d Make walsender more responsive.
Per testing by Andres Freund, this improves replication performance
and reduces replication latency and latency jitter.  I was a bit
concerned about moving more work into XLogInsert, but testing seems
to show that it's not a problem in practice.

Along the way, improve comments for WaitLatchOrSocket.

Andres Freund.  Review and stylistic cleanup by me.
2012-07-02 09:41:01 -04:00
Tom Lane
9ad45c18b6 Fix race condition in enum value comparisons.
When (re) loading the typcache comparison cache for an enum type's values,
use an up-to-date MVCC snapshot, not the transaction's existing snapshot.
This avoids problems if we encounter an enum OID that was created since our
transaction started.  Per report from Andres Freund and diagnosis by Robert
Haas.

To ensure this is safe even if enum comparison manages to get invoked
before we've set a transaction snapshot, tweak GetLatestSnapshot to
redirect to GetTransactionSnapshot instead of throwing error when
FirstSnapshotSet is false.  The existing uses of GetLatestSnapshot (in
ri_triggers.c) don't care since they couldn't be invoked except in a
transaction that's already done some work --- but it seems just conceivable
that this might not be true of enums, especially if we ever choose to use
enums in system catalogs.

Note that the comparable coding in enum_endpoint and enum_range_internal
remains GetTransactionSnapshot; this is perhaps debatable, but if we
changed it those functions would have to be marked volatile, which doesn't
seem attractive.

Back-patch to 9.1 where ALTER TYPE ADD VALUE was added.
2012-07-01 17:12:49 -04:00
Tom Lane
39bfc94c86 Suppress compiler warnings in readfuncs.c.
Commit 7357558fc8 introduced "(void) token;"
into the READ_TEMP_LOCALS() macro, to suppress complaints from gcc 4.6
when the value of token was not used anywhere in a particular node-read
function.  However, this just moved the warning around: inspection of
buildfarm results shows that some compilers are now complaining that token
is being read before it's set.  Revert the READ_TEMP_LOCALS() macro change
and instead put "(void) token;" into READ_NODE_FIELD(), which is the
principal culprit for cases where the warning might occur.  In principle we
might need the same in READ_BITMAPSET_FIELD() and/or READ_LOCATION_FIELD(),
but it seems unlikely that a node would consist only of such fields, so
I'll leave them alone for now.
2012-06-30 22:27:49 -04:00
Tom Lane
fa188b5ef5 Remove inappropriate semicolons after function definitions.
Solaris Studio warns about this, and some compilers might think it's an
outright syntax error.
2012-06-30 17:29:39 -04:00
Tom Lane
81e8264383 Declare AnonymousShmem pointer as "void *".
The original coding had it as "PGShmemHeader *", but that doesn't offer any
notational benefit because we don't dereference it.  And it was resulting
in compiler warnings on some platforms, notably buildfarm member
castoroides, where mmap() and munmap() are evidently declared to take and
return "char *".
2012-06-30 17:19:46 -04:00
Tom Lane
541ffa65c3 Prevent CREATE TABLE LIKE/INHERITS from (mis) copying whole-row Vars.
If a CHECK constraint or index definition contained a whole-row Var (that
is, "table.*"), an attempt to copy that definition via CREATE TABLE LIKE or
table inheritance produced incorrect results: the copied Var still claimed
to have the rowtype of the source table, rather than the created table.

For the LIKE case, it seems reasonable to just throw error for this
situation, since the point of LIKE is that the new table is not permanently
coupled to the old, so there's no reason to assume its rowtype will stay
compatible.  In the inheritance case, we should ideally allow such
constraints, but doing so will require nontrivial refactoring of CREATE
TABLE processing (because we'd need to know the OID of the new table's
rowtype before we adjust inherited CHECK constraints).  In view of the lack
of previous complaints, that doesn't seem worth the risk in a back-patched
bug fix, so just make it throw error for the inheritance case as well.

Along the way, replace change_varattnos_of_a_node() with a more robust
function map_variable_attnos(), which is capable of being extended to
handle insertion of ConvertRowtypeExpr whenever we get around to fixing
the inheritance case nicely, and in the meantime it returns a failure
indication to the caller so that a helpful message with some context can be
thrown.  Also, this code will do the right thing with subselects (if we
ever allow them in CHECK or indexes), and it range-checks varattnos before
using them to index into the map array.

Per report from Sergey Konoplev.  Back-patch to all supported branches.
2012-06-30 16:45:14 -04:00
Heikki Linnakangas
567787f216 Validate xlog record header before enlarging the work area to store it.
If the record header is garbled, we're now quite likely to notice it before
we try to make a bogus memory allocation and run out of memory. That can
still happen, if the xlog record is split across pages (we cannot verify
the record header until reading the next page in that scenario), but this
reduces the chances. An out-of-memory is treated as a corrupt record
anyway, so this isn't a correctness issue, just a case of giving a better
error message.

Per Amit Kapila's suggestion.
2012-06-30 23:14:35 +03:00
Tom Lane
42e2ce6ae3 Fix confusion between "size" and "AnonymousShmemSize".
Noted by Andres Freund.  Also improve a couple of comments.
2012-06-29 15:12:10 -04:00
Heikki Linnakangas
7a5c9ca93a Initialize shared memory copy of ckptXidEpoch correctly when not in recovery.
This bug was introduced by commit 20d98ab6e4,
so backpatch this to 9.0-9.2 like that one.

This fixes bug #6710, reported by Tarvi Pillessaar
2012-06-29 19:32:15 +03:00
Tom Lane
ae90128dc5 Fix NOTIFY to cope with I/O problems, such as out-of-disk-space.
The LISTEN/NOTIFY subsystem got confused if SimpleLruZeroPage failed,
which would typically happen as a result of a write() failure while
attempting to dump a dirty pg_notify page out of memory.  Subsequently,
all attempts to send more NOTIFY messages would fail with messages like
"Could not read from file "pg_notify/nnnn" at offset nnnnn: Success".
Only restarting the server would clear this condition.  Per reports from
Kevin Grittner and Christoph Berg.

Back-patch to 9.0, where the problem was introduced during the
LISTEN/NOTIFY rewrite.
2012-06-29 00:51:34 -04:00
Tom Lane
c1494b7330 Provide MAP_FAILED if sys/mman.h doesn't.
On old HPUX this has to be #defined to -1.  It might be that other values
are required on other dinosaur systems, but we'll worry about that when
and if we get reports.
2012-06-28 14:19:20 -04:00
Heikki Linnakangas
8f85667a86 Update outdated commit; xlp_rem_len field is in page header now.
Spotted by Amit Kapila
2012-06-28 20:35:18 +03:00
Robert Haas
39715af23a Fix broken mmap failure-detection code, and improve error message.
Per an observation by Thom Brown that my previous commit made an
overly large shmem allocation crash the server, on Linux.
2012-06-28 12:57:22 -04:00
Robert Haas
b0fc0df936 Dramatically reduce System V shared memory consumption.
Except when compiling with EXEC_BACKEND, we'll now allocate only a tiny
amount of System V shared memory (as an interlock to protect the data
directory) and allocate the rest as anonymous shared memory via mmap.
This will hopefully spare most users the hassle of adjusting operating
system parameters before being able to start PostgreSQL with a
reasonable value for shared_buffers.

There are a bunch of documentation updates needed here, and we might
need to adjust some of the HINT messages related to shared memory as
well.  But it's not 100% clear how portable this is, so before we
write the documentation, let's give it a spin on the buildfarm and
see what turns red.
2012-06-28 11:05:16 -04:00
Robert Haas
c5b3451a8e Add missing space in event_source GUC description.
This has apparently been wrong since event_source was added.

Alexander Lakhin
2012-06-28 08:15:50 -04:00
Tom Lane
bde689f809 Make UtilityContainsQuery recurse until it finds a non-utility Query.
The callers of UtilityContainsQuery want it to return a non-utility Query
if it returns anything at all.  However, since we made CREATE TABLE
AS/SELECT INTO into a utility command instead of a variant of SELECT,
a command like "EXPLAIN SELECT INTO" results in two nested utility
statements.  So what we need UtilityContainsQuery to do is drill down
to the bottom non-utility Query.

I had thought of this possibility in setrefs.c, and fixed it there by
looping around the UtilityContainsQuery call; but overlooked that the call
sites in plancache.c have a similar issue.  In those cases it's
notationally inconvenient to provide an external loop, so let's redefine
UtilityContainsQuery as recursing down to a non-utility Query instead.

Noted by Rushabh Lathia.  This is a somewhat cleaned-up version of his
proposed patch.
2012-06-27 23:18:30 -04:00
Heikki Linnakangas
a8f97b39c7 Fix two more neglected comments, still referring to log/seg.
Fujii Masao
2012-06-27 19:11:26 +03:00
Heikki Linnakangas
ec786c6c81 I neglected many comments in the log+seg -> 64-bit segno patch. Fix.
Reported by Amit Kapila.
2012-06-27 17:53:53 +03:00
Robert Haas
c60ca19de9 Allow pg_terminate_backend() to be used on backends with matching role.
A similar change was made previously for pg_cancel_backend, so now it
all matches again.

Dan Farina, reviewed by Fujii Masao, Noah Misch, and Jeff Davis,
with slight kibitzing on the doc changes by me.
2012-06-26 16:16:52 -04:00
Robert Haas
b79ab00144 When LWLOCK_STATS is defined, count spindelays.
When LWLOCK_STATS is *not* defined, the only change is that
SpinLockAcquire now returns the number of delays.

Patch by me, review by Jeff Janes.
2012-06-26 16:06:07 -04:00
Tom Lane
757773602c Cope with smaller-than-normal BLCKSZ setting in SPGiST indexes on text.
The original coding failed miserably for BLCKSZ of 4K or less, as reported
by Josh Kupershmidt.  With the present design for text indexes, a given
inner tuple could have up to 256 labels (requiring either 3K or 4K bytes
depending on MAXALIGN), which means that we can't positively guarantee no
failures for smaller blocksizes.  But we can at least make it behave sanely
so long as there are few enough labels to fit on a page.  Considering that
btree is also more prone to "index tuple too large" failures when BLCKSZ is
small, it's not clear that we should expend more work than this on this
case.
2012-06-26 14:36:25 -04:00
Robert Haas
0caa0d04db Make DROP FUNCTION hint more informative.
If you decide you want to take the hint, this gives you something you
can paste right back to the server.

Dean Rasheed
2012-06-26 13:33:23 -04:00
Robert Haas
76837c1507 Reduce use of heavyweight locking inside hash AM.
Avoid using LockPage(rel, 0, lockmode) to protect against changes to
the bucket mapping.  Instead, an exclusive buffer content lock is now
viewed as sufficient permission to modify the metapage, and a shared
buffer content lock is used when such modifications need to be
prevented.  This more relaxed locking regimen makes it possible that,
when we're busy getting a heavyweight bucket on the bucket we intend
to search or insert into, a bucket split might occur underneath us.
To compenate for that possibility, we use a loop-and-retry system:
release the metapage content lock, acquire the heavyweight lock on the
target bucket, and then reacquire the metapage content lock and check
that the bucket mapping has not changed.   Normally it hasn't, and
we're done.  But if by chance it has, we simply unlock the metapage,
release the heavyweight lock we acquired previously, lock the new
bucket, and loop around again.  Even in the worst case we cannot loop
very many times here, since we don't split the same bucket again until
we've split all the other buckets, and 2^N gets big pretty fast.

This results in greatly improved concurrency, because we're
effectively replacing two lwlock acquire-and-release cycles in
exclusive mode (on one of the lock manager locks) with a single
acquire-and-release cycle in shared mode (on the metapage buffer
content lock).  Testing shows that it's still not quite as good as
btree; for that, we'd probably have to find some way of getting rid
of the heavyweight bucket locks as well, which does not appear
straightforward.

Patch by me, review by Jeff Janes.
2012-06-26 06:56:10 -04:00
Alvaro Herrera
77ed0c6950 Tighten up includes in sinvaladt.h, twophase.h, proc.h
Remove proc.h from sinvaladt.h and twophase.h; also replace xlog.h in
proc.h with xlogdefs.h.
2012-06-25 18:40:40 -04:00
Peter Eisentraut
eeece9e609 Unify calling conventions for postgres/postmaster sub-main functions
There was a wild mix of calling conventions: Some were declared to
return void and didn't return, some returned an int exit code, some
claimed to return an exit code, which the callers checked, but
actually never returned, and so on.

Now all of these functions are declared to return void and decorated
with attribute noreturn and don't return.  That's easiest, and most
code already worked that way.
2012-06-25 21:30:12 +03:00
Robert Haas
c7d47abd04 Fix typo in DEBUG message, introduced by recent WAL refactoring.
Fujii Masao
2012-06-25 14:00:35 -04:00
Peter Eisentraut
b8b2e3b2de Replace int2/int4 in C code with int16/int32
The latter was already the dominant use, and it's preferable because
in C the convention is that intXX means XX bits.  Therefore, allowing
mixed use of int2, int4, int8, int16, int32 is obviously confusing.

Remove the typedefs for int2 and int4 for now.  They don't seem to be
widely used outside of the PostgreSQL source tree, and the few uses
can probably be cleaned up by the time this ships.
2012-06-25 01:51:46 +03:00
Heikki Linnakangas
a218e23a08 Oops. Remove stray paren.
I didn't notice this on my laptop as I don't HAVE_FSYNC_WRITETHROUGH.
2012-06-24 20:03:57 +03:00
Heikki Linnakangas
0ab9d1c4b3 Replace XLogRecPtr struct with a 64-bit integer.
This simplifies code that needs to do arithmetic on XLogRecPtrs.

To avoid changing on-disk format of data pages, the LSN on data pages is
still stored in the old format. That should keep pg_upgrade happy. However,
we have XLogRecPtrs embedded in the control file, and in the structs that
are sent over the replication protocol, so this changes breaks compatibility
of pg_basebackup and server. I didn't do anything about this in this patch,
per discussion on -hackers, the right thing to do would to be to change the
replication protocol to be architecture-independent, so that you could use
a newer version of pg_receivexlog, for example, against an older server
version.
2012-06-24 19:19:45 +03:00
Heikki Linnakangas
061e7efb1b Allow WAL record header to be split across pages.
This saves a few bytes of WAL space, but the real motivation is to make it
predictable how much WAL space a record requires, as it no longer depends
on whether we need to waste the last few bytes at end of WAL page because
the header doesn't fit.

The total length field of WAL record, xl_tot_len, is moved to the beginning
of the WAL record header, so that it is still always found on the first page
where a WAL record begins.

Bump WAL version number again as this is an incompatible change.
2012-06-24 18:35:56 +03:00
Heikki Linnakangas
20ba5ca64c Move WAL continuation record information to WAL page header.
The continuation record only contained one field, xl_rem_len, so it makes
things simpler to just include it in the WAL page header. This wastes four
bytes on pages that don't begin with a continuation from previos page, plus
four bytes on every page, because of padding.

The motivation of this is to make it easier to calculate how much space a
WAL record needs. Before this patch, it depended on how many page boundaries
the record crosses. The motivation of that, in turn, is to separate the
allocation of space in the WAL from the copying of the record data to the
allocated space. Keeping the calculation of space required simple helps to
keep the critical section of allocating the space from WAL short. But that's
not included in this patch yet.

Bump WAL version number again, as this is an incompatible change.
2012-06-24 18:35:30 +03:00
Heikki Linnakangas
dfda6ebaec Don't waste the last segment of each 4GB logical log file.
The comments claimed that wasting the last segment made it easier to do
calculations with XLogRecPtrs, because you don't have problems representing
last-byte-position-plus-1 that way. In my experience, however, it only made
things more complicated, because the there was two ways to represent the
boundary at the beginning of a logical log file: logid = n+1 and xrecoff = 0,
or as xlogid = n and xrecoff = 4GB - XLOG_SEG_SIZE. Some functions were
picky about which representation was used.

Also, use a 64-bit segment number instead of the log/seg combination, to
point to a certain WAL segment. We assume that all platforms have a working
64-bit integer type nowadays.

This is an incompatible change in WAL format, so bumping WAL version number.
2012-06-24 18:35:29 +03:00
Tom Lane
d14241c2cf Fix memory leak in ARRAY(SELECT ...) subqueries.
Repeated execution of an uncorrelated ARRAY_SUBLINK sub-select (which
I think can only happen if the sub-select is embedded in a larger,
correlated subquery) would leak memory for the duration of the query,
due to not reclaiming the array generated in the previous execution.
Per bug #6698 from Armando Miraglia.  Diagnosis and fix idea by Heikki,
patch itself by me.

This has been like this all along, so back-patch to all supported versions.
2012-06-21 17:27:19 -04:00
Alvaro Herrera
68d0e3cbf9 Repair comment mangled by a pgindent run long ago 2012-06-21 15:37:05 -04:00
Heikki Linnakangas
eeb6f37d89 Add a small cache of locks owned by a resource owner in ResourceOwner.
This speeds up reassigning locks to the parent owner, when the transaction
holds a lot of locks, but only a few of them belong to the current resource
owner. This is particularly helps pg_dump when dumping a large number of
objects.

The cache can hold up to 15 locks in each resource owner. After that, the
cache is marked as overflowed, and we fall back to the old method of
scanning the whole local lock table. The tradeoff here is that the cache has
to be scanned whenever a lock is released, so if the cache is too large,
lock release becomes more expensive. 15 seems enough to cover pg_dump, and
doesn't have much impact on lock release.

Jeff Janes, reviewed by Amit Kapila and Heikki Linnakangas.
2012-06-21 15:30:26 +03:00
Tom Lane
dfd9c116cc Remove incomplete/incorrect support for zero-column foreign keys.
The original coding in ri_triggers.c had partial support for the concept of
zero-column foreign key constraints.  But this is not defined in the SQL
standard, nor was it ever allowed by any other part of Postgres, nor was it
very fully implemented even here (eg there was no support for preventing
PK-table deletions that would violate the constraint).  Doesn't seem very
useful to carry 100-plus lines of code for a corner case that no one is
interested in making work.  Instead, just add a check that the column list
read from pg_constraint is non-empty.
2012-06-20 20:15:02 -04:00
Tom Lane
0ce4459a36 Increase MAX_SYSCACHE_CALLBACKS from 20 to 32.
By my count there are 18 callers of CacheRegisterSyscacheCallback in the
core code in HEAD, so we are potentially leaving as few as 2 slots for any
add-on code to use (though possibly not all these callers would actually
activate in any particular session).  That doesn't seem like a lot of
headroom, so let's pump it up a little.
2012-06-20 19:47:37 -04:00
Tom Lane
45ba424f33 Cache the results of ri_FetchConstraintInfo in a backend-local cache.
Extracting data from pg_constraint turned out to take as much as 10% of the
runtime in a bulk-update case where the foreign key column wasn't changing,
because we did it over again for each tuple.  Fix that by maintaining a
backend-local cache of the results.  This is really a pretty small patch,
but converting the trigger functions to work with pointers rather than
local struct variables requires a lot of mechanical changes.
2012-06-20 17:24:14 -04:00
Tom Lane
cfa0f4255b Improve tests for whether we can skip queueing RI enforcement triggers.
During an update of a PK row, we can skip firing the RI trigger if any old
key value is NULL, because then the row could not have had any matching
rows in the FK table.  Conversely, during an update of an FK row, the
outcome is determined if any new key value is NULL.  In either case it
becomes unnecessary to compare individual key values.

This patch was inspired by discussion of Vik Reykja's patch to use IS NOT
DISTINCT semantics for the key comparisons.  In the event there is no need
for that and so this patch looks nothing like his, but he should still get
credit for having re-opened consideration of the trigger skip logic.
2012-06-19 20:07:33 -04:00
Tom Lane
fe3db74002 Share RI trigger code between NO ACTION and RESTRICT cases.
These triggers are identical except for whether ri_Check_Pk_Match is to be
called, so factor out the common code to save a couple hundred lines.

Also, eliminate null-column checks in ri_Check_Pk_Match, since they're
duplicate with the calling functions and require unnecessary complication
in its API statement.

Simplify the way code is shared between RI_FKey_check_ins and
RI_FKey_check_upd, too.
2012-06-19 14:31:54 -04:00
Tom Lane
48756be9cf Improve comments about why SET DEFAULT triggers must recheck for matches.
I was confused about this, so try to make it clearer for the next person.

(This seems like a fairly inefficient way of dealing with a corner case,
but I don't have a better idea offhand.  Maybe if there were a way to turn
off the RI_FKey_keyequal_upd_fk event filter temporarily?)
2012-06-18 22:45:07 -04:00
Tom Lane
e8c9fd5fdf Allow ON UPDATE/DELETE SET DEFAULT plans to be cached.
Once upon a time, somebody was worried that cached RI plans wouldn't get
remade with new default values after ALTER TABLE ... SET DEFAULT, so they
didn't allow caching of plans for ON UPDATE/DELETE SET DEFAULT actions.
That time is long gone, though (and even at the time I doubt this was the
greatest hazard posed by ALTER TABLE...).  So allow these triggers to cache
their plans just like the others.

The cache_plan argument to ri_PlanCheck is now vestigial, since there
are no callers that don't pass "true"; but I left it alone in case there
is any future need for it.
2012-06-18 19:37:23 -04:00
Tom Lane
03a5ba24b0 Remove derived fields from RI_QueryKey, and do a bit of other cleanup.
We really only need the foreign key constraint's OID and the query type
code to uniquely identify each plan we are caching for FK checks.  The
other stuff that was in the struct had no business being used as part of
a hash key, and was all just being copied from struct RI_ConstraintInfo
anyway.  Get rid of the unnecessary fields, and readjust various function
APIs to make them use RI_ConstraintInfo not RI_QueryKey as info source.

I'd be surprised if this makes any measurable performance difference,
but it certainly feels cleaner.
2012-06-18 18:50:29 -04:00
Tom Lane
f9429746c9 Update SQL spec references in ri_triggers code to match SQL:2008.
Now that what we're implementing isn't SQL92, we probably shouldn't cite
chapter and verse in that spec anymore.  Also fix some comments that
talked about MATCH FULL but in fact were in code that's also used for
MATCH SIMPLE.

No code changes in this commit, just comments.
2012-06-18 12:19:38 -04:00
Tom Lane
c75be2ad60 Change ON UPDATE SET NULL/SET DEFAULT referential actions to meet SQL spec.
Previously, when executing an ON UPDATE SET NULL or SET DEFAULT action for
a multicolumn MATCH SIMPLE foreign key constraint, we would set only those
referencing columns corresponding to referenced columns that were changed.
This is what the SQL92 standard said to do --- but more recent versions
of the standard say that all referencing columns should be set to null or
their default values, no matter exactly which referenced columns changed.
At least for SET DEFAULT, that is clearly saner behavior.  It's somewhat
debatable whether it's an improvement for SET NULL, but it appears that
other RDBMS systems read the spec this way.  So let's do it like that.

This is a release-notable behavioral change, although considering that
our documentation already implied it was done this way, the lack of
complaints suggests few people use such cases.
2012-06-18 12:12:52 -04:00
Tom Lane
f5297bdfe4 Refer to the default foreign key match style as MATCH SIMPLE internally.
Previously we followed the SQL92 wording, "MATCH <unspecified>", but since
SQL99 there's been a less awkward way to refer to the default style.

In addition to the code changes, pg_constraint.confmatchtype now stores
this match style as 's' (SIMPLE) rather than 'u' (UNSPECIFIED).  This
doesn't affect pg_dump or psql because they use pg_get_constraintdef()
to reconstruct foreign key definitions.  But other client-side code might
examine that column directly, so this change will have to be marked as
an incompatibility in the 9.3 release notes.
2012-06-17 20:16:44 -04:00
Peter Eisentraut
bb7520cc26 Make documentation of --help and --version options more consistent
Before, some places didn't document the short options (-? and -V),
some documented both, some documented nothing, and they were listed in
various orders.  Now this is hopefully more consistent and complete.
2012-06-18 02:46:59 +03:00
Tom Lane
9e18eacbdf Fix stats collector to recover nicely when system clock goes backwards.
Formerly, if the system clock went backwards, the stats collector would
fail to update the stats file any more until the clock reading again
exceeds whatever timestamp was last written into the stats file.  Such
glitches in the clock's behavior are not terribly unlikely on machines
not using NTP.  Such a scenario has been observed to cause regression test
failures in the buildfarm, and it could have bad effects on the behavior
of autovacuum, so it seems prudent to install some defenses.

We could directly detect the clock going backwards by adding
GetCurrentTimestamp calls in the stats collector's main loop, but that
would hurt performance on platforms where GetCurrentTimestamp is expensive.
To minimize the performance hit in normal cases, adopt a more complicated
scheme wherein backends check for clock skew when reading the stats file,
and if they see it, signal the stats collector by sending an extra stats
inquiry message.  The stats collector does an extra GetCurrentTimestamp
only when it receives an inquiry with an apparently out-of-order
timestamp.

To avoid unnecessary GetCurrentTimestamp calls, expand the inquiry messages
to carry the backend's current clock reading as well as its stats cutoff
time.  The latter, being intentionally slightly in-the-past, would trigger
more clock rechecks than we need if it were used for this purpose.

We might want to backpatch this change at some point, but let's let it
shake out in the buildfarm for awhile first.
2012-06-17 17:11:49 -04:00
Peter Eisentraut
15b1918e7d Improve reporting of permission errors for array types
Because permissions are assigned to element types, not array types,
complaining about permission denied on an array type would be
misleading to users.  So adjust the reporting to refer to the element
type instead.

In order not to duplicate the required logic in two dozen places,
refactor the permission denied reporting for types a bit.

pointed out by Yeb Havinga during the review of the type privilege
feature
2012-06-15 22:55:03 +03:00
Peter Eisentraut
d933092e0a Add more message pluralization
Even though we can't do much about the case with multiple plurals in
one sentence, we can fix the other cases.
2012-06-15 02:02:02 +03:00
Robert Haas
8507c2f856 Improve readability and error messages in pg_backup_start_time.
Gurjeet Singh, with corrections by me.
2012-06-14 15:20:08 -04:00
Robert Haas
68de499bda New SQL functons pg_backup_in_progress() and pg_backup_start_time()
Darold Gilles, reviewed by Gabriele Bartolini and others, rebased by
Marco Nenciarini.  Stylistic cleanup and OID fixes by me.
2012-06-14 13:25:43 -04:00
Robert Haas
cd80073445 During transaction cleanup, release locks before deleting files.
There's no need to hold onto the locks until the files are needed,
and by doing it this way, we reduce the impact on other backends who
may be awaiting locks we hold.

Noah Misch
2012-06-14 10:19:33 -04:00
Robert Haas
6cd015bea3 Add new function log_newpage_buffer.
When I implemented the ginbuildempty() function as part of
implementing unlogged tables, I falsified the note in the header
comment for log_newpage.  Although we could fix that up by changing
the comment, it seems cleaner to add a new function which is
specifically intended to handle this case.  So do that.
2012-06-14 10:11:16 -04:00
Robert Haas
a475c60367 Remove misplaced sanity check from heap_create().
Even when allow_system_table_mods is not set, we allow creation of any
type of SQL object in pg_catalog, except for relations.  And you can
get relations into pg_catalog, too, by initially creating them in some
other schema and then moving them with ALTER .. SET SCHEMA.  So this
restriction, which prevents relations (only) from being created in
pg_catalog directly, is fairly pointless.  If we need a safety mechanism
for this, it should be placed further upstream, so that it affects all
SQL objects uniformly, and picks up both CREATE and SET SCHEMA.

For now, just rip it out, per discussion with Tom Lane.
2012-06-14 09:58:53 -04:00
Robert Haas
d2c86a1ccd Remove RELKIND_UNCATALOGED.
This may have been important at some point in the past, but it no
longer does anything useful.

Review by Tom Lane.
2012-06-14 09:47:30 -04:00
Tom Lane
80edfd7659 Revisit error message details for JSON input parsing.
Instead of identifying error locations only by line number (which could
be entirely unhelpful with long input lines), provide a fragment of the
input text too, placing this info in a new CONTEXT entry.  Make the
error detail messages conform more closely to style guidelines, fix
failure to expose some of them for translation, ensure compiler can
check formats against supplied parameters.
2012-06-13 19:43:35 -04:00
Tom Lane
b8b69d8990 Revert "Reduce checkpoints and WAL traffic on low activity database server"
This reverts commit 18fb9d8d21.  Per
discussion, it does not seem like a good idea to allow committed changes to
go un-checkpointed indefinitely, as could happen in a low-traffic server;
that makes us entirely reliant on the WAL stream with no redundancy that
might aid data recovery in case of disk failure.

This re-introduces the original problem of hot-standby setups generating a
small continuing stream of WAL traffic even when idle, but there are other
ways to address that without compromising crash recovery, so we'll revisit
that issue in a future release cycle.
2012-06-13 18:48:44 -04:00
Tom Lane
c3bc76bdb0 Deprecate use of GLOBAL and LOCAL in temp table creation.
Aside from adjusting the documentation to say that these are deprecated,
we now report a warning (not an error) for use of GLOBAL, since it seems
fairly likely that we might change that to request SQL-spec-compliant temp
table behavior in the foreseeable future.  Although our handling of LOCAL
is equally nonstandard, there is no evident interest in ever implementing
SQL modules, and furthermore some other products interpret LOCAL as
behaving the same way we do.  So no expectation of change and no warning
for LOCAL; but it still seems a good idea to deprecate writing it.

Noah Misch
2012-06-13 17:48:42 -04:00
Tom Lane
93f4d7f806 Support Linux's oom_score_adj API as well as the older oom_adj API.
The simplest way to handle this is just to copy-and-paste the relevant
code block in fork_process.c, so that's what I did. (It's possible that
something more complicated would be useful to packagers who want to work
with either the old or the new API; but at this point the number of such
people is rapidly approaching zero, so let's just get the minimal thing
done.)  Update relevant documentation as well.
2012-06-13 15:35:52 -04:00
Peter Eisentraut
c0a6f9c84b Improve documentation of postgres -C option
Clarify help (s/return/print/), and explain that this option is for
use by other programs, not for user-facing use (it does not print
units).
2012-06-13 13:41:25 +03:00
Tom Lane
f871ef74a5 Minor code review for json.c.
Improve commenting, conform to project style for use of ++ etc.
No functional changes.
2012-06-12 16:23:45 -04:00
Robert Haas
36b7e3da17 Mark JSON error detail messages for translation.
Per gripe from Tom Lane.
2012-06-12 10:41:38 -04:00
Magnus Hagander
3595a71e9c Prevent non-streaming replication connections from being selected sync slave
This prevents a pg_basebackup backup session that just does a base
backup (no xlog involved at all) from becoming the synchronous slave
and thus blocking all access while it runs.

Also fixes the problem when a higher priority slave shows up it would
become the sync standby before it has reached the STREAMING state, by
making sure we can only switch to a walsender that's actually STREAMING.

Fujii Masao
2012-06-11 15:17:38 +02:00
Bruce Momjian
927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Simon Riggs
28ac797287 Revert error message on GLOBAL/LOCAL pending further discussion 2012-06-10 08:41:01 +01:00
Simon Riggs
72335a2015 Add ERROR msg for GLOBAL/LOCAL TEMP is not yet implemented 2012-06-09 16:35:26 +01:00
Simon Riggs
3725570539 Fix bug in early startup of Hot Standby with subtransactions.
When HS startup is deferred because of overflowed subtransactions, ensure
that we re-initialize KnownAssignedXids for when both existing and incoming
snapshots have non-zero qualifying xids.

Fixes bug #6661 reported by Valentine Gogichashvili.

Analysis and fix by Andres Freund
2012-06-08 17:34:04 +01:00
Tom Lane
ece01aae47 Scan the buffer pool just once, not once per fork, during relation drop.
This provides a speedup of about 4X when NBuffers is large enough.
There is also a useful reduction in sinval traffic, since we
only do CacheInvalidateSmgr() once not once per fork.

Simon Riggs, reviewed and somewhat revised by Tom Lane
2012-06-07 17:43:11 -04:00
Tom Lane
e8d029a30b Do unlocked prechecks in bufmgr.c loops that scan the whole buffer pool.
DropRelFileNodeBuffers, DropDatabaseBuffers, FlushRelationBuffers, and
FlushDatabaseBuffers have to scan the whole shared_buffers pool because
we have no index structure that would find the target buffers any more
efficiently than that.  This gets expensive with large NBuffers.  We can
shave some cycles from these loops by prechecking to see if the current
buffer is interesting before we acquire the buffer header lock.
Ordinarily such a test would be unsafe, but in these cases it should be
safe because we are already assuming that the caller holds a lock that
prevents any new target pages from being loaded into the buffer pool
concurrently.  Therefore, no buffer tag should be changing to a value of
interest, only away from a value of interest.  So a false negative match
is impossible, while a false positive is safe because we'll recheck after
acquiring the buffer lock.  Initial testing says that this speeds these
loops by a factor of 2X to 3X on common Intel hardware.

Patch for DropRelFileNodeBuffers by Jeff Janes (based on an idea of
Heikki's); extended to the remaining sequential scans by Tom Lane
2012-06-07 16:46:26 -04:00
Simon Riggs
2c8a4e9be2 Wake WALSender to reduce data loss at failover for async commit.
WALSender now woken up after each background flush by WALwriter, avoiding
multi-second replication delay for an all-async commit workload.
Replication delay reduced from 7s with default settings to 200ms and often
much less, allowing significantly reduced data loss at failover.

Andres Freund and Simon Riggs
2012-06-07 19:22:47 +01:00
Robert Haas
b50991eedb Fix more crash-safe visibility map bugs, and improve comments.
In lazy_scan_heap, we could issue bogus warnings about incorrect
information in the visibility map, because we checked the visibility
map bit before locking the heap page, creating a race condition.  Fix
by rechecking the visibility map bit before we complain.  Rejigger
some related logic so that we rely on the possibly-outdated
all_visible_according_to_vm value as little as possible.

In heap_multi_insert, it's not safe to clear the visibility map bit
before beginning the critical section.  The visibility map is not
crash-safe unless we treat clearing the bit as a critical operation.
Specifically, if the transaction were to error out after we set the
bit and before entering the critical section, we could end up writing
the heap page to disk (with the bit cleared) and crashing before the
visibility map page made it to disk.  That would be bad.  heap_insert
has this correct, but somehow the order of operations got rearranged
when heap_multi_insert was added.

Also, add some more comments to visibilitymap_test, lazy_scan_heap,
and IndexOnlyNext, expounding on concurrency issues.

Per extensive code review by Andres Freund, and further review by Tom
Lane, who also made the original report about the bogus warnings.
2012-06-07 12:48:13 -04:00
Tom Lane
3dd8e59681 Fix bogus handling of control characters in json_lex_string().
The original coding misbehaved if "char" is signed, and also made the
extremely poor decision to print control characters literally when trying
to complain about them.  Report and patch by Shigeru Hanada.

In passing, also fix core dump risk in report_parse_error() should the
parse state be something other than what it expects.
2012-06-04 20:43:57 -04:00
Simon Riggs
d3abbbebe5 Avoid early reuse of btree pages, causing incorrect query results.
When we allowed read-only transactions to skip assigning XIDs
we introduced the possibility that a fully deleted btree page
could be reused. This broke the index link sequence which could
then lead to indexscans silently returning fewer rows than would
have been correct. The actual incidence of silent errors from
this is thought to be very low because of the exact workload
required and locking pre-conditions. Fix is to remove pages only
if index page opaque->btpo.xact precedes RecentGlobalXmin.

Noah Misch, reviewed by Simon Riggs
2012-06-01 12:21:45 +01:00
Simon Riggs
055c352abb After any checkpoint, close all smgr files handles in bgwriter 2012-06-01 09:24:53 +01:00
Simon Riggs
a297d64d92 Checkpointer starts before bgwriter to avoid missing fsync requests.
Noted while testing Hot Standby startup.
2012-06-01 08:25:17 +01:00
Simon Riggs
1ec6a2bbc9 Provide interim statistics while in mid-checkpoint.
Re-implements similar functionality in 9.1 and previously which
was removed during split of checkpointer and bgwriter.

Requested/spotted by Magnus Hagander
2012-06-01 08:19:06 +01:00
Tom Lane
a04dc87db1 Improve comment for GetStableLatestTransactionId(). 2012-05-31 11:20:02 -04:00
Simon Riggs
a2b516dab9 Only throw recovery conflicts when InHotStandby. Bug fix to recent
patch to allow Index Only Scans on Hot Standby.

Bug report from Jaime Casanova
2012-05-31 13:11:47 +01:00
Tom Lane
ad0009e7be Force PL and range-type support functions to be owned by a superuser.
We allow non-superusers to create procedural languages (with restrictions)
and range datatypes.  Previously, the automatically-created support
functions for these objects ended up owned by the creating user.  This
represents a rather considerable security hazard, because the owning user
might be able to alter a support function's definition in such a way as to
crash the server, inject trojan-horse SQL code, or even execute arbitrary
C code directly.  It appears that right now the only actually exploitable
problem is the infinite-recursion bug fixed in the previous patch for
CVE-2012-2655.  However, it's not hard to imagine that future additions of
more ALTER FUNCTION capability might unintentionally open up new hazards.
To forestall future problems, cause these support functions to be owned by
the bootstrap superuser, not the user creating the parent object.
2012-05-30 23:47:57 -04:00
Tom Lane
33c6eaf78e Ignore SECURITY DEFINER and SET attributes for a PL's call handler.
It's not very sensible to set such attributes on a handler function;
but if one were to do so, fmgr.c went into infinite recursion because
it would call fmgr_security_definer instead of the handler function proper.
There is no way for fmgr_security_definer to know that it ought to call the
handler and not the original function referenced by the FmgrInfo's fn_oid,
so it tries to do the latter, causing the whole process to start over
again.

Ordinarily such misconfiguration of a procedural language's handler could
be written off as superuser error.  However, because we allow non-superuser
database owners to create procedural languages and the handler for such a
language becomes owned by the database owner, it is possible for a database
owner to crash the backend, which ideally shouldn't be possible without
superuser privileges.  In 9.2 and up we will adjust things so that the
handler functions are always owned by superusers, but in existing branches
this is a minor security fix.

Problem noted by Noah Misch (after several of us had failed to detect
it :-().  This is CVE-2012-2655.
2012-05-30 23:27:57 -04:00
Tom Lane
cd0ff9c0f4 Expand the allowed range of timezone offsets to +/-15:59:59 from Greenwich.
We used to only allow offsets less than +/-13 hours, then it was +/14,
then it was +/-15.  That's still not good enough though, as per today's bug
report from Patric Bechtel.  This time I actually looked through the Olson
timezone database to find the largest offsets used anywhere.  The winners
are Asia/Manila, at -15:56:00 until 1844, and America/Metlakatla, at
+15:13:42 until 1867.  So we'd better allow offsets less than +/-16 hours.

Given the history, we are way overdue to have some greppable #define
symbols controlling this, so make some ... and also remove an obsolete
comment that didn't get fixed the last time.

Back-patch to all supported branches.
2012-05-30 19:58:35 -04:00
Robert Haas
07ab1383e3 Fix two more bugs in fast-path relation locking.
First, the previous code failed to account for the fact that, during Hot
Standby operation, the startup process takes AccessExclusiveLocks on
relations without setting MyDatabaseId.  This resulted in fast path
strong lock counts failing to be incremented with the startup process
took locks, which in turn allowed conflicting lock requests to succeed
when they should not have.  Report by Erik Rijkers, diagnosis by Heikki
Linnakangas.

Second, LockReleaseAll() failed to honor the allLocks and lockmethodid
restrictions with respect to fast-path locks.  It's not clear to me
whether this produces any user-visible breakage at the moment, but it's
certainly wrong.  Rearrange order of operations in LockReleaseAll to fix.
Noted by Tom Lane.
2012-05-30 16:17:46 -04:00
Heikki Linnakangas
d1996ed5e8 Change the way parent pages are tracked during buffered GiST build.
We used to mimic the way a stack is constructed when descending the tree
during normal GiST inserts, but that was quite complicated during a buffered
build. It was also wrong: in GiST, the left-to-right relationships on
different levels might not match each other, so that when you know the
parent of a child page, you won't necessarily find the parent of the page to
the right of the child page by following the rightlinks at the parent level.
This sometimes led to "could not re-find parent" errors while building a
GiST index.

We now use a simple hash table to track the parent of every internal page.
Whenever a page is split, and downlinks are moved from one page to another,
we update the hash table accordingly. This is also better for performance
than the old method, as we never need to move right to re-find the parent
page, which could take a significant amount of time for buffers that were
created much earlier in the index build.
2012-05-30 12:05:57 +03:00
Heikki Linnakangas
be02b16826 Delete the temporary file used in buffered GiST build, after the build.
There were two bugs here: We forgot to call gistFreeBuildBuffers() function
at the end of build, and we passed interXact == true to BufFileCreateTemp,
so the file wasn't automatically cleaned up at end-of-transaction either.
2012-05-30 12:05:57 +03:00
Heikki Linnakangas
4bc6fb57f7 Fix integer overflow bug in GiST buffering build calculations.
The result of (maintenance_work_mem * 1024) / BLCKSZ doesn't fit in a signed
32-bit integer, if maintenance_work_mem >= 2GB. Use double instead. And
while we're at it, write the calculations in an easier to understand form,
with the intermediary steps written out and commented.
2012-05-29 22:27:42 +03:00
Tom Lane
2755abf386 Teach AbortOutOfAnyTransaction to clean up partially-started transactions.
AbortOutOfAnyTransaction failed to do anything if the state it saw on
entry corresponded to failing partway through StartTransaction.  I fixed
AbortCurrentTransaction to cope with that case way back in commit
60b2444cc3, but evidently overlooked that
AbortOutOfAnyTransaction should do likewise.

Back-patch to all supported branches.  It's not clear that this omission
has any more-than-cosmetic consequences, but it's also not clear that it
doesn't, so back-patching seems the least risky choice.
2012-05-28 23:57:06 -04:00
Peter Eisentraut
388d251679 Update SQL features list
Set E081 Basic Privileges to supported, since by the letter of it, we
support it, even though not all possible forms of USAGE privileges are
implemented.
2012-05-27 23:34:16 +03:00
Peter Eisentraut
27314d32a8 Suppress -Wunused-result warning about write()
This is related to aa90e148ca, but this
code is only used under -DLINUX_OOM_ADJ, so it was apparently
overlooked then.
2012-05-27 22:35:01 +03:00
Tom Lane
532fe28dad Prevent synchronized scanning when systable_beginscan chooses a heapscan.
The only interesting-for-performance case wherein we force heapscan here
is when we're rebuilding the relcache init file, and the only such case
that is likely to be examining a catalog big enough to be syncscanned is
RelationBuildTupleDesc.  But the early-exit optimization in that code gets
broken if we start the scan at a random place within the catalog, so that
allowing syncscan is actually a big deoptimization if pg_attribute is large
(at least for the normal case where the rows for core system catalogs have
never been changed since initdb).  Hence, prevent syncscan here.  Per my
testing pursuant to complaints from Jeff Frost and Greg Sabino Mullane,
though neither of them seem to have actually hit this specific problem.

Back-patch to 8.3, where syncscan was introduced.
2012-05-26 19:09:52 -04:00
Tom Lane
d3b97d1488 Fix string truncation to be multibyte-aware in text_name and bpchar_name.
Previously, casts to name could generate invalidly-encoded results.

Also, make these functions match namein() more exactly, by consistently
using palloc0() instead of ad-hoc zeroing code.

Back-patch to all supported branches.

Karl Schnaitter and Tom Lane
2012-05-25 17:34:51 -04:00
Tom Lane
2a4c46e0ba Fix array overrun in regex code.
zaptreesubs() was coded to unconditionally reset a capture subre's
corresponding pmatch[] entry.  However, in regexes without backrefs, that
array is caller-supplied and might not have as many entries as the regex
has capturing parens.  So check the array length and do nothing if there
is no corresponding entry, much as subset() does.  Failure to check this
resulted in a stack clobber in the case reported by Marko Kreen.

This bug appears to have been latent in the regex library from the
beginning.  It was not exposed because find() called dissect() not
cdissect(), and the dissect() code path didn't ever call zaptreesubs()
(formerly zapmem()).  When I unified dissect() and cdissect() in commit
4dd78bf37a, the problem was exposed.

Now that I've seen this, I'm rather suspicious that we might need to
back-patch it; but will refrain for now, for lack of evidence that
the case can be hit in the previous coding.
2012-05-24 13:56:16 -04:00
Tom Lane
ed962fd712 Ensure that seqscans check for interrupts at least once per page.
If a seqscan encounters many consecutive pages containing only dead tuples,
it can remain in the loop in heapgettup for a long time, and there was no
CHECK_FOR_INTERRUPTS anywhere in that loop.  This meant there were
real-world situations where a query would be effectively uncancelable for
long stretches.  Add a check placed to occur once per page, which should be
enough to provide reasonable response time without adding any measurable
overhead.

Report and patch by Merlin Moncure (though I tweaked it a bit).
Back-patch to all supported branches.
2012-05-22 19:42:05 -04:00
Robert Haas
8fbe5a317d Fix error message for COMMENT/SECURITY LABEL ON COLUMN xxx IS 'yyy'
When the column name is an unqualified name, rather than table.column,
the error message complains about too many dotted names, which is
wrong.  Report by Peter Eisentraut based on examination of the
sepgsql regression test output, but the problem also affects COMMENT.
New wording as suggested by Tom Lane.
2012-05-22 11:23:36 -04:00
Robert Haas
219c024c64 Repair out-of-date information in src/backend/storage/buffer/README.
In commit d526575f89, we changed things so
that buffer usage counts are incremented when the buffer is pinned, rather
than when it is unpinned, but the README file didn't get the memo.

Report by Amit Kapila.
2012-05-22 09:32:09 -04:00
Tom Lane
b94ce6e807 Move postmaster's RemovePgTempFiles call to a less randomly chosen place.
There is no reason to do this as early as possible in postmaster startup,
and good reason not to do it until we have completely created the
postmaster's lock file, namely that it might contribute to pg_ctl thinking
that postmaster startup has timed out.  (This would require a rather
unusual amount of time to be spent scanning temp file directories, but we
have at least one field report of it happening reproducibly.)

Back-patch to 9.1.  Before that, pg_ctl didn't wait for additional info to
be added to the lock file, so it wasn't a problem.

Note that this is not a complete fix to the slow-start issue in 9.1,
because we still had identify_system_timezone being run during postmaster
start in 9.1.  But that's at least a reasonably well-defined delay, with
an easy workaround if needed, whereas the temp-files scan is not so
predictable and cannot be avoided.
2012-05-21 22:50:30 -04:00
Tom Lane
efae4653c9 Update woefully-obsolete comment.
The accurate info about what's in a lock file has been in miscadmin.h
for some time, so let's just make this comment point there instead of
maintaining a duplicative copy.
2012-05-21 22:11:00 -04:00
Peter Eisentraut
f1f6737e15 Fix incorrect logic in JSON number lexer
Detectable by gcc -Wlogical-op.

Add two regression test cases that would previously allow incorrect
values to pass.
2012-05-20 02:24:46 +03:00
Peter Eisentraut
2273a50364 Realign some --help output to have better spacing between columns 2012-05-18 20:34:14 +03:00
Heikki Linnakangas
1d27dcf578 Fix bug in gistRelocateBuildBuffersOnSplit().
When we create a temporary copy of the old node buffer, in stack, we mustn't
leak that into any of the long-lived data structures. Before this patch,
when we called gistPopItupFromNodeBuffer(), it got added to the array of
"loaded buffers". After gistRelocateBuildBuffersOnSplit() exits, the
pointer added to the loaded buffers array points to garbage. Often that goes
unnotied, because when we go through the array of loaded buffers to unload
them, buffers with a NULL pageBuffer are ignored, which can often happen by
accident even if the pointer points to garbage.

This patch fixes that by marking the temporary copy in stack explicitly as
temporary, and refrain from adding buffers marked as temporary to the array
of loaded buffers.

While we're at it, initialize nodeBuffer->pageBlocknum to InvalidBlockNumber
and improve comments a bit. This isn't strictly necessary, but makes
debugging easier.
2012-05-18 19:38:32 +03:00
Peter Eisentraut
939ec9b8a4 Update SQL features/conformance information to SQL:2011 2012-05-17 09:50:04 +03:00
Peter Eisentraut
be6d1c88a4 Change COLLATION keyword category
It was changed from unreserved to reserved as part of the COLLATION
FOR syntax, but it turns out that type_func_name_keyword is
sufficient.
2012-05-16 20:19:44 +03:00
Tom Lane
488c6dd170 Improve error message for ALTER COLUMN TYPE coercion failure.
Per recent discussion, the error message for this was actually a trifle
inaccurate, since it said "cannot be cast" which might be incorrect.
Adjust that wording, and add a HINT suggesting that a USING clause might
be needed.
2012-05-16 07:28:25 -04:00
Heikki Linnakangas
6593c5b5dc Fix bug in freespace calculation in heap_multi_insert().
If the amount of freespace on page was less than the amount reserved by
fillfactor, the calculation would underflow.

This fixes bug #6643 reported by Tomonari Katsumata.
2012-05-16 14:13:06 +03:00
Peter Eisentraut
c8e086795a Remove whitespace from end of lines
pgindent and perltidy should clean up the rest.
2012-05-15 22:19:41 +03:00
Peter Eisentraut
8afb026e57 Remove stray nbsp character 2012-05-15 21:38:59 +03:00
Heikki Linnakangas
d2495f272c Fix bug in to_tsquery().
We were using memcpy() to copy to a possibly overlapping memory region,
which is a no-no. Use memmove() instead.
2012-05-15 19:27:34 +03:00
Tom Lane
9b63e9869f In pgstat.c, use a timeout in WaitLatchOrSocket only on Windows.
We have no need for a timeout here really, but some broken products from
Redmond seem to lose FD_READ events occasionally, and waking up and
retrying the recv() is the only known way to work around that.  Perhaps
somebody will be motivated to figure out a better answer here; but not I.
2012-05-14 23:51:34 -04:00
Tom Lane
5a2bb06012 Revert "Add some temporary instrumentation to pgstat.c."
This reverts commit 7d88bb73f7.
That instrumentation has served its purpose.
2012-05-14 23:08:10 -04:00
Tom Lane
e42a21b9e6 Assert that WaitLatchOrSocket callers cannot wait only for writability.
Since we have chosen to report socket EOF and error conditions via the
WL_SOCKET_READABLE flag bit, it's unsafe to wait only for
WL_SOCKET_WRITEABLE; the caller would never be notified of the socket
condition, and in some of these implementations WaitLatchOrSocket would
busy-wait until something else happens.  Add this restriction to the API
specification, and add Asserts to check that callers don't try to do that.

At some point we might want to consider adjusting the API to relax this
restriction, but until we have an actual use case for waiting on a
write-only socket, it seems premature to design a solution.
2012-05-14 16:12:28 -04:00
Tom Lane
d461d0502b For testing purposes, reinsert a timeout in pgstat.c's wait call.
Test results from buildfarm members mastodon/narwhal (Windows Server 2003)
make it look like that platform just plain loses FD_READ events
occasionally, and the only reason our previous coding seemed to work was
that it timed out every couple of seconds and retried the whole operation.
Try to verify this by reinserting a finite timeout into the pgstat loop.
This isn't meant to be a permanent patch either, just to confirm or
disprove a theory.
2012-05-14 15:03:14 -04:00
Tom Lane
f1ca51549e Force pgwin32_recv into nonblock mode when called from pgstat.c.
This should get rid of the usage of pgwin32_waitforsinglesocket entirely,
and perhaps thereby remove the race condition that's evidently still
present on some versions of Windows.  The previous arrangement was a bit
unsafe anyway, since waiting at the recv() would not allow pgstat to notice
postmaster death.
2012-05-14 10:57:07 -04:00
Heikki Linnakangas
f15c2eae9c Remove unnecessary pg_verifymbstr() calls from tsvector/query in functions.
The input should've been validated well before it hits the input function.
Doing so again is a waste of cycles.
2012-05-14 14:30:32 +03:00
Heikki Linnakangas
9e4637bf89 Update comments that became out-of-date with the PGXACT struct.
When the "hot" members of PGPROC were split off to separate PGXACT structs,
many PGPROC fields referred to in comments were moved to PGXACT, but the
comments were neglected in the commit. Mostly this is just a search/replace
of PGPROC with PGXACT, but the way the dummy PGPROC entries are created for
prepared transactions changed more, making some of the comments totally
bogus.

Noah Misch
2012-05-14 10:28:55 +03:00
Peter Eisentraut
64f09ca386 Remove leftovers of BeOS port
These should have been removed when the BeOS port was removed in
44f9021223.
2012-05-14 04:50:39 +03:00
Peter Eisentraut
6bf1e7668d Small punctuation editing of postgresql.conf.sample 2012-05-14 04:50:39 +03:00
Tom Lane
7d88bb73f7 Add some temporary instrumentation to pgstat.c.
Log main-loop blocking events and the results of inquiry messages.
This is to get some clarity as to what's happening on those Windows
buildfarm members that still don't like the latch-ified stats collector.
This bulks up the postmaster log a tad, so I won't leave it in place for
long.
2012-05-13 21:11:31 -04:00
Tom Lane
b8347138e9 Fix DROP TABLESPACE to unlink symlink when directory is not there.
If the tablespace directory is missing entirely, we allow DROP TABLESPACE
to go through, on the grounds that it should be possible to clean up the
catalog entry in such a situation.  However, we forgot that the pg_tblspc
symlink might still be there.  We should try to remove the symlink too
(but not fail if it's no longer there), since not doing so can lead to
weird behavior subsequently, as per report from Michael Nolan.

There was some discussion of adding dependency links to prevent DROP
TABLESPACE when the catalogs still contain references to the tablespace.
That might be worth doing too, but it's an orthogonal question, and in
any case wouldn't be back-patchable.

Back-patch to 9.0, which is as far back as the logic looks like this.
We could possibly do something similar in 8.x, but given the lack of
reports I'm not sure it's worth the trouble, and anyway the case could
not arise in the form the logic is meant to cover (namely, a post-DROP
transaction rollback having resurrected the pg_tablespace entry after
some or all of the filesystem infrastructure is gone).
2012-05-13 18:06:52 -04:00
Tom Lane
966970ed63 Re-revert stats collector latch changes.
This reverts commit cb2f2873d6, restoring
the latch-ified stats collector logic.  We'll soon see if this works any
better on the Windows buildfarm machines.
2012-05-13 14:44:39 -04:00
Tom Lane
b85427f227 Attempt to fix some issues in our Windows socket code.
Make sure WaitLatchOrSocket regards FD_CLOSE as a read-ready condition.
We might want to tweak this further, but it was surely wrong as-is.

Make pgwin32_waitforsinglesocket detach its private event object from the
passed socket before returning.  I suspect that failure to do so leads
to race conditions when other code (such as WaitLatchOrSocket) attaches
a different event object to the same socket.  Moreover, the existing
coding meant that repeated calls to pgwin32_waitforsinglesocket would
perform ResetEvent on an event actively connected to a socket, which
is rumored to be an unsafe practice; the WSAEventSelect documentation
appears to recommend against this, though it does not say not to do it
in so many words.

Also, uniformly use the coding pattern "WSAEventSelect(s, NULL, 0)" to
detach events from sockets, rather than passing the event in the second
parameter.  The WSAEventSelect documentation says that the second parameter
is ignored if the third is 0, so theoretically this should make no
difference.  However, elsewhere on the same reference page the use of NULL
in this context is recommended, and I have found suggestions on the net
that some versions of Windows have bugs with a non-NULL second parameter
in this usage.

Some other mostly-cosmetic cleanup, such as using the right one of
WSAGetLastError and GetLastError for reporting errors from these functions.
2012-05-13 14:35:40 -04:00
Tom Lane
fd350ef395 Fix bogus declaration of local variable.
rc should be an int here, not a pgsocket.  Fairly harmless as long as
pgsocket is an integer type, but nonetheless wrong.  Error introduced
in commit 87091cb1f1.
2012-05-13 00:30:32 -04:00
Tom Lane
398b240151 Avoid unnecessary process wakeups in the log collector.
syslogger was coded to wake up once per second whether there was anything
useful to do or not.  As part of our campaign to reduce the server's idle
power consumption, change it to use a latch for waiting.  Now, in the
absence of any data to log or any signals to service, it will only wake up
at the programmed logfile rotation times (if any).
2012-05-12 19:21:54 -04:00
Tom Lane
31ad655364 Fix WaitLatchOrSocket to handle EOF on socket correctly.
When using poll(), EOF on a socket is reported with the POLLHUP not
POLLIN flag (at least on Linux).  WaitLatchOrSocket failed to check
this bit, causing it to go into a busy-wait loop if EOF occurs.
We earlier fixed the same mistake in the test for the state of the
postmaster_alive socket, but missed it for the caller-supplied socket.
Fortunately, this error is new in 9.2, since 9.1 only had a select()
based code path not a poll() based one.
2012-05-12 16:36:47 -04:00
Simon Riggs
867540b49c Ensure backwards compatibility for GetStableLatestTransactionId() 2012-05-12 13:26:10 +01:00
Peter Eisentraut
afe86a9e73 Fix obsolescent C declaration syntax
gcc -Wextra/-Wold-style-declaration thinks that "inline" should go
before the function return type.
2012-05-12 12:52:02 +03:00
Tom Lane
d0c231d132 Cosmetic adjustments for postmaster's handling of checkpointer.
Correct some comments, order some operations a bit more consistently.
No functional changes.
2012-05-11 17:46:37 -04:00
Robert Haas
1331cc6c1a Prevent loss of init fork when truncating an unlogged table.
Fixes bug #6635, reported by Akira Kurosawa.
2012-05-11 09:48:56 -04:00
Simon Riggs
b762e8f50b Remove extraneous #include "storage/proc.h" 2012-05-11 14:46:46 +01:00
Simon Riggs
b06679e012 Ensure age() returns a stable value rather than the latest value 2012-05-11 14:36:24 +01:00
Heikki Linnakangas
3652d72dd4 On GiST page split, release the locks on child pages before recursing up.
When inserting the downlinks for a split gist page, we used hold the locks
on the child pages until the insertion into the parent - and recursively its
parent if it had to be split too - were all completed. Change that so that
the locks on child pages are released after the insertion in the immediate
parent is done, before recursing further up the tree.

This reduces the number of lwlocks that are held simultaneously. Holding
many locks is bad for concurrency, and in extreme cases you can even hit
the limit of 100 simultaneously held lwlocks in a backend. If you're really
unlucky, you can hit the limit while in a critical section, which brings
down the whole system.

This fixes bug #6629 reported by Tom Forbes. Backpatch to 9.1. The page
splitting code was rewritten in 9.1, and the old code did not have this
problem.
2012-05-11 12:35:28 +03:00
Tom Lane
cb2f2873d6 Temporarily revert stats collector latch changes so we can ship beta1.
This patch reverts commit 49340037ee and some
follow-on tweaking in pgstat.c.  While the basic scheme of latch-ifying the
stats collector seems sound enough, it's failing on most Windows buildfarm
members for unknown reasons, and there's no time left to debug that before
9.2beta1.  Better to ship a beta version without this improvement.  I hope
to re-revert this once beta1 is out, though.
2012-05-10 17:26:33 -04:00
Tom Lane
f40022f1ad Make WaitLatch's WL_POSTMASTER_DEATH result trustworthy; simplify callers.
Per a suggestion from Peter Geoghegan, make WaitLatch responsible for
verifying that the WL_POSTMASTER_DEATH bit it returns is truthful (by
testing PostmasterIsAlive).  Then simplify its callers, who no longer
need to do that for themselves.  Remove weasel wording about falsely-set
result bits from WaitLatch's API contract.
2012-05-10 14:34:53 -04:00
Tom Lane
ada8fa08fc Fix Windows implementation of PGSemaphoreLock.
The original coding failed to reset ImmediateInterruptOK before returning,
which would potentially allow a subsequent query-cancel interrupt to be
accepted at an unsafe point.  This is a really nasty bug since it's so hard
to predict the consequences, but they could be unpleasant.

Also, ensure that signal handlers are serviced before this function
returns, even if the semaphore is already set.  This should make the
behavior more like Unix.

Back-patch to all supported versions.
2012-05-10 13:36:14 -04:00
Tom Lane
8ebc908c57 Improve Windows implementation of WaitLatch/WaitLatchOrSocket.
Ensure that signal handlers are serviced before this function returns.
This should make the behavior more like Unix.  Also, add some more
error checking, and make some other cosmetic improvements.

No back-patch since it's not clear whether this is fixing any live bug
that would affect 9.1.  I'm more concerned about 9.2 anyway given our
considerable recent expansions in the usage of WaitLatch.
2012-05-10 13:26:47 -04:00
Tom Lane
fd71421b01 Improve tests for postmaster death in auxiliary processes.
In checkpointer and walwriter, avoid calling PostmasterIsAlive unless
WaitLatch has reported WL_POSTMASTER_DEATH.  This saves a kernel call per
iteration of the process's outer loop, which is not all that much, but a
cycle shaved is a cycle earned.  I had already removed the unconditional
PostmasterIsAlive calls in bgwriter and pgstat in previous patches, but
forgot that WL_POSTMASTER_DEATH is supposed to be treated as untrustworthy
(per comment in unix_latch.c); so adjust those two cases to match.

There are a few other places where the same idea might be applied, but only
after substantial code rearrangement, so I didn't bother.
2012-05-10 00:54:56 -04:00
Tom Lane
d3ae406f54 Further tweaking of nomenclature in checkpointer.c.
Get rid of some more naming choices that only make sense if you know that
this code used to be in the bgwriter, as well as some stray comments
referencing the bgwriter.
2012-05-10 00:01:10 -04:00
Tom Lane
6308ba05a7 Improve control logic for bgwriter hibernation mode.
Commit 6d90eaaa89 added a hibernation mode
to the bgwriter to reduce the server's idle-power consumption.  However,
its interaction with the detailed behavior of BgBufferSync's feedback
control loop wasn't very well thought out.  That control loop depends
primarily on the rate of buffer allocation, not the rate of buffer
dirtying, so the hibernation mode has to be designed to operate only when
no new buffer allocations are happening.  Also, the check for whether the
system is effectively idle was not quite right and would fail to detect
a constant low level of activity, thus allowing the bgwriter to go into
hibernation mode in a way that would let the cycle time vary quite a bit,
possibly further confusing the feedback loop.  To fix, move the wakeup
support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into
StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode
unless no buffer allocations have happened recently.

In addition, fix the delaying logic to remove the problem of possibly not
responding to signals promptly, which was basically caused by trying to use
the process latch's is_set flag for multiple purposes.  I can't prove it
but I'm suspicious that that hack was responsible for the intermittent
"postmaster does not shut down" failures we've been seeing in the buildfarm
lately.  In any case it did nothing to improve the readability or
robustness of the code.

In passing, express the hibernation sleep time as a multiplier on
BgWriterDelay, not a constant.  I'm not sure whether there's any value in
exposing the longer sleep time as an independently configurable setting,
but we can at least make it act like this for little extra code.
2012-05-09 23:37:10 -04:00
Peter Eisentraut
5d39807a00 Add make dependency so that postgres.bki is rebuilt in major version change
Every time since the current rule for postgres.bki was put in place
when we change the major version, people complain that their tests
fail in strange ways.  This is because the version number in
postgres.bki is not updated, because it has no dependency for that.
And you can't even force the rebuild manually if you don't happen to
know which file has the problem.  Fix that now before it will happen
again.

The only remaining problem with switching major versions, as far as
the regression tests are concerned, is that contrib needs to be
rebuilt.  But that's easily invoked, and in any case the failure modes
are more friendly if you forget that.
2012-05-09 20:45:56 +03:00
Simon Riggs
8f28789bff Rename BgWriterShmem/Request to CheckpointerShmem/Request 2012-05-09 14:23:45 +01:00
Simon Riggs
bbd3ec9dce Rename BgWriterCommLock to CheckpointerCommLock 2012-05-09 14:11:48 +01:00
Simon Riggs
5829387381 Avoid xid error from age() function when run on Hot Standby 2012-05-09 13:56:24 +01:00
Tom Lane
acd4c7d58b Fix an issue in recent walwriter hibernation patch.
Users of asynchronous-commit mode expect there to be a guaranteed maximum
delay before an async commit's WAL records get flushed to disk.  The
original version of the walwriter hibernation patch broke that.  Add an
extra shared-memory flag to allow async commits to kick the walwriter out
of hibernation mode, without adding any noticeable overhead in cases where
no action is needed.
2012-05-08 23:06:40 -04:00
Tom Lane
49340037ee Reduce idle power consumption of stats collector process.
Latch-ify the stats collector, so that it does not need an arbitrary wakeup
cycle to check for postmaster death.  The incremental savings in idle power
is pretty marginal, since we only had it waking every two seconds; but I
believe that this patch may also improve the collector's performance under
load, by reducing the number of kernel calls made per message when messages
are arriving constantly (we now avoid a select/poll call except when we
need to sleep).  The change also reduces the time needed for a normal
database shutdown on platforms where signals don't interrupt select().
2012-05-08 21:26:46 -04:00
Tom Lane
5461564a9d Reduce idle power consumption of walwriter and checkpointer processes.
This patch modifies the walwriter process so that, when it has not found
anything useful to do for many consecutive wakeup cycles, it extends its
sleep time to reduce the server's idle power consumption.  It reverts to
normal as soon as it's done any successful flushes.  It's still true that
during any async commit, backends check for completed, unflushed pages of
WAL and signal the walwriter if there are any; so that in practice the
walwriter can get awakened and returned to normal operation sooner than the
sleep time might suggest.

Also, improve the checkpointer so that it uses a latch and a computed delay
time to not wake up at all except when it has something to do, replacing a
previous hardcoded 0.5 sec wakeup cycle.  This also is primarily useful for
reducing the server's power consumption when idle.

In passing, get rid of the dedicated latch for signaling the walwriter in
favor of using its procLatch, since that comports better with possible
generic signal handlers using that latch.  Also, fix a pre-existing bug
with failure to save/restore errno in walwriter's signal handlers.

Peter Geoghegan, somewhat simplified by Tom
2012-05-08 20:03:26 -04:00
Magnus Hagander
916d589a10 Make "unexpected EOF" messages DEBUG1 unless in an open transaction
"Unexpected EOF on client connection" without an open transaction
is mostly noise, so turn it into DEBUG1. With an open transaction it's
still indicating a problem, so keep those as ERROR, and change the message
to indicate that it happened in a transaction.
2012-05-07 18:50:44 +02:00
Tom Lane
71b9549d05 Overdue code review for transaction-level advisory locks patch.
Commit 62c7bd31c8 had assorted problems, most
visibly that it broke PREPARE TRANSACTION in the presence of session-level
advisory locks (which should be ignored by PREPARE), as per a recent
complaint from Stephen Rees.  More abstractly, the patch made the
LockMethodData.transactional flag not merely useless but outright
dangerous, because in point of fact that flag no longer tells you anything
at all about whether a lock is held transactionally.  This fix therefore
removes that flag altogether.  We now rely entirely on the convention
already in use in lock.c that transactional lock holds must be owned by
some ResourceOwner, while session holds are never so owned.  Setting the
locallock struct's owner link to NULL thus denotes a session hold, and
there is no redundant marker for that.

PREPARE TRANSACTION now works again when there are session-level advisory
locks, and it is also able to transfer transactional advisory locks to the
prepared transaction, but for implementation reasons it throws an error if
we hold both types of lock on a single lockable object.  Perhaps it will be
worth improving that someday.

Assorted other minor cleanup and documentation editing, as well.

Back-patch to 9.1, except that in the 9.1 branch I did not remove the
LockMethodData.transactional flag for fear of causing an ABI break for
any external code that might be examining those structs.
2012-05-04 17:44:31 -04:00
Bruce Momjian
ebcaa5fcde Remove BSD/OS (BSDi) port. There are no known users upgrading to
Postgres 9.2, and perhaps no existing users either.
2012-05-03 10:58:44 -04:00
Peter Eisentraut
e9605a039b Even more duplicate word removal, in the spirit of the season 2012-05-02 20:56:03 +03:00
Robert Haas
0038110421 Avoid repeated CLOG access from heap_hot_search_buffer.
At the time we check whether the tuple is dead to all running
transactions, we've already verified that it isn't visible to our
scan, setting hint bits if appropriate.  So there's no need to
recheck CLOG for the all-dead test we do just a moment later.
So, add HeapTupleIsSurelyDead() to test the appropriate condition
under the assumption that all relevant hit bits are already set.

Review by Tom Lane.
2012-05-02 12:40:07 -04:00
Robert Haas
1b4998fd44 Further corrections from the department of redundancy department.
Thom Brown
2012-05-02 11:11:25 -04:00
Robert Haas
e01e66f808 More duplicate word removal. 2012-05-02 09:28:16 -04:00
Heikki Linnakangas
f291ccd43e Remove duplicate words in comments.
Found these with grep -r "for for ".
2012-05-02 10:20:27 +03:00
Tom Lane
50c2d6a1a6 Kill some remaining references to SVR4 and univel.
Both terms still appear in a few places, but I thought it best to leave
those alone in context.
2012-05-02 00:29:17 -04:00
Peter Eisentraut
f2f9439fbf Remove dead ports
Remove the following ports:

- dgux
- nextstep
- sunos4
- svr4
- ultrix4
- univel

These are obsolete and not worth rescuing.  In most cases, there is
circumstantial evidence that they wouldn't work anymore anyway.
2012-05-01 22:11:12 +03:00
Tom Lane
809e7e21af Converge all SQL-level statistics timing values to float8 milliseconds.
This patch adjusts the core statistics views to match the decision already
taken for pg_stat_statements, that values representing elapsed time should
be represented as float8 and measured in milliseconds.  By using float8,
we are no longer tied to a specific maximum precision of timing data.
(Internally, it's still microseconds, but we could now change that without
needing changes at the SQL level.)

The columns affected are
pg_stat_bgwriter.checkpoint_write_time
pg_stat_bgwriter.checkpoint_sync_time
pg_stat_database.blk_read_time
pg_stat_database.blk_write_time
pg_stat_user_functions.total_time
pg_stat_user_functions.self_time
pg_stat_xact_user_functions.total_time
pg_stat_xact_user_functions.self_time

The first four of these are new in 9.2, so there is no compatibility issue
from changing them.  The others require a release note comment that they
are now double precision (and can show a fractional part) rather than
bigint as before; also their underlying statistics functions now match
the column definitions, instead of returning bigint microseconds.
2012-04-30 14:03:33 -04:00
Robert Haas
0d2235a25b Remove duplicate word in comment.
Noted by Peter Geoghegan.
2012-04-30 13:14:46 -04:00
Tom Lane
1dd89eadcd Rename I/O timing statistics columns to blk_read_time and blk_write_time.
This seems more consistent with the pre-existing choices for names of
other statistics columns.  Rename assorted internal identifiers to match.
2012-04-29 18:13:33 -04:00
Tom Lane
309c64745e Rename track_iotiming GUC to track_io_timing.
This spelling seems significantly more readable to me.
2012-04-29 16:23:54 -04:00
Peter Eisentraut
81107282a5 Change return type of ExceptionalCondition to void and mark it noreturn
In ancient times, it was thought that this wouldn't work because of
TrapMacro/AssertMacro, but changing those to use a comma operator
appears to work without compiler warnings.
2012-04-29 21:20:14 +03:00
Tom Lane
cdbad241f4 Clear I/O timing counters after sending them to the stats collector.
This oversight caused the reported times to accumulate in an O(N^2)
fashion the longer a backend runs.
2012-04-28 15:11:13 -04:00
Tom Lane
d6f7d4fdc5 Fix printing of whole-row Vars at top level of a SELECT targetlist.
Normally whole-row Vars are printed as "tabname.*".  However, that does not
work at top level of a targetlist, because per SQL standard the parser will
think that the "*" should result in column-by-column expansion; which is
not at all what a whole-row Var implies.  We used to just print the table
name in such cases, which works most of the time; but it fails if the table
name matches a column name available anywhere in the FROM clause.  This
could lead for instance to a view being interpreted differently after dump
and reload.  Adding parentheses doesn't fix it, but there is a reasonably
simple kluge we can use instead: attach a no-op cast, so that the "*" isn't
syntactically at top level anymore.  This makes the printing of such
whole-row Vars a lot more consistent with other Vars, and may indeed fix
more cases than just the reported one; I'm suspicious that cases involving
schema qualification probably didn't work properly before, either.

Per bug report and fix proposal from Abbas Butt, though this patch is quite
different in detail from his.

Back-patch to all supported versions.
2012-04-27 19:49:18 -04:00
Tom Lane
537b266953 Fix syslogger's rotation disable/re-enable logic.
If it fails to open a new log file, the syslogger assumes there's something
wrong with its parameters (such as log_directory), and stops attempting
automatic time-based or size-based log file rotations.  Sending it SIGHUP
is supposed to start that up again.  However, the original coding for that
was really bogus, involving clobbering a couple of GUC variables and hoping
that SIGHUP processing would restore them.  Get rid of that technique in
favor of maintaining a separate flag showing we've turned rotation off.
Per report from Mark Kirkwood.

Also, the syslogger will automatically attempt to create the log_directory
directory if it doesn't exist, but that was only happening at startup.
For consistency and ease of use, it should do the same whenever the value
of log_directory is changed by SIGHUP.

Back-patch to all supported branches.
2012-04-27 00:12:42 -04:00
Robert Haas
3424bff90f Prevent index-only scans from returning wrong answers under Hot Standby.
The alternative of disallowing index-only scans in HS operation was
discussed, but the consensus was that it was better to treat marking
a page all-visible as a recovery conflict for snapshots that could still
fail to see XIDs on that page.  We may in the future try to soften this,
so that we simply force index scans to do heap fetches in cases where
this may be an issue, rather than throwing a hard conflict.
2012-04-26 20:00:21 -04:00
Tom Lane
7c85aa39fc Fix oversight in recent parameterized-path patch.
bitmap_scan_cost_est() has to be able to cope with a BitmapOrPath, but
I'd taken a shortcut that didn't work for that case.  Noted by Heikki.
Add some regression tests since this area is evidently under-covered.
2012-04-26 14:17:44 -04:00
Tom Lane
9fa82c9809 Fix planner's handling of RETURNING lists in writable CTEs.
setrefs.c failed to do "rtoffset" adjustment of Vars in RETURNING lists,
which meant they were left with the wrong varnos when the RETURNING list
was in a subquery.  That was never possible before writable CTEs, of
course, but now it's broken.  The executor fails to notice any problem
because ExecEvalVar just references the ecxt_scantuple for any normal
varno; but EXPLAIN breaks when the varno is wrong, as illustrated in a
recent complaint from Bartosz Dmytrak.

Since the eventual rtoffset of the subquery is not known at the time
we are preparing its plan node, the previous scheme of executing
set_returning_clause_references() at that time cannot handle this
adjustment.  Fortunately, it turns out that we don't really need to do it
that way, because all the needed information is available during normal
setrefs.c execution; we just have to dig it out of the ModifyTable node.
So, do that, and get rid of the kluge of early setrefs processing of
RETURNING lists.  (This is a little bit of a cheat in the case of inherited
UPDATE/DELETE, because we are not passing a "root" struct that corresponds
exactly to what the subplan was built with.  But that doesn't matter, and
anyway this is less ugly than early setrefs processing was.)

Back-patch to 9.1, where the problem became possible to hit.
2012-04-25 20:20:33 -04:00
Tom Lane
9873001e6d Another trivial comment-typo fix. 2012-04-25 14:28:58 -04:00
Robert Haas
3ce7f18e92 Casts to or from a domain type are ignored; warn and document.
Prohibiting this outright would break dumps taken from older versions
that contain such casts, which would create far more pain than is
justified here.

Per report by Jaime Casanova and subsequent discussion.
2012-04-24 09:20:53 -04:00
Robert Haas
5d4b60f2f2 Lots of doc corrections.
Josh Kupershmidt
2012-04-23 22:43:09 -04:00
Robert Haas
7ab9b2f3b7 Rearrange lazy_scan_heap to avoid visibility map race conditions.
We must set the visibility map bit before releasing our exclusive lock
on the heap page; otherwise, someone might clear the heap page bit
before we set the visibility map bit, leading to a situation where the
visibility map thinks the page is all-visible but it's really not.

This problem has existed since 8.4, but it wasn't critical before we
had index-only scans, since the worst case scenario was that the page
wouldn't get vacuumed until the next scan_all vacuum.

Along the way, a couple of minor, related improvements: (1) if we
pause the heap scan to do an index vac cycle, release any visibility
map page we're holding, since really long-running pins are not good
for a variety of reasons; and (2) warn if we see a page that's marked
all-visible in the visibility map but not on the page level, since
that should never happen any more (it was allowed in previous
releases, but not in 9.2).
2012-04-23 22:08:06 -04:00
Robert Haas
85efd5f065 Reduce hash size for compute_array_stats, compute_tsvector_stats.
The size is only a hint, but a big hint chews up a lot of memory without
apparently improving performance much.

Analysis and patch by Noah Misch.
2012-04-23 22:05:41 -04:00
Peter Eisentraut
48658a1b81 Fix some typos
Josh Kupershmidt
2012-04-22 19:23:47 +03:00
Tom Lane
33e99153e9 Use fuzzy not exact cost comparison for the final tie-breaker in add_path.
Instead of an exact cost comparison, use a fuzzy comparison with 1e-10
delta after all other path metrics have proved equal.  This is to avoid
having platform-specific roundoff behaviors determine the choice when
two paths are really the same to our cost estimators.  Adjust the
recently-added test case that made it obvious we had a problem here.
2012-04-21 00:51:14 -04:00
Alvaro Herrera
09ff76fcdb Recast "ONLY" column CHECK constraints as NO INHERIT
The original syntax wasn't universally loved, and it didn't allow its
usage in CREATE TABLE, only ALTER TABLE.  It now works everywhere, and
it also allows using ALTER TABLE ONLY to add an uninherited CHECK
constraint, per discussion.

The pg_constraint column has accordingly been renamed connoinherit.

This commit partly reverts some of the changes in
61d81bd28d, particularly some pg_dump and
psql bits, because now pg_get_constraintdef includes the necessary NO
INHERIT within the constraint definition.

Author: Nikhil Sontakke
Some tweaks by me
2012-04-20 23:56:57 -03:00
Tom Lane
1f03630011 Adjust join_search_one_level's handling of clauseless joins.
For an initial relation that lacks any join clauses (that is, it has to be
cartesian-product-joined to the rest of the query), we considered only
cartesian joins with initial rels appearing later in the initial-relations
list.  This creates an undesirable dependency on FROM-list order.  We would
never fail to find a plan, but perhaps we might not find the best available
plan.  Noted while discussing the logic with Amit Kapila.

Improve the comments a bit in this area, too.

Arguably this is a bug fix, but given the lack of complaints from the
field I'll refrain from back-patching.
2012-04-20 20:10:46 -04:00
Tom Lane
5b7b5518d0 Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate.  We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.

In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage.  This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.

To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing.  This is required at both base scans and joins.  It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree.  Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 15:53:47 -04:00
Robert Haas
293ec33c32 Remove bogus comment from HeapTupleSatisfiesNow.
This has been wrong for a really long time.  We don't use two-phase
locking to protect against serialization anomalies.

Per discussion on pgsql-hackers about 2011-03-07; original report
by Dan Ports.
2012-04-18 11:50:45 -04:00
Robert Haas
4a6fab03f2 Finish rename of FastPathStrongLocks to FastPathStrongRelationLocks.
Commit 8e5ac74c12 tried to do this renaming,
but I relied on gcc to tell me where I needed to make changes, instead of
grep.

Noted by Jeff Davis.
2012-04-18 11:29:34 -04:00
Robert Haas
53c5b869b4 Tighten up error recovery for fast-path locking.
The previous code could cause a backend crash after BEGIN; SAVEPOINT a;
LOCK TABLE foo (interrupted by ^C or statement timeout); ROLLBACK TO
SAVEPOINT a; LOCK TABLE foo, and might have leaked strong-lock counts
in other situations.

Report by Zoltán Böszörményi; patch review by Jeff Davis.
2012-04-18 11:17:30 -04:00
Robert Haas
ab77b2da8b Fix incorrect comment in SetBufferCommitInfoNeedsSave().
Noah Misch spotted the fact that the old comment is in fact incorrect, due
to memory ordering hazards.
2012-04-18 10:55:40 -04:00
Robert Haas
e93c0b820f After PageSetAllVisible, use MarkBufferDirty.
Previously, we used SetBufferCommitInfoNeedsSave, but that's really
intended for dirty-marks we can theoretically afford to lose, such as
hint bits.  As for 9.2, the PD_ALL_VISIBLE mustn't be lost in this
way, since we could then end up with a heap page that isn't
all-visible and a visibility map page that is all visible, causing
index-only scans to return wrong answers.
2012-04-18 10:49:37 -04:00
Robert Haas
b5eccaef2c Fix copyfuncs/equalfuncs support for ReassignOwnedStmt.
Noah Misch
2012-04-18 10:45:18 -04:00
Robert Haas
53bbc681ca Fix various infelicities in node functions.
Mostly, this consists of adding support for fields which exist in the
structure but aren't handled by copy/equal/outfuncs; but the create
foreign table case can actually produce garbage output.

Noah Misch
2012-04-18 10:43:16 -04:00
Heikki Linnakangas
fe546f3da6 Don't wait for the commit record to be replicated if we wrote no WAL.
When using synchronous replication, we waited for the commit record to be
replicated, but if we our transaction didn't write any other WAL records,
that's not required because we don't even flush the WAL locally to disk in
that case. This lead to long waits when committing a transaction that only
modified a temporary table. Bug spotted by Thom Brown.
2012-04-17 16:28:31 +03:00
Peter Eisentraut
a33fcd7e79 Fix typo
Kyotaro HORIGUCHI
2012-04-16 15:36:40 +03:00
Robert Haas
ea6a2d8d47 Rename synchronous_commit='write' to 'remote_write'.
Fujii Masao, per discussion on pgsql-hackers
2012-04-14 10:53:22 -04:00
Robert Haas
4a2d7ad76f pg_size_pretty(numeric)
The output of the new pg_xlog_location_diff function is of type numeric,
since it could theoretically overflow an int8 due to signedness; this
provides a convenient way to format such values.

Fujii Masao, with some beautification by me.
2012-04-14 08:07:25 -04:00
Tom Lane
e54b10a62d Remove the "last ditch" code path in join_search_one_level().
So far as I can tell, it is no longer possible for this heuristic to do
anything useful, because the new weaker definition of
have_relevant_joinclause means that any relation with a joinclause must be
considered joinable to at least one other relation.  It would still be
possible for the code block to be entered, for example if there are join
order restrictions that prevent any join of the current level from being
formed; but in that case it's just a waste of cycles to attempt to form
cartesian joins, since the restrictions will still apply.

Furthermore, IMO the existence of this code path can mask bugs elsewhere;
we would have noticed the problem with cartesian joins a lot sooner if
this code hadn't compensated for it in the simplest case.

Accordingly, let's remove it and see what happens.  I'm committing this
separately from the prerequisite changes in have_relevant_joinclause,
just to make the question easier to revisit if there is some fault in
my logic.
2012-04-13 16:07:18 -04:00
Tom Lane
e3ffd05b02 Weaken the planner's tests for relevant joinclauses.
We should be willing to cross-join two small relations if that allows us
to use an inner indexscan on a large relation (that is, the potential
indexqual for the large table requires both smaller relations).  This
worked in simple cases but fell apart as soon as there was a join clause
to a fourth relation, because the existence of any two-relation join clause
caused the planner to not consider clauseless joins between other base
relations.  The added regression test shows an example case adapted from
a recent complaint from Benoit Delbosc.

Adjust have_relevant_joinclause, have_relevant_eclass_joinclause, and
has_relevant_eclass_joinclause to consider that a join clause mentioning
three or more relations is sufficient grounds for joining any subset of
those relations, even if we have to do so via a cartesian join.  Since such
clauses are relatively uncommon, this shouldn't affect planning speed on
typical queries; in fact it should help a bit, because the latter two
functions in particular get significantly simpler.

Although this is arguably a bug fix, I'm not going to risk back-patching
it, since it might have currently-unforeseen consequences.
2012-04-13 16:07:17 -04:00
Peter Eisentraut
c0cc526e8b Rename bytea_agg to string_agg and add delimiter argument
Per mailing list discussion, we would like to keep the bytea functions
parallel to the text functions, so rename bytea_agg to string_agg,
which already exists for text.

Also, to satisfy the rule that we don't want aggregate functions of
the same name with a different number of arguments, add a delimiter
argument, just like string_agg for text already has.
2012-04-13 21:36:59 +03:00
Peter Eisentraut
64e1309c76 Consistently quote encoding and locale names in messages 2012-04-13 20:37:07 +03:00
Robert Haas
61167bfaf2 Fix typo in comment. 2012-04-13 08:54:13 -04:00
Robert Haas
5630eddf1e Update lazy_scan_heap header comment.
The previous comment described how things worked in PostgreSQL 8.2
and prior.
2012-04-13 08:51:19 -04:00
Tom Lane
732bfa2448 Fix cost estimation for indexscan filter conditions.
cost_index's method for estimating per-tuple costs of evaluating filter
conditions (a/k/a qpquals) was completely wrong in the presence of derived
indexable conditions, such as range conditions derived from a LIKE clause.
This was largely masked in common cases as a result of all simple operator
clauses having about the same costs, but it could show up in a big way when
dealing with functional indexes containing expensive functions, as seen for
example in bug #6579 from Istvan Endredy.  Rejigger the calculation to give
sane answers when the indexquals aren't a subset of the baserestrictinfo
list.  As a side benefit, we now do the calculation properly for cases
involving join clauses (ie, parameterized indexscans), which we always
overestimated before.

There are still cases where this is an oversimplification, such as clauses
that can be dropped because they are implied by a partial index's
predicate.  But we've never accounted for that in cost estimates before,
and I'm not convinced it's worth the cycles to try to do so.
2012-04-11 20:24:17 -04:00
Tom Lane
880bfc3287 Silently ignore any nonexistent schemas that are listed in search_path.
Previously we attempted to throw an error or at least warning for missing
schemas, but this was done inconsistently because of implementation
restrictions (in many cases, GUC settings are applied outside transactions
so that we can't do system catalog lookups).  Furthermore, there were
exceptions to the rule even in the beginning, and we'd been poking more
and more holes in it as time went on, because it turns out that there are
lots of use-cases for having some irrelevant items in a common search_path
value.  It seems better to just adopt a philosophy similar to what's always
been done with Unix PATH settings, wherein nonexistent or unreadable
directories are silently ignored.

This commit also fixes the documentation to point out that schemas for
which the user lacks USAGE privilege are silently ignored.  That's always
been true but was previously not documented.

This is mostly in response to Robert Haas' complaint that 9.1 started to
throw errors or warnings for missing schemas in cases where prior releases
had not.  We won't adopt such a significant behavioral change in a back
branch, so something different will be needed in 9.1.
2012-04-11 12:02:50 -04:00
Tom Lane
3769fa5fc6 Make pg_tablespace_location(0) return the database's default tablespace.
This definition is convenient when applying the function to the
reltablespace column of pg_class, since that's what zero means there;
and it doesn't interfere with any other plausible use of the function.
Per gripe from Bruce Momjian.
2012-04-10 21:43:14 -04:00
Tom Lane
0d9819f7e3 Measure epoch of timestamp-without-time-zone from local not UTC midnight.
This patch reverts commit 191ef2b407
and thereby restores the pre-7.3 behavior of EXTRACT(EPOCH FROM
timestamp-without-tz).  Per discussion, the more recent behavior was
misguided on a couple of grounds: it makes it hard to get a
non-timezone-aware epoch value for a timestamp, and it makes this one
case dependent on the value of the timezone GUC, which is incompatible
with having timestamp_part() labeled as immutable.

The other behavior is still available (in all releases) by explicitly
casting the timestamp to timestamp with time zone before applying EXTRACT.

This will need to be called out as an incompatible change in the 9.2
release notes.  Although having mutable behavior in a function marked
immutable is clearly a bug, we're not going to back-patch such a change.
2012-04-10 12:04:42 -04:00
Tom Lane
65fd91333e Fix an Assert that turns out to be reachable after all.
estimate_num_groups() gets unhappy with
	create table empty();
	select * from empty except select * from empty e2;
I can't see any actual use-case for such a query (and the table is illegal
per SQL spec), but it seems like a good idea that it not cause an assert
failure.
2012-04-09 11:58:24 -04:00
Tom Lane
d515365a61 Don't bother copying empty support arrays in a zero-column MergeJoin.
The case could not arise when this code was originally written, but it can
now (since we made zero-column MergeJoins work for the benefit of FULL JOIN
ON TRUE).  I don't think there is any actual bug here, but we might as well
treat it consistently with other uses of COPY_POINTER_FIELD().  Per comment
from Ashutosh Bapat.
2012-04-09 11:41:54 -04:00
Robert Haas
3ae5133b1c Teach SLRU code to avoid replacing I/O-busy pages.
Patch by me; review by Tom Lane and others.
2012-04-08 23:05:55 -04:00
Heikki Linnakangas
03529a3ff9 set_stack_base() no longer needs to be called in PostgresMain.
This was a thinko in previous commit. Now that stack base pointer is now set
in PostmasterMain and SubPostmasterMain, it doesn't need to be set in
PostgresMain anymore.
2012-04-08 19:39:12 +03:00
Heikki Linnakangas
ef3883d130 Do stack-depth checking in all postmaster children.
We used to only initialize the stack base pointer when starting up a regular
backend, not in other processes. In particular, autovacuum workers can run
arbitrary user code, and without stack-depth checking, infinite recursion
in e.g an index expression will bring down the whole cluster.

The comment about PL/Java using set_stack_base() is not yet true. As the
code stands, PL/java still modifies the stack_base_ptr variable directly.
However, it's been discussed in the PL/Java mailing list that it should be
changed to use the function, because PL/Java is currently oblivious to the
register stack used on Itanium. There's another issues with PL/Java, namely
that the stack base pointer it sets is not really the base of the stack, it
could be something close to the bottom of the stack. That's a separate issue
that might need some further changes to this code, but that's a different
story.

Backpatch to all supported releases.
2012-04-08 19:07:55 +03:00
Tom Lane
7feecedcce Fix incorrect make maintainer-clean rule. 2012-04-07 18:16:50 -04:00
Tom Lane
95b9c333b2 Further adjustment of comment about qsort_tuple. 2012-04-07 17:48:40 -04:00
Tom Lane
a25ef7a5f6 Remove useless variable to suppress compiler warning. 2012-04-07 16:44:43 -04:00
Tom Lane
0ab4db52c0 Fix misleading output from gin_desc().
XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed
with a list link field labeled as "blkno", which was confusing, especially
when the link was empty (InvalidBlockNumber).  Print the metapage block
number instead, since that's what's actually being updated.  We could
include the link values too as a separate field, but not clear it's worth
the trouble.

Back-patch to 8.4 where the dubious code was added.
2012-04-06 18:10:21 -04:00
Tom Lane
17b985b1a0 Fix broken comparetup_datum code.
Commit 337b6f5ecf contained the entirely
fanciful assumption that it had made comparetup_datum unreachable.
Reported and patched by Takashi Yamamoto.

Fix up some not terribly accurate/useful comments from that commit, too.
2012-04-06 16:58:50 -04:00
Tom Lane
cea49fe82f Dept of second thoughts: improve the API for AnalyzeForeignTable.
If we make the initially-called function return the table physical-size
estimate, acquire_inherited_sample_rows will be able to use that to
allocate numbers of samples among child tables, when the day comes that
we want to support foreign tables in inheritance trees.
2012-04-06 16:04:10 -04:00
Tom Lane
263d9de66b Allow statistics to be collected for foreign tables.
ANALYZE now accepts foreign tables and allows the table's FDW to control
how the sample rows are collected.  (But only manual ANALYZEs will touch
foreign tables, for the moment, since among other things it's not very
clear how to handle remote permissions checks in an auto-analyze.)

contrib/file_fdw is extended to support this.

Etsuro Fujita, reviewed by Shigeru Hanada, some further tweaking by me.
2012-04-06 15:02:35 -04:00
Simon Riggs
8cb53654db Add DROP INDEX CONCURRENTLY [IF EXISTS], uses ShareUpdateExclusiveLock 2012-04-06 10:21:40 +01:00
Robert Haas
21cc529698 checkopint -> checkpoint
Report by Guillaume Lelarge.
2012-04-05 21:37:33 -04:00
Robert Haas
b736aef2ec Publish checkpoint timing information to pg_stat_bgwriter.
Greg Smith, Peter Geoghegan, and Robert Haas
2012-04-05 14:04:37 -04:00
Robert Haas
644828908f Expose track_iotiming data via the statistics collector.
Ants Aasma's original patch to add timing information for buffer I/O
requests exposed this data at the relation level, which was judged too
costly.  I've here exposed it at the database level instead.
2012-04-05 11:40:24 -04:00
Tom Lane
c17e863bc7 Fix syslogger to not lose log coherency under high load.
The original coding of the syslogger had an arbitrary limit of 20 large
messages concurrently in progress, after which it would just punt and dump
message fragments to the output file separately.  Our ambitions are a bit
higher than that now, so allow the data structure to expand as necessary.

Reported and patched by Andrew Dunstan; some editing by Tom
2012-04-04 15:05:10 -04:00
Peter Eisentraut
38b9693fd9 Add support for renaming domain constraints 2012-04-03 08:11:51 +03:00
Simon Riggs
68219aaf6b Correct epoch of txid_current() when executed on a Hot Standby server.
Initialise ckptXidEpoch from starting checkpoint and maintain the correct
value as we roll forwards. This allows GetNextXidAndEpoch() to return the
correct epoch when executed during recovery. Backpatch to 9.0 when the
problem is first observable by a user.

Bug report from Daniel Farina
2012-03-29 14:55:30 +01:00
Andrew Dunstan
aeca650226 Unbreak Windows builds broken by pgpipe removal. 2012-03-29 04:11:57 -04:00
Heikki Linnakangas
5762a4d909 Inherit max_safe_fds to child processes in EXEC_BACKEND mode.
Postmaster sets max_safe_fds by testing how many open file descriptors it
can open, and that is normally inherited by all child processes at fork().
Not so on EXEC_BACKEND, ie. Windows, however. Because of that, we
effectively ignored max_files_per_process on Windows, and always assumed
a conservative default of 32 simultaneous open files. That could have an
impact on performance, if you need to access a lot of different files
in a query. After this patch, the value is passed to child processes by
save/restore_backend_variables() among many other global variables.

It has been like this forever, but given the lack of complaints about it,
I'm not backpatching this.
2012-03-29 08:19:11 +03:00
Andrew Dunstan
d2c1740dc2 Remove now redundant pgpipe code. 2012-03-28 23:24:07 -04:00
Tom Lane
5d3fcc4c2e Bend parse location rules for the convenience of pg_stat_statements.
Generally, the parse location assigned to a multiple-token construct is
the location of its leftmost token.  This commit breaks that rule for
the syntaxes TYPENAME 'LITERAL' and CAST(CONSTANT AS TYPENAME) --- the
resulting Const will have the location of the literal string, not the
typename or CAST keyword.  The cases where this matters are pretty thin on
the ground (no error messages in the regression tests change, for example),
and it's unlikely that any user would be confused anyway by an error cursor
pointing at the literal.  But still it's less than consistent.  The reason
for changing it is that contrib/pg_stat_statements wants to know the parse
location of the original literal, and it was agreed that this is the least
unpleasant way to preserve that information through parse analysis.

Peter Geoghegan
2012-03-27 15:17:41 -04:00
Tom Lane
a40fa613b5 Add some infrastructure for contrib/pg_stat_statements.
Add a queryId field to Query and PlannedStmt.  This is not used by the
core backend, except for being copied around at appropriate times.
It's meant to allow plug-ins to track a particular query forward from
parse analysis to execution.

The queryId is intentionally not dumped into stored rules (and hence this
commit doesn't bump catversion).  You could argue that choice either way,
but it seems better that stored rule strings not have any dependency
on plug-ins that might or might not be present.

Also, add a post_parse_analyze_hook that gets invoked at the end of
parse analysis (but only for top-level analysis of complete queries,
not cases such as analyzing a domain's default-value expression).
This is mainly meant to be used to compute and assign a queryId,
but it could have other applications.

Peter Geoghegan
2012-03-27 15:17:40 -04:00
Robert Haas
40b9b95769 New GUC, track_iotiming, to track I/O timings.
Currently, the only way to see the numbers this gathers is via
EXPLAIN (ANALYZE, BUFFERS), but the plan is to add visibility through
the stats collector and pg_stat_statements in subsequent patches.

Ants Aasma, reviewed by Greg Smith, with some further changes by me.
2012-03-27 14:55:02 -04:00
Peter Eisentraut
dcb33b1c64 Remove dead assignment
found by Coverity
2012-03-26 21:03:10 +03:00
Robert Haas
7386089d23 Code cleanup for heap_freeze_tuple.
It used to be case that lazy vacuum could call this function with only
a shared lock on the buffer, but neither lazy vacuum nor any other
code path does that any more.  Simplify the code accordingly and clean
up some related, obsolete comments.
2012-03-26 11:03:06 -04:00
Tom Lane
e8476f46fc Fix COPY FROM for null marker strings that correspond to invalid encoding.
The COPY documentation says "COPY FROM matches the input against the null
string before removing backslashes".  It is therefore reasonable to presume
that null markers like E'\\0' will work ... and they did, until someone put
the tests in the wrong order during microoptimization-driven rewrites.
Since then, we've been failing if the null marker is something that would
de-escape to an invalidly-encoded string.  Since null markers generally
need to be something that can't appear in the data, this represents a
nontrivial loss of functionality; surprising nobody noticed it earlier.

Per report from Jeff Davis.  Backpatch to 8.4 where this got broken.
2012-03-25 23:17:22 -04:00
Tom Lane
c7cea267de Replace empty locale name with implied value in CREATE DATABASE and initdb.
setlocale() accepts locale name "" as meaning "the locale specified by the
process's environment variables".  Historically we've accepted that for
Postgres' locale settings, too.  However, it's fairly unsafe to store an
empty string in a new database's pg_database.datcollate or datctype fields,
because then the interpretation could vary across postmaster restarts,
possibly resulting in index corruption and other unpleasantness.

Instead, we should expand "" to whatever it means at the moment of calling
CREATE DATABASE, which we can do by saving the value returned by
setlocale().

For consistency, make initdb set up the initial lc_xxx parameter values the
same way.  initdb was already doing the right thing for empty locale names,
but it did not replace non-empty names with setlocale results.  On a
platform where setlocale chooses to canonicalize the spellings of locale
names, this would result in annoying inconsistency.  (It seems that popular
implementations of setlocale don't do such canonicalization, which is a
pity, but the POSIX spec certainly allows it to be done.)  The same risk
of inconsistency leads me to not venture back-patching this, although it
could certainly be seen as a longstanding bug.

Per report from Jeff Davis, though this is not his proposed patch.
2012-03-25 21:47:22 -04:00
Tom Lane
8279eb4191 Fix planner's handling of outer PlaceHolderVars within subqueries.
For some reason, in the original coding of the PlaceHolderVar mechanism
I had supposed that PlaceHolderVars couldn't propagate into subqueries.
That is of course entirely possible.  When it happens, we need to treat
an outer-level PlaceHolderVar much like an outer Var or Aggref, that is
SS_replace_correlation_vars() needs to replace the PlaceHolderVar with
a Param, and then when building the finished SubPlan we have to provide
the PlaceHolderVar expression as an actual parameter for the SubPlan.
The handling of the contained expression is a bit delicate but it can be
treated exactly like an Aggref's expression.

In addition to the missing logic in subselect.c, prepjointree.c was failing
to search subqueries for PlaceHolderVars that need their relids adjusted
during subquery pullup.  It looks like everyplace else that touches
PlaceHolderVars got it right, though.

Per report from Mark Murawski.  In 9.1 and HEAD, queries affected by this
oversight would fail with "ERROR: Upper-level PlaceHolderVar found where
not expected".  But in 9.0 and 8.4, you'd silently get possibly-wrong
answers, since the value transmitted into the subquery wouldn't go to null
when it should.
2012-03-24 16:21:39 -04:00
Tom Lane
ed61127be4 Cast some printf arguments to avoid possibly-nonportable behavior.
Per compiler warnings on buildfarm member black_firefly.
2012-03-23 20:18:04 -04:00
Tom Lane
81a646febe Refactor simplify_function et al to centralize argument simplification.
We were doing the recursive simplification of function/operator arguments
in half a dozen different places, with rather baroque logic to ensure it
didn't get done multiple times on some arguments.  This patch improves that
by postponing argument simplification until after we've dealt with named
parameters and added any needed default expressions.

Marti Raudsepp, somewhat hacked on by me
2012-03-23 19:15:58 -04:00
Tom Lane
0339047bc9 Code review for protransform patches.
Fix loss of previous expression-simplification work when a transform
function fires: we must not simply revert to untransformed input tree.
Instead build a dummy FuncExpr node to pass to the transform function.
This has the additional advantage of providing a simpler, more uniform
API for transform functions.

Move documentation to a somewhat less buried spot, relocate some
poorly-placed code, be more wary of null constants and invalid typmod
values, add an opr_sanity check on protransform function signatures,
and some other minor cosmetic adjustments.

Note: although this patch touches pg_proc.h, no need for catversion
bump, because the changes are cosmetic and don't actually change the
intended catalog contents.
2012-03-23 17:29:57 -04:00
Peter Eisentraut
0e85abd658 Clean up compiler warnings from unused variables with asserts disabled
For those variables only used when asserts are enabled, use a new
macro PG_USED_FOR_ASSERTS_ONLY, which expands to
__attribute__((unused)) when asserts are not enabled.
2012-03-21 23:33:10 +02:00
Tom Lane
f70f095c90 Allow new relmapper entries when allow_system_table_mods is true.
This restores the pre-9.0 situation that it's possible to add new indexes
on pg_class and other mapped-but-not-shared catalogs, so long as you broke
the glass and flipped the big red Dont-Touch-Me switch.  As before, there
are a lot of gotchas, and you'd have to be pretty desperate to try this
on a production database; but there doesn't seem to be a reason for
relmapper.c to be preventing such things all by itself.  Per
experimentation with a case suggested by Cody Cutrer.
2012-03-21 14:09:39 -04:00
Robert Haas
aefa6d163e Add some CHECK_FOR_INTERRUPTS() calls to the heap-sort call path.
I broke this in commit 337b6f5ecf, which
among other things arranged for quicksorts to CHECK_FOR_INTERRUPTS()
slightly less frequently.  Sadly, it also arranged for heapsorts to
CHECK_FOR_INTERRUPTS() much less frequently.  Repair.
2012-03-20 21:26:39 -04:00
Tom Lane
9dbf2b7d75 Restructure SELECT INTO's parsetree representation into CreateTableAsStmt.
Making this operation look like a utility statement seems generally a good
idea, and particularly so in light of the desire to provide command
triggers for utility statements.  The original choice of representing it as
SELECT with an IntoClause appendage had metastasized into rather a lot of
places, unfortunately, so that this patch is a great deal more complicated
than one might at first expect.

In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS
subcommands required restructuring some EXPLAIN-related APIs.  Add-on code
that calls ExplainOnePlan or ExplainOneUtility, or uses
ExplainOneQuery_hook, will need adjustment.

Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO,
which formerly were accepted though undocumented, are no longer accepted.
The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE.
The CREATE RULE case doesn't seem to have much real-world use (since the
rule would work only once before failing with "table already exists"),
so we'll not bother with that one.

Both SELECT INTO and CREATE TABLE AS still return a command tag of
"SELECT nnnn".  There was some discussion of returning "CREATE TABLE nnnn",
but for the moment backwards compatibility wins the day.

Andres Freund and Tom Lane
2012-03-19 21:38:12 -04:00
Peter Eisentraut
693ff85d47 backend: Fix minor memory leak in configuration file processing
Just for consistency with the other code paths.

found by Coverity
2012-03-16 20:34:59 +02:00
Tom Lane
b67ad046e6 Improve commentary in match_pathkeys_to_index().
For a little while there I thought match_pathkeys_to_index() was broken
because it wasn't trying to match index columns to pathkeys in order.
Actually that's correct, because GiST can support ordering operators
on any random collection of index columns, but it sure needs a comment.
2012-03-16 14:07:21 -04:00
Tom Lane
dd4134ea56 Revisit handling of UNION ALL subqueries with non-Var output columns.
In commit 57664ed25e I tried to fix a bug
reported by Teodor Sigaev by making non-simple-Var output columns distinct
(by wrapping their expressions with dummy PlaceHolderVar nodes).  This did
not work too well.  Commit b28ffd0fcc fixed
some ensuing problems with matching to child indexes, but per a recent
report from Claus Stadler, constraint exclusion of UNION ALL subqueries was
still broken, because constant-simplification didn't handle the injected
PlaceHolderVars well either.  On reflection, the original patch was quite
misguided: there is no reason to expect that EquivalenceClass child members
will be distinct.  So instead of trying to make them so, we should ensure
that we can cope with the situation when they're not.

Accordingly, this patch reverts the code changes in the above-mentioned
commits (though the regression test cases they added stay).  Instead, I've
added assorted defenses to make sure that duplicate EC child members don't
cause any problems.  Teodor's original problem ("MergeAppend child's
targetlist doesn't match MergeAppend") is addressed more directly by
revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort
list guide creation of each child's sort list.

In passing, get rid of add_sort_column; as far as I can tell, testing for
duplicate sort keys at this stage is dead code.  Certainly it doesn't
trigger often enough to be worth expending cycles on in ordinary queries.
And keeping the test would've greatly complicated the new logic in
prepare_sort_from_pathkeys, because comparing pathkey list entries against
a previous output array requires that we not skip any entries in the list.

Back-patch to 9.1, like the previous patches.  The only known issue in
this area that wasn't caused by the ill-advised previous patches was the
MergeAppend planning failure, which of course is not relevant before 9.1.
It's possible that we need some of the new defenses against duplicate child
EC entries in older branches, but until there's some clear evidence of that
I'm going to refrain from back-patching further.
2012-03-16 13:11:55 -04:00
Peter Eisentraut
eb990a2b9e Add const qualifier to tzn returned by timestamp2tm()
The tzn value might come from tm->tm_zone, which libc declares as
const, so it's prudent that the upper layers know about this as well.
2012-03-15 21:17:19 +02:00
Peter Eisentraut
531e60aec0 Remove unused tzn arguments for timestamp2tm() 2012-03-15 21:13:35 +02:00
Peter Eisentraut
ad4fb0d0d2 Improve EncodeDateTime and EncodeTimeOnly APIs
Use an explicit argument to tell whether to include the time zone in
the output, rather than using some undocumented pointer magic.
2012-03-14 23:03:34 +02:00
Peter Eisentraut
6f018c6dda COPY: Add an assertion
This is for tools such as Coverity that don't know that the grammar
enforces that the case of not having a relation (but instead a query)
cannot happen in the FROM case.
2012-03-14 22:44:40 +02:00
Peter Eisentraut
e684ab5e1e Add additional safety check against invalid backup label file
It was already checking for invalid data after "BACKUP FROM", but
would possibly crash if "BACKUP FROM" was missing altogether.

found by Coverity
2012-03-14 22:41:50 +02:00
Tom Lane
b4af1c25bb Fix SPGiST vacuum algorithm to handle concurrent tuple motion properly.
A leaf tuple that we need to delete could get moved as a consequence of an
insertion happening concurrently with the VACUUM scan.  If it moves from a
page past the current scan point to a page before, we'll miss it, which is
not acceptable.  Hence, when we see a leaf-page REDIRECT that could have
been made since our scan started, chase down the redirection pointer much
as if we were doing a normal index search, and be sure to vacuum every page
it leads to.  This fixes the issue because, if the tuple was on page N at
the instant we start our scan, we will surely find it as a consequence of
chasing the redirect from page N, no matter how much it moves around in
between.  Problem noted by Takashi Yamamoto.
2012-03-12 16:10:28 -04:00
Peter Eisentraut
bad250f4f3 Use correct sizeof operand in qsort call
Probably no practical impact, since all pointers ought to have the
same size, but it was wrong nonetheless.  Found by Coverity.
2012-03-12 20:56:13 +02:00
Peter Eisentraut
c9f310d377 Add comment for missing break in switch
For clarity, following other sites, and to silence Coverity.
2012-03-12 20:55:09 +02:00
Tom Lane
c6be1f43ab Make INSERT/UPDATE queries depend on their specific target columns.
We have always created a whole-table dependency for the target relation,
but that's not really good enough, as it doesn't prevent scenarios such
as dropping an individual target column or altering its type.  So we
have to create an individual dependency for each target column, as well.

Per report from Bill MacArthur of a rule containing UPDATE breaking
after such an alteration.  Note that this patch doesn't try to make
such cases work, only to ensure that the attempted ALTER TABLE throws
an error telling you it can't cope with adjusting the rule.

This is a long-standing bug, but given the lack of prior reports
I'm not going to risk back-patching it.  A back-patch wouldn't do
anything to fix existing rules' dependency lists, anyway.
2012-03-11 18:14:23 -04:00
Tom Lane
c6a11b89e4 Teach SPGiST to store nulls and do whole-index scans.
This patch fixes the other major compatibility-breaking limitation of
SPGiST, that it didn't store anything for null values of the indexed
column, and so could not support whole-index scans or "x IS NULL"
tests.  The approach is to create a wholly separate search tree for
the null entries, and use fixed "allTheSame" insertion and search
rules when processing this tree, instead of calling the index opclass
methods.  This way the opclass methods do not need to worry about
dealing with nulls.

Catversion bump is for pg_am updates as well as the change in on-disk
format of SPGiST indexes; there are some tweaks in SPGiST WAL records
as well.

Heavily rewritten version of a patch by Oleg Bartunov and Teodor Sigaev.
(The original also stored nulls separately, but it reused GIN code to do
so; which required undesirable compromises in the on-disk format, and
would likely lead to bugs due to the GIN code being required to work in
two very different contexts.)
2012-03-11 16:29:59 -04:00
Peter Eisentraut
86947e666d Add more detail to error message for invalid arguments for server process
It now prints the argument that was at fault.

Also fix a small misbehavior where the error message issued by
getopt() would complain about a program named "--single", because
that's what argv[0] is in the server process.
2012-03-11 02:03:52 +02:00
Tom Lane
03e56f798e Restructure SPGiST opclass interface API to support whole-index scans.
The original API definition was incapable of supporting whole-index scans
because there was no way to invoke leaf-value reconstruction without
checking any qual conditions.  Also, it was inefficient for
multiple-qual-condition scans because value reconstruction got done over
again for each qual condition, and because other internal work in the
consistent functions likewise had to be done for each qual.  To fix these
issues, pass the whole scankey array to the opclass consistent functions,
instead of only letting them see one item at a time.  (Essentially, the
loop over scankey entries is now inside the consistent functions not
outside them.  This makes the consistent functions a bit more complicated,
but not unreasonably so.)

In itself this commit does nothing except save a few cycles in
multiple-qual-condition index scans, since we can't support whole-index
scans on SPGiST indexes until nulls are included in the index.  However,
I consider this a must-fix for 9.2 because once we release it will get
very much harder to change the opclass API definition.
2012-03-10 18:36:49 -05:00
Peter Eisentraut
39d74e346c Add support for renaming constraints
reviewed by Josh Berkus and Dimitri Fontaine
2012-03-10 20:19:13 +02:00
Robert Haas
07d1edb954 Extend object access hook framework to support arguments, and DROP.
This allows loadable modules to get control at drop time, perhaps for the
purpose of performing additional security checks or to log the event.
The initial purpose of this code is to support sepgsql, but other
applications should be possible as well.

KaiGai Kohei, reviewed by me.
2012-03-09 14:34:56 -05:00
Tom Lane
b14953932d Revise FDW planning API, again.
Further reflection shows that a single callback isn't very workable if we
desire to let FDWs generate multiple Paths, because that forces the FDW to
do all work necessary to generate a valid Plan node for each Path.  Instead
split the former PlanForeignScan API into three steps: GetForeignRelSize,
GetForeignPaths, GetForeignPlan.  We had already bit the bullet of breaking
the 9.1 FDW API for 9.2, so this shouldn't cause very much additional pain,
and it's substantially more flexible for complex FDWs.

Add an fdw_private field to RelOptInfo so that the new functions can save
state there rather than possibly having to recalculate information two or
three times.

In addition, we'd not thought through what would be needed to allow an FDW
to set up subexpressions of its choice for runtime execution.  We could
treat ForeignScan.fdw_private as an executable expression but that seems
likely to break existing FDWs unnecessarily (in particular, it would
restrict the set of node types allowable in fdw_private to those supported
by expression_tree_walker).  Instead, invent a separate field fdw_exprs
which will receive the postprocessing appropriate for expression trees.
(One field is enough since it can be a list of expressions; also, we assume
the corresponding expression state tree(s) will be held within fdw_state,
so we don't need to add anything to ForeignScanState.)

Per review of Hanada Shigeru's pgsql_fdw patch.  We may need to tweak this
further as we continue to work on that patch, but to me it feels a lot
closer to being right now.
2012-03-09 12:49:25 -05:00
Heikki Linnakangas
342baf4ce6 Update outdated comment. HeapTupleHeader.t_natts field doesn't exist anymore.
Kevin Grittner
2012-03-09 08:07:56 +02:00
Tom Lane
08dd23cec7 Fix some issues with temp/transient tables in extension scripts.
Phil Sorber reported that a rewriting ALTER TABLE within an extension
update script failed, because it creates and then drops a placeholder
table; the drop was being disallowed because the table was marked as an
extension member.  We could hack that specific case but it seems likely
that there might be related cases now or in the future, so the most
practical solution seems to be to create an exception to the general rule
that extension member objects can only be dropped by dropping the owning
extension.  To wit: if the DROP is issued within the extension's own
creation or update scripts, we'll allow it, implicitly performing an
"ALTER EXTENSION DROP object" first.  This will simplify cases such as
extension downgrade scripts anyway.

No docs change since we don't seem to have documented the idea that you
would need ALTER EXTENSION DROP for such an action to begin with.

Also, arrange for explicitly temporary tables to not get linked as
extension members in the first place, and the same for the magic
pg_temp_nnn schemas that are created to hold them.  This prevents assorted
unpleasant results if an extension script creates a temp table: the forced
drop at session end would either fail or remove the entire extension, and
neither of those outcomes is desirable.  Note that this doesn't fix the
ALTER TABLE scenario, since the placeholder table is not temp (unless the
table being rewritten is).

Back-patch to 9.1.
2012-03-08 15:53:09 -05:00
Heikki Linnakangas
d93f209f48 Silence warning about unused variable, when building without assertions. 2012-03-08 11:10:02 +02:00
Tom Lane
66a7e6bae9 Improve estimation of IN/NOT IN by assuming array elements are distinct.
In constructs such as "x IN (1,2,3,4)" and "x <> ALL(ARRAY[1,2,3,4])",
we formerly always used a general-purpose assumption that the probability
of success is independent for each comparison of "x" to an array element.
But in real-world usage of these constructs, that's a pretty poor
assumption; it's much saner to assume that the array elements are distinct
and so the match probabilities are disjoint.  Apply that assumption if the
operator appears to behave as equality (for ANY) or inequality (for ALL).
But fall back to the normal independent-probabilities calculation if this
yields an impossible result, ie probability > 1 or < 0.  We could protect
ourselves against bad estimates even more by explicitly checking for equal
array elements, but that is expensive and doesn't seem worthwhile: doing
it would amount to optimizing for poorly-written queries at the expense
of well-written ones.

Daniele Varrazzo and Tom Lane, after a suggestion by Ants Aasma
2012-03-07 22:59:49 -05:00
Tom Lane
9088d1b965 Add GetForeignColumnOptions() to foreign.c, and add some documentation.
GetForeignColumnOptions provides some abstraction for accessing
column-specific FDW options, on a par with the access functions that were
already provided here for other FDW-related information.

Adjust file_fdw.c to use GetForeignColumnOptions instead of equivalent
hand-rolled code.

In addition, add some SGML documentation for the functions exported by
foreign.c that are meant for use by FDW authors.

(This is the fdw_helper portion of the proposed pgsql_fdw patch.)

Hanada Shigeru, reviewed by KaiGai Kohei
2012-03-07 18:20:58 -05:00
Tom Lane
d4bf3c9c94 Expose an API for calculating catcache hash values.
Now that cache invalidation callbacks get only a hash value, and not a
tuple TID (per commits 632ae6829f and
b5282aa893), the only way they can restrict
what they invalidate is to know what the hash values mean.  setrefs.c was
doing this via a hard-wired assumption but that seems pretty grotty, and
it'll only get worse as more cases come up.  So let's expose a calculation
function that takes the same parameters as SearchSysCache.  Per complaint
from Marko Kreen.
2012-03-07 14:51:13 -05:00
Tom Lane
19dbc34631 Add a hook for processing messages due to be sent to the server log.
Use-cases for this include custom log filtering rules and custom log
message transmission mechanisms (for instance, lossy log message
collection, which has been discussed several times recently).

As is our common practice for hooks, there's no regression test nor
user-facing documentation for this, though the author did exhibit a
sample module using the hook.

Martin Pihlak, reviewed by Marti Raudsepp
2012-03-06 15:35:41 -05:00
Robert Haas
bc97c38115 Typo fix.
Fujii Masao
2012-03-06 08:23:51 -05:00
Heikki Linnakangas
e587e2e3e3 Make the comments more clear on the fact that UpdateFullPageWrites() is not
safe to call concurrently from multiple processes.
2012-03-06 10:45:58 +02:00
Heikki Linnakangas
7714c63829 Remove extra copies of LogwrtResult.
This simplifies the code a little bit. The new rule is that to update
XLogCtl->LogwrtResult, you must hold both WALWriteLock and info_lck, whereas
before we had two copies, one that was protected by WALWriteLock and another
protected by info_lck. The code that updates them was already holding both
locks, so merging the two is trivial.

The third copy, XLogCtl->Insert.LogwrtResult, was not totally redundant, it
was used in AdvanceXLInsertBuffer to update the backend-local copy, before
acquiring the info_lck to read the up-to-date value. But the value of that
seems dubious; at best it's saving one spinlock acquisition per completed
WAL page, which is not significant compared to all the other work involved.
And in practice, it's probably not saving even that much.
2012-03-06 10:18:33 +02:00
Heikki Linnakangas
3b682df326 Simplify the way changes to full_page_writes are logged.
It's harmless to do full page writes even when not strictly necessary, so
when turning full_page_writes on, we can set the global flag first, and then
call XLogInsert. Likewise, when turning it off, we can write the WAL record
first, and then clear the flag. This way XLogInsert doesn't need any special
handling of the XLOG_FPW_CHANGE record type. XLogInsert is complicated
enough already, so anything we can keep away from there is a good thing.

Actually I don't think the atomicity of the shared memory flag matters,
anyway, because we only write the XLOG_FPW_CHANGE at the end of recovery,
when there are no concurrent WAL insertions going on. But might as well make
it safe, in case we allow changing full_page_writes on the fly in the
future.
2012-03-06 09:48:30 +02:00
Tom Lane
6b289942bf Redesign PlanForeignScan API to allow multiple paths for a foreign table.
The original API specification only allowed an FDW to create a single
access path, which doesn't seem like a terribly good idea in hindsight.
Instead, move the responsibility for building the Path node and calling
add_path() into the FDW's PlanForeignScan function.  Now, it can do that
more than once if appropriate.  There is no longer any need for the
transient FdwPlan struct, so get rid of that.

Etsuro Fujita, Shigeru Hanada, Tom Lane
2012-03-05 16:15:59 -05:00
Tom Lane
80da9e68fd Rewrite GiST support code for rangetypes.
This patch installs significantly smarter penalty and picksplit functions
for ranges, making GiST indexes for them smaller and faster to search.

There is no on-disk format change, so no catversion bump, but you'd need
to REINDEX to get the benefits for any existing index.

Alexander Korotkov, reviewed by Jeff Davis
2012-03-04 22:50:06 -05:00
Tom Lane
e2eed78910 Remove useless "rough estimate" path from mcelem_array_contained_selec.
The code in this function that tried to cope with a missing count histogram
was quite ineffective for anything except a perfectly flat distribution.
Furthermore, since we were already punting for missing MCELEM slot, it's
rather useless to sweat over missing DECHIST: there are no cases where
ANALYZE will create the first but not the second.  So just simplify the
code by punting rather than pretending we can do something useful.
2012-03-04 16:03:38 -05:00
Tom Lane
4fb694aebc Improve histogram-filling loop in new compute_array_stats() code.
Do "frac" arithmetic in int64 to prevent overflow with large statistics
targets, and improve the comments so people have some chance of
understanding how it works.

Alexander Korotkov and Tom Lane
2012-03-04 15:40:16 -05:00
Magnus Hagander
141b89826d More carefully validate xlog location string inputs
Now that we have validate_xlog_location, call it from the previously
existing functions taking xlog locatoins as a string input.

Suggested by Fujii Masao
2012-03-04 12:25:47 +01:00
Magnus Hagander
bc5ac36865 Add function pg_xlog_location_diff to help comparisons
Comparing two xlog locations are useful for example when calculating
replication lag.

Euler Taveira de Oliveira, reviewed by Fujii Masao, and some cleanups
from me
2012-03-04 12:22:38 +01:00
Tom Lane
0e5e167aae Collect and use element-frequency statistics for arrays.
This patch improves selectivity estimation for the array <@, &&, and @>
(containment and overlaps) operators.  It enables collection of statistics
about individual array element values by ANALYZE, and introduces
operator-specific estimators that use these stats.  In addition,
ScalarArrayOpExpr constructs of the forms "const = ANY/ALL (array_column)"
and "const <> ANY/ALL (array_column)" are estimated by treating them as
variants of the containment operators.

Since we still collect scalar-style stats about the array values as a
whole, the pg_stats view is expanded to show both these stats and the
array-style stats in separate columns.  This creates an incompatible change
in how stats for tsvector columns are displayed in pg_stats: the stats
about lexemes are now displayed in the array-related columns instead of the
original scalar-related columns.

There are a few loose ends here, notably that it'd be nice to be able to
suppress either the scalar-style stats or the array-element stats for
columns for which they're not useful.  But the patch is in good enough
shape to commit for wider testing.

Alexander Korotkov, reviewed by Noah Misch and Nathan Boley
2012-03-03 20:20:57 -05:00
Peter Eisentraut
b59ca98209 Allow CREATE TABLE (LIKE ...) from composite type
The only reason this didn't work before was that parserOpenTable()
rejects composite types.  So use relation_openrv() directly and
manually do the errposition() setup that parserOpenTable() does.
2012-03-03 16:03:05 +02:00
Tom Lane
44634e474f Allow child-relation entries to be made in ec_has_const EquivalenceClasses.
This fixes an oversight in commit 11cad29c91,
which introduced MergeAppend plans.  Before that happened, we never
particularly cared about the sort ordering of scans of inheritance child
relations, since appending their outputs together would destroy any
ordering anyway.  But now it's important to be able to match child relation
sort orderings to those of the surrounding query.  The original coding of
add_child_rel_equivalences skipped ec_has_const EquivalenceClasses, on the
originally-correct grounds that adding child expressions to them was
useless.  The effect of this is that when a parent variable is equated to
a constant, we can't recognize that index columns on the equivalent child
variables are not sort-significant; that is, we can't recognize that a
child index on, say, (x, y) is able to generate output in "ORDER BY y"
order when there is a clause "WHERE x = constant".  Adding child
expressions to the (x, constant) EquivalenceClass fixes this, without any
downside that I can see other than a few more planner cycles expended on
such queries.

Per recent gripe from Robert McGehee.  Back-patch to 9.1 where MergeAppend
was introduced.
2012-03-02 14:29:07 -05:00
Peter Eisentraut
6688d2878e Add COLLATION FOR expression
reviewed by Jaime Casanova
2012-03-02 21:12:16 +02:00
Heikki Linnakangas
2502f45979 When a GiST page is split during index build, it might not have a buffer.
Previously it was thought that it's impossible as the code stands, because
insertions create buffers as tuples are cascaded downwards, and index
split also creaters buffers eagerly for all halves. But the example from
Jay Levitt demonstrates that it can happen, when the root page is split.
It's in fact OK if the buffer doesn't exist, so we just need to remove the
sanity check. In fact, we've been discussing the possibility of destroying
empty buffers to conserve memory, which would render the sanity check
completely useless anyway.

Fix by Alexander Korotkov
2012-03-02 13:16:09 +02:00
Alvaro Herrera
3433c6ba00 Remove TOAST table from pg_database
The only toastable column now is datacl, but we don't really support
long ACLs anyway.  The TOAST table should have been removed when the
pg_db_role_setting catalog was introduced in commit
2eda8dfb52, but I forgot to do that.

Per -hackers discussion on March 2011.
2012-03-01 12:50:52 -03:00
Heikki Linnakangas
d6a7271958 Correctly detect SSI conflicts of prepared transactions after crash.
A prepared transaction can get new conflicts in and out after preparing, so
we cannot rely on the in- and out-flags stored in the statefile at prepare-
time. As a quick fix, make the conservative assumption that after a restart,
all prepared transactions are considered to have both in- and out-conflicts.
That can lead to unnecessary rollbacks after a crash, but that shouldn't be
a big problem in practice; you don't want prepared transactions to hang
around for a long time anyway.

Dan Ports
2012-02-29 15:42:36 +02:00
Tom Lane
5c02a00d44 Move CRC tables to libpgport, and provide them in a separate include file.
This makes it much more convenient to build tools for Postgres that are
separately compiled and require a matching CRC implementation.

To prevent multiple copies of the CRC polynomial tables being introduced
into the postgres binaries, they are now included in the static library
libpgport that is mainly meant for replacement system functions.  That
seems like a bit of a kludge, but there's no better place.

This cleans up building of the tools pg_controldata and pg_resetxlog,
which previously had to build their own copies of pg_crc.o.

In the future, external programs that need access to the CRC tables can
include the tables directly from the new header file pg_crc_tables.h.

Daniel Farina, reviewed by Abhijit Menon-Sen and Tom Lane
2012-02-28 19:53:39 -05:00
Tom Lane
0140a11b9b Fix thinko in new match_join_clauses_to_index() logic.
We don't need to constrain the other side of an indexable join clause to
not be below an outer join; an example here is

SELECT FROM t1 LEFT JOIN t2 ON t1.a = t2.b LEFT JOIN t3 ON t2.c = t3.d;

We can consider an inner indexscan on t3.d using c = d as indexqual, even
though t2.c is potentially nulled by a previous outer join.  The comparable
logic in orindxpath.c has always worked that way, but I was being overly
cautious here.
2012-02-28 18:10:40 -05:00
Peter Eisentraut
973e9fb294 Add const qualifiers where they are accidentally cast away
This only produces warnings under -Wcast-qual, but it's more correct
and consistent in any case.
2012-02-28 12:42:08 +02:00
Alvaro Herrera
cb3a7c2b95 ALTER TABLE: skip FK validation when it's safe to do so
We already skip rewriting the table in these cases, but we still force a
whole table scan to validate the data.  This can be skipped, and thus
we can make the whole ALTER TABLE operation just do some catalog touches
instead of scanning the table, when these two conditions hold:

(a) Old and new pg_constraint.conpfeqop match exactly.  This is actually
stronger than needed; we could loosen things by way of operator
families, but it'd require a lot more effort.

(b) The functions, if any, implementing a cast from the foreign type to
the primary opcintype are the same.  For this purpose, we can consider a
binary coercion equivalent to an exact type match.  When the opcintype
is polymorphic, require that the old and new foreign types match
exactly.  (Since ri_triggers.c does use the executor, the stronger check
for polymorphic types is no mere future-proofing.  However, no core type
exercises its necessity.)

Author: Noah Misch

Committer's note: catalog version bumped due to change of the Constraint
node.  I can't actually find any way to have such a node in a stored
rule, but given that we have "out" support for them, better be safe.
2012-02-27 19:10:24 -03:00
Peter Eisentraut
9bf8603c7a Call check_keywords.pl in maintainer-check
For that purpose, have check_keywords.pl print errors to stderr and
return a useful exit status.
2012-02-27 13:53:12 +02:00
Tom Lane
1b630751d0 Fix some more bugs in GIN's WAL replay logic.
In commit 4016bdef8a I fixed a bunch of
ginxlog.c bugs having to do with not handling XLogReadBuffer failures
correctly.  However, in ginRedoUpdateMetapage and ginRedoDeleteListPages,
I unaccountably thought that failure to read the metapage would be
impossible and just put in an elog(PANIC) call.  This is of course wrong:
failure is exactly what will happen if the index got dropped (or rebuilt)
between creation of the WAL record and the crash we're trying to recover
from.  I believe this explains Nicholas Wilson's recent report of these
errors getting reached.

Also, fix memory leak in forgetIncompleteSplit.  This wasn't of much
concern when the code was written, but in a long-running standby server
page split records could be expected to accumulate indefinitely.

Back-patch to 8.4 --- before that, GIN didn't have a metapage.
2012-02-26 15:12:17 -05:00
Peter Eisentraut
b5c077c368 Remove useless cast 2012-02-26 15:31:16 +02:00
Peter Eisentraut
66f0cf7da8 Remove useless const qualifier
Claiming that the typevar argument to DefineCompositeType() is const
was a plain lie.  A similar case in DefineVirtualRelation() was
already changed in passing in commit 1575fbcb.  Also clean up the now
unnecessary casts that used to cast away the const.
2012-02-26 15:22:27 +02:00
Tom Lane
4dd78bf37a Merge dissect() into cdissect() to remove a pile of near-duplicate code.
The "uncomplicated" case isn't materially less complicated than the full
case, certainly not enough so to justify duplicating nearly 500 lines
of code.  The only extra work being done in the full path is zaptreesubs,
which is very cheap compared to everything else being done here, and
besides that I'm less than convinced that it's not needed in some cases
even without backrefs.
2012-02-24 18:40:31 -05:00
Tom Lane
587359479a Avoid repeated creation/freeing of per-subre DFAs during regex search.
In nested sub-regex trees, lower-level nodes created DFAs and then
destroyed them again before exiting, which is a bit dumb considering that
the recursive search is likely to call those nodes again later.  Instead
cache each created DFA until the end of pg_regexec().  This is basically a
space for time tradeoff, in that it might increase the maximum memory
usage.  However, in most regex patterns there are not all that many subre
nodes, so not that many DFAs --- and in any case, the peak usage occurs
when reaching the bottom recursion level, and except for alternation cases
that's going to be the same anyway.
2012-02-24 18:40:30 -05:00
Tom Lane
3cbfe485e4 Remove useless "retry memory" logic within regex engine.
Apparently some primordial version of Spencer's engine needed cdissect()
and child functions to be able to continue matching from a previous
position when re-called.  That is dead code, though, since trivial
inspection shows that cdissect can never be entered without having
previously done zapmem which resets the relevant retry counter.  I have
also verified experimentally that no case in the Tcl regression tests
reaches cdissect with a nonzero retry value.  Accordingly, remove that
logic.  This doesn't really save any noticeable number of cycles in itself,
but it is one step towards making dissect() and cdissect() equivalent,
which will allow removing hundreds of lines of near-duplicated code.

Since struct subre's "retry" field is no longer particularly related to
any kind of retry, rename it to "id".  As of this commit it's only used
for identifying a subre node in debug printouts, so you might think we
should get rid of the field entirely; but I have a plan for another use.
2012-02-24 18:40:28 -05:00
Peter Eisentraut
9cfd800aab Add some enumeration commas, for consistency 2012-02-24 11:04:45 +02:00
Tom Lane
173e29aa5d Fix the general case of quantified regex back-references.
Cases where a back-reference is part of a larger subexpression that
is quantified have never worked in Spencer's regex engine, because
he used a compile-time transformation that neglected the need to
check the back-reference match in iterations before the last one.
(That was okay for capturing parens, and we still do it if the
regex has *only* capturing parens ... but it's not okay for backrefs.)

To make this work properly, we have to add an "iteration" node type
to the regex engine's vocabulary of sub-regex nodes.  Since this is a
moderately large change with a fair risk of introducing new bugs of its
own, apply to HEAD only, even though it's a fix for a longstanding bug.
2012-02-24 01:41:03 -05:00
Andrew Dunstan
0c9e5d5e0d Correctly handle NULLs in JSON output.
Error reported by David Wheeler.
2012-02-23 23:44:16 -05:00
Tom Lane
077711c2e3 Remove arbitrary limitation on length of common name in SSL certificates.
Both libpq and the backend would truncate a common name extracted from a
certificate at 32 bytes.  Replace that fixed-size buffer with dynamically
allocated string so that there is no hard limit.  While at it, remove the
code for extracting peer_dn, which we weren't using for anything; and
don't bother to store peer_cn longer than we need it in libpq.

This limit was not so terribly unreasonable when the code was written,
because we weren't using the result for anything critical, just logging it.
But now that there are options for checking the common name against the
server host name (in libpq) or using it as the user's name (in the server),
this could result in undesirable failures.  In the worst case it even seems
possible to spoof a server name or user name, if the correct name is
exactly 32 bytes and the attacker can persuade a trusted CA to issue a
certificate in which that string is a prefix of the certificate's common
name.  (To exploit this for a server name, he'd also have to send the
connection astray via phony DNS data or some such.)  The case that this is
a realistic security threat is a bit thin, but nonetheless we'll treat it
as one.

Back-patch to 8.4.  Older releases contain the faulty code, but it's not
a security problem because the common name wasn't used for anything
interesting.

Reported and patched by Heikki Linnakangas

Security: CVE-2012-0867
2012-02-23 15:48:04 -05:00
Tom Lane
891e6e7bfd Require execute permission on the trigger function for CREATE TRIGGER.
This check was overlooked when we added function execute permissions to the
system years ago.  For an ordinary trigger function it's not a big deal,
since trigger functions execute with the permissions of the table owner,
so they couldn't do anything the user issuing the CREATE TRIGGER couldn't
have done anyway.  However, if a trigger function is SECURITY DEFINER,
that is not the case.  The lack of checking would allow another user to
install it on his own table and then invoke it with, essentially, forged
input data; which the trigger function is unlikely to realize, so it might
do something undesirable, for instance insert false entries in an audit log
table.

Reported by Dinesh Kumar, patch by Robert Haas

Security: CVE-2012-0866
2012-02-23 15:38:56 -05:00
Peter Eisentraut
c9d7004440 Remove inappropriate quotes
And adjust wording for consistency.
2012-02-23 12:52:17 +02:00
Peter Eisentraut
8251670cb3 Fix build without OpenSSL
This is a fixup for commit a445cb92ef.
2012-02-23 10:20:25 +02:00
Robert Haas
2254367435 Make EXPLAIN (BUFFERS) track blocks dirtied, as well as those written.
Also expose the new counters through pg_stat_statements.

Patch by me.  Review by Fujii Masao and Greg Smith.
2012-02-22 20:33:05 -05:00
Robert Haas
f74f9a277c Fix typo in comment.
Sandro Santilli
2012-02-22 19:46:12 -05:00
Peter Eisentraut
a445cb92ef Add parameters for controlling locations of server-side SSL files
This allows changing the location of the files that were previously
hard-coded to server.crt, server.key, root.crt, root.crl.

server.crt and server.key continue to be the default settings and are
thus required to be present by default if SSL is enabled.  But the
settings for the server-side CA and CRL are now empty by default, and
if they are set, the files are required to be present.  This replaces
the previous behavior of ignoring the functionality if the files were
not found.
2012-02-22 23:40:46 +02:00
Alvaro Herrera
a417f85e1d REASSIGN OWNED: Support foreign data wrappers and servers
This was overlooked when implementing those kinds of objects, in commit
cae565e503.

Per report from Pawel Casperek.
2012-02-22 17:33:12 -03:00
Tom Lane
593a9631a7 Don't clear btpo_cycleid during _bt_vacuum_one_page.
When "vacuuming" a single btree page by removing LP_DEAD tuples, we are not
actually within a vacuum operation, but rather in an ordinary insertion
process that could well be running concurrently with a vacuum.  So clearing
the cycleid is incorrect, and could cause the concurrent vacuum to miss
removing tuples that it needs to remove.  This is a longstanding bug
introduced by commit e6284649b9 of
2006-07-25.  I believe it explains Maxim Boguk's recent report of index
corruption, and probably some other previously unexplained reports.

In 9.0 and up this is a one-line fix; before that we need to introduce a
flag to tell _bt_delitems what to do.
2012-02-21 15:03:36 -05:00
Tom Lane
9789c99d01 Cosmetic cleanup for commit a760893dbd.
Mostly, fixing overlooked comments.
2012-02-21 14:14:16 -05:00
Magnus Hagander
c2a2f7516b Avoid double close of file handle in syslogger on win32
This causes an exception when running under a debugger or in particular
when running on a debug version of Windows.

Patch from MauMau
2012-02-21 17:12:25 +01:00
Andrew Dunstan
6b044cb810 Fix typo, noticed by Will Crawford. 2012-02-21 11:03:51 -05:00
Andrew Dunstan
83fcaffea2 Fix a couple of cases of JSON output.
First, as noted by Itagaki Takahiro, a datum of type JSON doesn't
need to be escaped. Second, ensure that numeric output not in
the form of a legal JSON number is quoted and escaped.
2012-02-20 15:01:03 -05:00
Tom Lane
5223f96d92 Fix regex back-references that are directly quantified with *.
The syntax "\n*", that is a backref with a * quantifier directly applied
to it, has never worked correctly in Spencer's library.  This has been an
open bug in the Tcl bug tracker since 2005:
https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894

The core of the problem is in parseqatom(), which first changes "\n*" to
"\n+|" and then applies repeat() to the NFA representing the backref atom.
repeat() thinks that any arc leading into its "rp" argument is part of the
sub-NFA to be repeated.  Unfortunately, since parseqatom() already created
the arc that was intended to represent the empty bypass around "\n+", this
arc gets moved too, so that it now leads into the state loop created by
repeat().  Thus, what was supposed to be an "empty" bypass gets turned into
something that represents zero or more repetitions of the NFA representing
the backref atom.  In the original example, in place of
	^([bc])\1*$
we now have something that acts like
	^([bc])(\1+|[bc]*)$
At runtime, the branch involving the actual backref fails, as it's supposed
to, but then the other branch succeeds anyway.

We could no doubt fix this by some rearrangement of the operations in
parseqatom(), but that code is plenty ugly already, and what's more the
whole business of converting "x*" to "x+|" probably needs to go away to fix
another problem I'll mention in a moment.  Instead, this patch suppresses
the *-conversion when the target is a simple backref atom, leaving the case
of m == 0 to be handled at runtime.  This makes the patch in regcomp.c a
one-liner, at the cost of having to tweak cbrdissect() a little.  In the
event I went a bit further than that and rewrote cbrdissect() to check all
the string-length-related conditions before it starts comparing characters.
It seems a bit stupid to possibly iterate through many copies of an
n-character backreference, only to fail at the end because the target
string's length isn't a multiple of n --- we could have found that out
before starting.  The existing coding could only be a win if integer
division is hugely expensive compared to character comparison, but I don't
know of any modern machine where that might be true.

This does not fix all the problems with quantified back-references.  In
particular, the code is still broken for back-references that appear within
a larger expression that is quantified (so that direct insertion of the
quantification limits into the BACKREF node doesn't apply).  I think fixing
that will take some major surgery on the NFA code, specifically introducing
an explicit iteration node type instead of trying to transform iteration
into concatenation of modified regexps.

Back-patch to all supported branches.  In HEAD, also add a regression test
case for this.  (It may seem a bit silly to create a regression test file
for just one test case; but I'm expecting that we will soon import a whole
bunch of regex regression tests from Tcl, so might as well create the
infrastructure now.)
2012-02-20 00:52:33 -05:00
Tom Lane
e00f68e49c Add caching of ctype.h/wctype.h results in regc_locale.c.
While this doesn't save a huge amount of runtime, it still seems worth
doing, especially since I realized that the data copying I did in my first
draft was quite unnecessary.  In this version, once we have the results
cached, getting them back for re-use is really very cheap.

Also, remove the hard-wired limitation to not consider wctype.h results for
character codes above 255.  It turns out that we can't push the limit as
far up as I'd originally hoped, because the regex colormap code is not
efficient enough to cope very well with character classes containing many
thousand letters, which a Unicode locale is entirely capable of producing.
Still, we can push it up to U+7FF (which I chose as the limit of 2-byte
UTF8 characters), which will at least make Eastern Europeans happy pending
a better solution.  Thus, this commit resolves the specific complaint in
bug #6457, but not the more general issue that letters of non-western
alphabets are mostly not recognized as matching [[:alpha:]].
2012-02-19 21:01:13 -05:00
Tom Lane
27af91438b Create the beginnings of internals documentation for the regex code.
Create src/backend/regex/README to hold an implementation overview of
the regex package, and fill it in with some preliminary notes about
the code's DFA/NFA processing and colormap management.  Much more to
do there of course.

Also, improve some code comments around the colormap and cvec code.
No functional changes except to add one missing assert.
2012-02-19 18:58:23 -05:00
Andrew Dunstan
2f582f76b1 Improve pretty printing of viewdefs.
Some line feeds are added to target lists and from lists to make
them more readable. By default they wrap at 80 columns if possible,
but the wrap column is also selectable - if 0 it wraps after every
item.

Andrew Dunstan, reviewed by Hitoshi Harada.
2012-02-19 11:43:46 -05:00
Tom Lane
08fd6ff37f Sync regex code with Tcl 8.5.11.
Sync our regex code with upstream changes since last time we did this,
which was Tcl 8.5.0 (see commit df1e965e12).

There are no functional changes here; the main point is just to lay down
a commit-log marker that somebody has looked at this recently, and to do
what we can to keep the two codebases comparable.
2012-02-17 19:44:26 -05:00
Tom Lane
4767bc8ff2 Improve statistics estimation to make some use of DISTINCT in sub-queries.
Formerly, we just punted when trying to estimate stats for variables coming
out of sub-queries using DISTINCT, on the grounds that whatever stats we
might have for underlying table columns would be inapplicable.  But if the
sub-query has only one DISTINCT column, we can consider its output variable
as being unique, which is useful information all by itself.  The scope of
this improvement is pretty narrow, but it costs nearly nothing, so we might
as well do it.  Per discussion with Andres Freund.

This patch differs from the draft I submitted yesterday in updating various
comments about vardata.isunique (to reflect its extended meaning) and in
tweaking the interaction with security_barrier views.  There does not seem
to be a reason why we can't use this sort of knowledge even when the
sub-query is such a view.
2012-02-16 17:34:00 -05:00
Tom Lane
4bfe68dfab Run a portal's cleanup hook immediately when pushing it to FAILED state.
This extends the changes of commit 6252c4f9e2
so that we run the cleanup hook earlier for failure cases as well as
success cases.  As before, the point is to avoid an assertion failure from
an Assert I added in commit a874fe7b4c, which
was meant to check that no user-written code can be called during portal
cleanup.  This fixes a case reported by Pavan Deolasee in which the Assert
could be triggered during backend exit (see the new regression test case),
and also prevents the possibility that the cleanup hook is run after
portions of the portal's state have already been recycled.  That doesn't
really matter in current usage, but it foreseeably could matter in the
future.

Back-patch to 9.1 where the Assert in question was added.
2012-02-15 16:19:01 -05:00
Robert Haas
edec8c8e00 Fix VPATH builds, broken by my recent commit to speed up tuplesorting.
The relevant commit is 337b6f5ecf.
2012-02-15 15:53:53 -05:00
Robert Haas
337b6f5ecf Speed up in-memory tuplesorting.
Per recent work by Peter Geoghegan, it's significantly faster to
tuplesort on a single sortkey if ApplySortComparator is inlined into
quicksort rather reached via a function pointer.  It's also faster
in general to have a version of quicksort which is specialized for
sorting SortTuple objects rather than objects of arbitrary size and
type.  This requires a couple of additional copies of the quicksort
logic, which in this patch are generate using a Perl script.  There
might be some benefit in adding further specializations here too,
but thus far it's not clear that those gains are worth their weight
in code footprint.
2012-02-15 12:13:32 -05:00
Robert Haas
73a4b994a6 Make CREATE/ALTER FUNCTION support NOT LEAKPROOF.
Because it isn't good to be able to turn things on, and not off again.
2012-02-15 10:45:08 -05:00
Tom Lane
398f70ec07 Preserve column names in the execution-time tupledesc for a RowExpr.
The hstore and json datatypes both have record-conversion functions that
pay attention to column names in the composite values they're handed.
We used to not worry about inserting correct field names into tuple
descriptors generated at runtime, but given these examples it seems
useful to do so.  Observe the nicer-looking results in the regression
tests whose results changed.

catversion bump because there is a subtle change in requirements for stored
rule parsetrees: RowExprs from ROW() constructs now have to include field
names.

Andrew Dunstan and Tom Lane
2012-02-14 17:34:56 -05:00
Robert Haas
cd30728fb2 Allow LEAKPROOF functions for better performance of security views.
We don't normally allow quals to be pushed down into a view created
with the security_barrier option, but functions without side effects
are an exception: they're OK.  This allows much better performance in
common cases, such as when using an equality operator (that might
even be indexable).

There is an outstanding issue here with the CREATE FUNCTION / ALTER
FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the
leakproof flag.  But I'm committing this as-is so that it doesn't
have to be rebased again; we can fix up the grammar in a future
commit.

KaiGai Kohei, with some wordsmithing by me.
2012-02-13 22:21:14 -05:00
Heikki Linnakangas
21b1634275 Fix heap_multi_insert to set t_self field in the caller's tuples.
If tuples were toasted, heap_multi_insert didn't update the ctid on the
original tuples. This caused a failure if there was an after trigger
(including a foreign key), on the table, and a tuple got toasted.

Per off-list report and test case from Ted Phelps
2012-02-13 10:20:50 +02:00
Robert Haas
d429ebe347 Add a comment to AdjustIntervalForTypmod to reduce chance of future bugs.
It's not entirely evident how the logic here relates to the
interval_transform function, so let's clue people in that they need to
check that if the rules change.
2012-02-09 12:24:36 -05:00
Robert Haas
6656588575 Improve interval_transform function to detect a few more cases.
Noah Misch, per a review comment from me.
2012-02-09 12:24:22 -05:00
Heikki Linnakangas
82e73ba0d1 Add new keywords SNAPSHOT and TYPES to the keyword list in gram.y
These were added to kwlist.h as unreserved keywords in separate patches,
but authors forgot to add them to the corresponding list in gram.y.
Because of that, even though they were supposed to be unreserved keywords,
they could not be used as identifiers. src/tools/check_keywords.pl is your
friend.
2012-02-09 11:37:54 +02:00
Tom Lane
331bf6712c Throw error sooner for unlogged GiST indexes.
Throwing an error only after we've built the main index fork is pretty
unfriendly when the table already contains data.  Per gripe from Jay
Levitt.
2012-02-08 16:19:27 -05:00
Tom Lane
cb7c84fae8 Check misplaced window functions before checking aggregate/group by sanity.
If somebody puts a window function in WHERE, we should complain about that
in so many words.  The previous coding tended to complain about the window
function's arguments instead, which is likely to be misleading to users who
are unclear on the semantics of window functions; as seen for example in
bug #6440 from Matyas Novak.

Just another example of how "add new code at the end" is frequently a bad
heuristic.
2012-02-08 13:15:02 -05:00
Robert Haas
c13897983a Add transform functions for various temporal typmod coercisions.
This enables ALTER TABLE to skip table and index rebuilds in some cases.

Noah Misch, with trivial changes by me.
2012-02-08 09:33:37 -05:00
Heikki Linnakangas
1a01560cbb Rename LWLockWaitUntilFree to LWLockAcquireOrWait.
LWLockAcquireOrWait makes it more clear that the lock is acquired if it's
free.
2012-02-08 09:17:13 +02:00
Robert Haas
af7dd696b0 Fix typos pointed out by Noah Misch. 2012-02-07 21:40:49 -05:00
Robert Haas
f7d7dade8a Add a transform function for varbit typmod coercisions.
This enables ALTER TABLE to skip table and index rebuilds when the
new type is unconstraint varbit, or when the allowable number of bits
is not decreasing.

Noah Misch, with review and a fix for an OID collision by me.
2012-02-07 12:42:50 -05:00
Robert Haas
3cc0800829 Add a transform function for numeric typmod coercisions.
This enables ALTER TABLE to skip table and index rebuilds when a column
is changed to an unconstrained numeric, or when the scale is unchanged
and the precision does not decrease.

Noah Misch, with a few stylistic changes and a fix for an OID
collision by me.
2012-02-07 12:08:26 -05:00
Robert Haas
af7914c662 Add TIMING option to EXPLAIN, to allow eliminating of timing overhead.
Sometimes it may be useful to get actual row counts out of EXPLAIN
(ANALYZE) without paying the cost of timing every node entry/exit.
With this patch, you can say EXPLAIN (ANALYZE, TIMING OFF) to get that.

Tomas Vondra, reviewed by Eric Theise, with minor doc changes by me.
2012-02-07 11:23:04 -05:00
Heikki Linnakangas
15ad6f1510 When building with LWLOCK_STATS, initialize the stats in LWLockWaitUntilFree.
If LWLockWaitUntilFree was called before the first LWLockAcquire call, you
would either crash because of access to uninitialized array or account the
acquisition incorrectly. LWLockConditionalAcquire doesn't have this problem
because it doesn't update the lwlock stats.

In practice, this never happens because there is no codepath where you would
call LWLockWaitUntilfree before LWLockAcquire after a new process is
launched. But that's just accidental, there's no guarantee that that's
always going to be true in the future.

Spotted by Jeff Janes.
2012-02-07 10:11:54 +02:00
Tom Lane
442231d7f7 Fix postmaster to attempt restart after a hot-standby crash.
The postmaster was coded to treat any unexpected exit of the startup
process (i.e., the WAL replay process) as a catastrophic crash, and not try
to restart it. This was OK so long as the startup process could not have
any sibling postmaster children.  However, if a hot-standby backend
crashes, we SIGQUIT the startup process along with everything else, and the
resulting exit is hardly "unexpected".  Treating it as such meant we failed
to restart a standby server after any child crash at all, not only a crash
of the WAL replay process as intended.  Adjust that.  Back-patch to 9.0
where hot standby was introduced.
2012-02-06 15:30:21 -05:00
Tom Lane
5fc78efcec Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay.  Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction.  Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record.  And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.

Back-patch to 9.0 where Hot Standby was introduced.  In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly.  Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 14:44:41 -05:00
Tom Lane
c6d76d7c82 Add locking around WAL-replay modification of shared-memory variables.
Originally, most of this code assumed that no Postgres backends could be
running concurrently with it, and so no locking could be needed.  That
assumption fails in Hot Standby.  While it's still true that Hot Standby
backends should never change values like nextXid, they can examine them,
and consistency is important in some cases such as when computing a
snapshot.  Therefore, prudence requires that WAL replay code obtain the
relevant locks when modifying such variables, even though it can examine
them without taking a lock.  We were following that coding rule in some
places but not all.  This commit applies the coding rule uniformly to all
updates of ShmemVariableCache and MultiXactState fields; a search of the
replay routines did not find any other cases that seemed to be at risk.

In addition, this commit fixes a longstanding thinko in replay of NEXTOID
and checkpoint records: we tried to advance nextOid only if it was behind
the value in the WAL record, but the comparison would draw the wrong
conclusion if OID wraparound had occurred since the previous value.
Better to just unconditionally assign the new value, since OID assignment
shouldn't be happening during replay anyway.

The additional locking seems to be more in the nature of future-proofing
than fixing any live bug, so I am not going to back-patch it.  The NEXTOID
fix will be back-patched separately.
2012-02-06 12:34:10 -05:00
Tom Lane
17118825b8 Fix transient clobbering of shared buffers during WAL replay.
RestoreBkpBlocks was in the habit of zeroing and refilling the target
buffer; which was perfectly safe when the code was written, but is unsafe
during Hot Standby operation.  The reason is that we have coding rules
that allow backends to continue accessing a tuple in a heap relation while
holding only a pin on its buffer.  Such a backend could see transiently
zeroed data, if WAL replay had occasion to change other data on the page.
This has been shown to be the cause of bug #6425 from Duncan Rance (who
deserves kudos for developing a sufficiently-reproducible test case) as
well as Bridget Frey's re-report of bug #6200.  It most likely explains the
original report as well, though we don't yet have confirmation of that.

To fix, change the code so that only bytes that are supposed to change will
change, even transiently.  This actually saves cycles in RestoreBkpBlocks,
since it's not writing the same bytes twice.

Also fix seq_redo, which has the same disease, though it has to work a bit
harder to meet the requirement.

So far as I can tell, no other WAL replay routines have this type of bug.
In particular, the index-related replay routines, which would certainly be
broken if they had to meet the same standard, are not at risk because we
do not have coding rules that allow access to an index page when not
holding a buffer lock on it.

Back-patch to 9.0 where Hot Standby was added.
2012-02-05 15:49:17 -05:00
Tom Lane
ee68a44106 Improve comment. 2012-02-04 22:37:34 -05:00
Tom Lane
2af72cefea Add missing Assert and fix inaccurate elog message in standby_redo().
All other WAL redo routines either call RestoreBkpBlocks() or Assert that
they haven't been passed any backup blocks.  Make this one do likewise.
Also, fix incorrect routine name in its failure message.
2012-02-04 22:32:35 -05:00
Tom Lane
9bff0780cf Allow SQL-language functions to reference parameters by name.
Matthew Draper, reviewed by Hitoshi Harada
2012-02-04 19:23:49 -05:00
Andrew Dunstan
39909d1d39 Add array_to_json and row_to_json functions.
Also move the escape_json function from explain.c to json.c where it
seems to belong.

Andrew Dunstan, Reviewd by Abhijit Menon-Sen.
2012-02-03 12:11:16 -05:00
Robert Haas
0ed7445d73 Allow spgist's text_ops to handle pattern-matching operators.
This was presumably intended to work this way all along, but a few key
bits of indxpath.c didn't get the memo.

Robert Haas and Tom Lane
2012-02-02 13:10:56 -05:00
Robert Haas
b4e0741727 Avoid re-checking for visibility map extension too frequently.
When testing bits (but not when setting or clearing them), we now
won't check whether the map has been extended.  This significantly
improves performance in the case where the visibility map doesn't
exist yet, by avoiding an extra system call per tuple.  To make
sure backends notice eventually, send an smgr inval on VM extension.

Dean Rasheed, with minor modifications by me.
2012-02-01 20:35:42 -05:00
Peter Eisentraut
8a02339e9b initdb: Add options --auth-local and --auth-host
reviewed by Robert Haas and Pavel Stehule
2012-02-01 21:18:55 +02:00
Tom Lane
c318aeed84 Try to be more consistent about accepting denormalized float8 numbers.
On some platforms, strtod() reports ERANGE for a denormalized value (ie,
one that can be represented as distinct from zero, but is too small to have
full precision).  On others, it doesn't.  It seems better to try to accept
these values consistently, so add a test to see if the result value
indicates a true out-of-range condition.  This should be okay per Single
Unix Spec.  On machines where the underlying math isn't IEEE standard, the
behavior for such small numbers may not be very consistent, but then it
wouldn't be anyway.

Marti Raudsepp, after a proposal by Jeroen Vermeulen
2012-02-01 13:11:16 -05:00
Robert Haas
5384a73f98 Built-in JSON data type.
Like the XML data type, we simply store JSON data as text, after checking
that it is valid.  More complex operations such as canonicalization and
comparison may come later, but this is enough for not.

There are a few open issues here, such as whether we should attempt to
detect UTF-8 surrogate pairs represented as \uXXXX\uYYYY, but this gets
the basic framework in place.
2012-01-31 11:48:23 -05:00
Heikki Linnakangas
82d4b262d9 Fix bug in the new wait-until-lwlock-is-free mechanism.
If there was a wait-until-free process in the head of the wait queue,
followed by an exclusive locker, the exclusive locker was not be woken up
as it should.
2012-01-31 00:09:30 +02:00
Peter Eisentraut
82e83f46a2 Add sequence USAGE privileges to information schema
The sequence USAGE privilege is sufficiently similar to the SQL
standard that it seems reasonable to show in the information schema.
Also add some compatibility notes about it on the GRANT reference
page.
2012-01-30 21:45:42 +02:00
Heikki Linnakangas
9b38d46d9f Make group commit more effective.
When a backend needs to flush the WAL, and someone else is already flushing
the WAL, wait until it releases the WALInsertLock and check if we still need
to do the flush or if the other backend already did the work for us, before
acquiring WALInsertLock. This helps group commit, because when the WAL flush
finishes, all the backends that were waiting for it can be woken up in one
go, and the can all concurrently observe that they're done, rather than
waking them up one by one in a cascading fashion.

This is based on a new LWLock function, LWLockWaitUntilFree(), which has
peculiar semantics. If the lock is immediately free, it grabs the lock and
returns true. If it's not free, it waits until it is released, but then
returns false without grabbing the lock. This is used in XLogFlush(), so
that when the lock is acquired, the backend flushes the WAL, but if it's
not, the backend first checks the current flush location before retrying.

Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although
this patch as committed ended up being very different from that.
2012-01-30 16:53:48 +02:00
Simon Riggs
ba1868ba31 Minor bug fix and cleanup from self-review of sync rep queues patch. 2012-01-30 14:36:17 +00:00
Simon Riggs
73f617f13f Various minor comments changes from bgwriter to checkpointer. 2012-01-30 14:34:25 +00:00
Heikki Linnakangas
a578257040 Accept a non-existent value in "ALTER USER/DATABASE SET ..." command.
When default_text_search_config, default_tablespace, or temp_tablespaces
setting is set per-user or per-database, with an "ALTER USER/DATABASE SET
..." statement, don't throw an error if the text search configuration or
tablespace does not exist. In case of text search configuration, even if
it doesn't exist in the current database, it might exist in another
database, where the setting is intended to have its effect. This behavior
is now the same as search_path's.

Tablespaces are cluster-wide, so the same argument doesn't hold for
tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER
SET ..." statements before the "CREATE TABLESPACE" statements. Arguably
that's pg_dumpall's fault - it should dump the statements in such an order
that the tablespace is created first and then the "ALTER USER SET
default_tablespace ..." statements after that - but it seems better to be
consistent with search_path and default_text_search_config anyway. Besides,
you could still create a dump that throws an error, by creating the
tablespace, running "ALTER USER SET default_tablespace", then dropping the
tablespace and running pg_dumpall on that.

Backpatch to all supported versions.
2012-01-30 11:13:36 +02:00
Tom Lane
ad10853b30 Assorted comment fixes, mostly just typos, but some obsolete statements.
YAMAMOTO Takashi
2012-01-29 19:23:56 -05:00
Tom Lane
21a39de580 Tweak index costing for problems with partial indexes.
btcostestimate() makes an estimate of the number of index tuples that will
be visited based on knowledge of which index clauses can actually bound the
scan within nbtree.  However, it forgot to account for partial indexes in
this calculation, with the result that the cost of the index scan could be
significantly overestimated for a partial index.  Fix that by merging the
predicate with the abbreviated indexclause list, in the same way as we do
with the full list to estimate how many heap tuples will be visited.

Also, slightly increase the "fudge factor" that's meant to give preference
to smaller indexes over larger ones.  While this is applied to all indexes,
it's most important for partial indexes since it can be the only factor
that makes a partial index look cheaper than a similar full index.
Experimentation shows that the existing value is so small as to easily get
swamped by noise such as page-boundary-roundoff behavior.  I'm tempted to
kick it up more than this, but will refrain for now.

Per report from Ruben Blanco.  These are long-standing issues, but given
the lack of prior complaints I'm not going to risk changing planner
behavior in back branches by back-patching.
2012-01-29 18:37:14 -05:00
Tom Lane
b28ffd0fcc Fix pushing of index-expression qualifications through UNION ALL.
In commit 57664ed25e, I made the planner
wrap non-simple-variable outputs of appendrel children (IOW, child SELECTs
of UNION ALL subqueries) inside PlaceHolderVars, in order to solve some
issues with EquivalenceClass processing.  However, this means that any
upper-level WHERE clauses mentioning such outputs will now contain
PlaceHolderVars after they're pushed down into the appendrel child,
and that prevents indxpath.c from recognizing that they could be matched
to index expressions.  To fix, add explicit stripping of PlaceHolderVars
from index operands, same as we have long done for RelabelType nodes.
Add a regression test covering both this and the plain-UNION case (which
is a totally different code path, but should also be able to do it).

Per bug #6416 from Matteo Beccati.  Back-patch to 9.1, same as the
previous change.
2012-01-29 16:31:23 -05:00
Tom Lane
4ec6581c0c Fix handling of init_plans list in inheritance_planner().
Formerly we passed an empty list to each per-child-table invocation of
grouping_planner, and then merged the results into the global list.
However, that fails if there's a CTE attached to the statement, because
create_ctescan_plan uses the list to find the plan referenced by a CTE
reference; so it was unable to find any CTEs attached to the outer UPDATE
or DELETE.  But there's no real reason not to use the same list throughout
the process, and doing so is simpler and faster anyway.

Per report from Josh Berkus of "could not find plan for CTE" failures.
Back-patch to 9.1 where we added support for WITH attached to UPDATE or
DELETE.  Add some regression test cases, too.
2012-01-28 20:24:42 -05:00
Tom Lane
7c1719bc68 Fix handling of data-modifying CTE subplans in EvalPlanQual.
We can't just skip initializing such subplans, because the referencing CTE
node will expect to find the subplan available when it initializes.  That
in turn means that ExecInitModifyTable must allow the case (which actually
it needed to do anyway, since there's no guarantee that ModifyTable is
exactly at the top of the CTE plan tree).  So move the complaint about not
being allowed in EvalPlanQual mode to execution instead of initialization.
Testing turned up yet another problem, which is that we'd try to
re-initialize the result relation's index list, leading to leaks and
dangling pointers.

Per report from Phil Sorber.  Back-patch to 9.1 where data-modifying CTEs
were introduced.
2012-01-28 17:43:57 -05:00
Magnus Hagander
672614cf21 Prevent logging "failed to stat file: success" for temp files
This was broken in commit bc3347484a, the
addition of statistics counters for temp files.

Reported by Thom Brown
2012-01-28 10:03:26 +01:00
Tom Lane
0816fad6ee Undo 8.4-era lobotomization of subquery pullup rules.
After the planner was fixed to convert some IN/EXISTS subqueries into
semijoins or antijoins, we had to prevent it from doing that in some
cases where the plans risked getting much worse.  The reason the plans
got worse was that in the unoptimized implementation, subqueries could
reference parameters from the outer query at any join level, and so
full table scans could be avoided even if they were one or more levels
of join below where the semi/anti join would be.  Now that we have
sufficient mechanism in the planner to handle such cases properly,
it should no longer be necessary to play dumb here.

This reverts commits 07b9936a0f and
cd1f0d04bf.  The latter was a stopgap
fix that wasn't really sufficiently analyzed at the time.  Rather
than just restricting ourselves to cases where the new join can be
stacked on the right-hand input, we should also consider whether it
can be stacked on the left-hand input.
2012-01-27 19:46:41 -05:00
Tom Lane
e2fa76d80b Use parameterized paths to generate inner indexscans more flexibly.
This patch fixes the planner so that it can generate nestloop-with-
inner-indexscan plans even with one or more levels of joining between
the indexscan and the nestloop join that is supplying the parameter.
The executor was fixed to handle such cases some time ago, but the
planner was not ready.  This should improve our plans in many situations
where join ordering restrictions formerly forced complete table scans.

There is probably a fair amount of tuning work yet to be done, because
of various heuristics that have been added to limit the number of
parameterized paths considered.  However, we are not going to find out
what needs to be adjusted until the code gets some real-world use, so
it's time to get it in there where it can be tested easily.

Note API change for index AM amcostestimate functions.  I'm not aware of
any non-core index AMs, but if there are any, they will need minor
adjustments.
2012-01-27 19:26:38 -05:00
Peter Eisentraut
b376ec6fa5 Show default privileges in information schema
Hitherto, the information schema only showed explicitly granted
privileges that were visible in the *acl catalog columns.  If no
privileges had been granted, the implicit privileges were not shown.

To fix that, add an SQL-accessible version of the acldefault()
function, and use that inside the aclexplode() calls to substitute the
catalog-specific default privilege set for null values.

reviewed by Abhijit Menon-Sen
2012-01-27 21:58:51 +02:00
Peter Eisentraut
bf90562aa4 Revert unfortunate whitespace change
In e5e2fc842c, blank lines were removed
after a comment block, which now looks as though the comment refers to
the immediately following code, but it actually refers to the
preceding code.  So put the blank lines back.
2012-01-27 21:39:38 +02:00
Peter Eisentraut
2787458362 Disallow ALTER DOMAIN on non-domain type everywhere
This has been the behavior already in most cases, but through
omission, ALTER DOMAIN / OWNER TO and ALTER DOMAIN / SET SCHEMA would
silently work on non-domain types as well.
2012-01-27 21:20:34 +02:00
Peter Eisentraut
8137f2c323 Hide most variable-length fields from Form_pg_* structs
Those fields only appear in the structs so that genbki.pl can create
the BKI bootstrap files for the catalogs.  But they are not actually
usable from C.  So hiding them can prevent coding mistakes, saves
stack space, and can help the compiler.

In certain catalogs, the first variable-length field has been kept
visible after manual inspection.  These exceptions are noted in C
comments.

reviewed by Tom Lane
2012-01-27 20:16:17 +02:00
Peter Eisentraut
8a3f745f16 Do not access indclass through Form_pg_index
Normally, accessing variable-length members of catalog structures past
the first one doesn't work at all.  Here, it happened to work because
indnatts was checked to be 1, and so the defined FormData_pg_index
layout, using int2vector[1] and oidvector[1] for variable-length
arrays, happened to match the actual memory layout.  But it's a very
fragile assumption, and it's not in a performance-critical path, so
code it properly using heap_getattr() instead.

bug analysis by Tom Lane
2012-01-27 20:08:34 +02:00
Heikki Linnakangas
cf3fff6326 Initialize the new bgwriterLatch field properly.
Peter Geoghegan
2012-01-27 18:25:32 +02:00
Robert Haas
c5a03256c7 Adjust tuplesort.c based on the fact that we never use the OS's qsort().
Our own qsort_arg() implementation doesn't have the defect previously
observed to affect only QNX 4, so it seems sufficiently to assert that
it isn't broken rather than retesting.  Also, update a few comments to
clarify why it's valuable to retain a tie-break rule based on CTID
during index builds.

Peter Geoghegan, with slight tweaks by me.
2012-01-26 14:43:28 -05:00
Robert Haas
2d1371d3ee Be more clear when a new column name collides with a system column name.
We now use the same error message for ALTER TABLE .. ADD COLUMN or
ALTER TABLE .. RENAME COLUMN that we do for CREATE TABLE.  The old
message was accurate, but might be confusing to users not aware of our
system columns.

Vik Reykja, with some changes by me, and further proofreading by Tom Lane
2012-01-26 12:44:30 -05:00
Heikki Linnakangas
6d90eaaa89 Make bgwriter sleep longer when it has no work to do, to save electricity.
To make it wake up promptly when activity starts again, backends nudge it
by setting a latch in MarkBufferDirty(). The latch is kept set while
bgwriter is active, so there is very little overhead from that when the
system is busy. It is only armed before going into longer sleep.

Peter Geoghegan, with some changes by me.
2012-01-26 18:39:13 +02:00
Robert Haas
467ff207f5 Add missing #include, to suppress compiler warning. 2012-01-26 10:16:26 -05:00
Magnus Hagander
7729e22d83 Fix a copy/pasted typo in several comments 2012-01-26 16:02:33 +01:00
Magnus Hagander
61cb8c5abb Add deadlock counter to pg_stat_database
Adds a counter that tracks number of deadlocks that occurred in
each database to pg_stat_database.

Magnus Hagander, reviewed by Jaime Casanova
2012-01-26 15:58:19 +01:00
Robert Haas
0e549697d1 Classify DROP operations by whether or not they are user-initiated.
This doesn't do anything useful just yet, but is intended as supporting
infrastructure for allowing sepgsql to sensibly check DROP permissions.

KaiGai Kohei and Robert Haas
2012-01-26 09:30:27 -05:00
Magnus Hagander
bc3347484a Track temporary file count and size in pg_stat_database
Add counters for number and size of temporary files used
for spill-to-disk queries for each database to the
pg_stat_database view.

Tomas Vondra, review by Magnus Hagander
2012-01-26 14:41:19 +01:00
Robert Haas
9d35116611 Damage control for yesterday's CheckIndexCompatible changes.
Rip out a regression test that doesn't play well with settings put in
place by the build farm, and rewrite the code in CheckIndexCompatible
in a hopefully more transparent style.
2012-01-26 08:21:31 -05:00
Robert Haas
9f9135d129 Instrument index-only scans to count heap fetches performed.
Patch by me; review by Tom Lane, Jeff Davis, and Peter Geoghegan.
2012-01-25 20:41:52 -05:00
Robert Haas
6eb71ac552 Make CheckIndexCompatible simpler and more bullet-proof.
This gives up the "don't rewrite the index" behavior in a couple of
relatively unimportant cases, such as changing between an array type
and an unconstrained domain over that array type, in return for
making this code more future-proof.

Noah Misch
2012-01-25 15:28:07 -05:00
Simon Riggs
8366c7803e Allow pg_basebackup from standby node with safety checking.
Base backup follows recommended procedure, plus goes to great
lengths to ensure that partial page writes are avoided.

Jun Ishizuka and Fujii Masao, with minor modifications
2012-01-25 18:02:04 +00:00
Alvaro Herrera
74ab96a45e Add pg_trigger_depth() function
This reports the depth level of triggers currently in execution, or zero
if not called from inside a trigger.

No catversion bump in this patch, but you have to initdb if you want
access to the new function.

Author: Kevin Grittner
2012-01-25 13:22:54 -03:00
Simon Riggs
443b4821f1 Add new replication mode synchronous_commit = 'write'.
Replication occurs only to memory on standby, not to disk,
so provides additional performance if user wishes to
reduce durability level slightly. Adds concept of multiple
independent sync rep queues.

Fujii Masao and Simon Riggs
2012-01-24 20:22:37 +00:00
Peter Eisentraut
89dda5f297 Remove quotes around format_type_be() output
format_type_be() takes care of any needed quoting itself.
2012-01-24 21:49:27 +02:00
Tom Lane
f26c9896b3 Suppress variable-clobbered-by-longjmp warning seen with older gcc versions. 2012-01-24 13:44:07 -05:00
Tom Lane
beef89567e Suppress possibly-uninitialized-variable warning seen with older gcc versions. 2012-01-24 13:40:26 -05:00
Bruce Momjian
890a9992ce Reduce tab outdent of "error handling" GUC comments in postgresql.conf,
to match surrounding outdenting.
2012-01-24 10:41:00 -05:00
Simon Riggs
c172b7b02e Resolve timing issue with logging locks for Hot Standby.
We log AccessExclusiveLocks for replay onto standby nodes,
but because of timing issues on ProcArray it is possible to
log a lock that is still held by a just committed transaction
that is very soon to be removed. To avoid any timing issue we
avoid applying locks made by transactions with InvalidXid.

Simon Riggs, bug report Tom Lane, diagnosis Pavan Deolasee
2012-01-23 23:37:32 +00:00
Simon Riggs
b8a91d9d1c ALTER <thing> [IF EXISTS] ... allows silent DDL if required,
e.g. ALTER FOREIGN TABLE IF EXISTS foo RENAME TO bar

Pavel Stehule
2012-01-23 23:25:04 +00:00
Magnus Hagander
a65023e7de Further doc cleanups from the pg_stat_activity changes
Fujii Masao
2012-01-20 12:23:26 +01:00
Robert Haas
cc53a1e7cc Add bitwise AND, OR, and NOT operators for macaddr data type.
Brendan Jurd, reviewed by Fujii Masao
2012-01-19 15:25:14 -05:00
Magnus Hagander
4f42b546fd Separate state from query string in pg_stat_activity
This separates the state (running/idle/idleintransaction etc) into
it's own field ("state"), and leaves the query field containing just
query text.

The query text will now mean "current query" when a query is running
and "last query" in other states. Accordingly,the field has been
renamed from current_query to query.

Since backwards compatibility was broken anyway to make that, the procpid
field has also been renamed to pid - along with the same field in
pg_stat_replication for consistency.

Scott Mead and Magnus Hagander, review work from Greg Smith
2012-01-19 14:19:20 +01:00
Heikki Linnakangas
fa352d662e Make pg_relation_size() and friends return NULL if the object doesn't exist.
That avoids errors when the functions are used in queries like "SELECT
pg_relation_size(oid) FROM pg_class", and a table is dropped concurrently.

Phil Sorber
2012-01-19 13:06:30 +02:00
Heikki Linnakangas
326b922e8b Fix corner case in cleanup of transactions using SSI.
When the only remaining active transactions are READ ONLY, we do a "partial
cleanup" of committed transactions because certain types of conflicts
aren't possible anymore. For committed r/w transactions, we release the
SIREAD locks but keep the SERIALIZABLEXACT. However, for committed r/o
transactions, we can go further and release the SERIALIZABLEXACT too. The
problem was with the latter case: we were returning the SERIALIZABLEXACT to
the free list without removing it from the finished list.

The only real change in the patch is the SHMQueueDelete line, but I also
reworked some of the surrounding code to make it obvious that r/o and r/w
transactions are handled differently -- the existing code felt a bit too
clever.

Dan Ports
2012-01-18 17:57:33 +02:00
Magnus Hagander
ae137bcaab Fix warning about unused variable 2012-01-18 10:24:15 +01:00
Robert Haas
4b496a3583 Catch fatal flex errors in the GUC file lexer.
This prevents the postmaster from unexpectedly croaking if postgresql.conf
contains something like:

include 'invalid_directory_name'

Noah Misch. Reviewed by Tom Lane and myself.
2012-01-17 20:51:38 -05:00
Robert Haas
754b8140a1 fastgetattr is in access/htup.h, not access/heapam.h
Noted by Peter Geoghegan
2012-01-16 20:37:01 -05:00
Alvaro Herrera
3b11247aad Disallow merging ONLY constraints in children tables
When creating a child table, or when attaching an existing table as
child of another, we must not allow inheritable constraints to be
merged with non-inheritable ones, because then grandchildren would not
properly get the constraint.  This would violate the grandparent's
expectations.

Bugs noted by Robert Haas.

Author: Nikhil Sontakke
2012-01-16 19:27:05 -03:00
Robert Haas
1575fbcb79 Prevent adding relations to a concurrently dropped schema.
In the previous coding, it was possible for a relation to be created
via CREATE TABLE, CREATE VIEW, CREATE SEQUENCE, CREATE FOREIGN TABLE,
etc.  in a schema while that schema was meanwhile being concurrently
dropped.  This led to a pg_class entry with an invalid relnamespace
value.  The same problem could occur if a relation was moved using
ALTER .. SET SCHEMA while the target schema was being concurrently
dropped.  This patch prevents both of those scenarios by locking the
schema to which the relation is being added using AccessShareLock,
which conflicts with the AccessExclusiveLock taken by DROP.

As a desirable side effect, this also prevents the use of CREATE OR
REPLACE VIEW to queue for an AccessExclusiveLock on a relation on which
you have no rights: that will now fail immediately with a permissions
error, before trying to obtain a lock.

We need similar protection for all other object types, but as everything
other than relations uses a slightly different set of code paths, I'm
leaving that for a separate commit.

Original complaint (as far as I could find) about CREATE by Nikhil
Sontakke; risk for ALTER .. SET SCHEMA pointed out by Tom Lane;
further details by Dan Farina; patch by me; review by Hitoshi Harada.
2012-01-16 09:49:34 -05:00
Heikki Linnakangas
b2b4af535e Fix poll() implementation of WaitLatchOrSocket to notice postmaster death.
When the remote end of the pipe is closed, select() reports the fd as
readable, but poll() has a separate POLLHUP return code for that.

Spotted by Peter Geoghegan.
2012-01-15 22:08:03 +02:00
Magnus Hagander
0495aaad8b Allow a user to kill his own queries using pg_cancel_backend()
Allows a user to use pg_cancel_queries() to cancel queries in
other backends if they are running under the same role.
pg_terminate_backend() still requires superuser permissoins.

Short patch, many authors working on the bikeshed: Magnus Hagander,
Josh Kupershmidt, Edward Muller, Greg Smith.
2012-01-15 15:34:40 +01:00
Heikki Linnakangas
00c5f55061 Make superuser imply replication privilege. The idea of a privilege that
superuser doesn't have doesn't make much sense, as a superuser can do
whatever he wants through other means, anyway. So instead of granting
replication privilege to superusers in CREATE USER time by default, allow
replication connection from superusers whether or not they have the
replication privilege.

Patch by Noah Misch, per discussion on bug report #6264
2012-01-14 18:22:16 +02:00
Robert Haas
d0dcb315db Fix broken logic in lazy_vacuum_heap.
As noted by Tom Lane, the previous coding in this area, which I
introduced in commit bbb6e559c4, was
poorly tested and caused the vacuum's second heap to go into what would
have been an infinite loop but for the fact that it eventually caused a
memory allocation failure.  This version seems to work better.
2012-01-13 08:22:31 -05:00
Robert Haas
4d0b11a0ca Typo fix. 2012-01-13 08:21:45 -05:00
Simon Riggs
5530623d03 Correctly initialise shared recoveryLastRecPtr in recovery.
Previously we used ReadRecPtr rather than EndRecPtr, which was
not a serious error but caused pg_stat_replication to report
incorrect replay_location until at least one WAL record is replayed.

Fujii Masao
2012-01-13 13:02:44 +00:00
Simon Riggs
3f1787c253 Minor but necessary improvements to WAL keepalives
Fujii Masao
2012-01-13 12:59:08 +00:00
Tom Lane
21b446dd09 Fix CLUSTER/VACUUM FULL for toast values owned by recently-updated rows.
In commit 7b0d0e9356, I made CLUSTER and
VACUUM FULL try to preserve toast value OIDs from the original toast table
to the new one.  However, if we have to copy both live and recently-dead
versions of a row that has a toasted column, those versions may well
reference the same toast value with the same OID.  The patch then led to
duplicate-key failures as we tried to insert the toast value twice with the
same OID.  (The previous behavior was not very desirable either, since it
would have silently inserted the same value twice with different OIDs.
That wastes space, but what's worse is that the toast values inserted for
already-dead heap rows would not be reclaimed by subsequent ordinary
VACUUMs, since they go into the new toast table marked live not deleted.)

To fix, check if the copied OID already exists in the new toast table, and
if so, assume that it stores the desired value.  This is reasonably safe
since the only case where we will copy an OID from a previous toast pointer
is when toast_insert_or_update was given that toast pointer and so we just
pulled the data from the old table; if we got two different values that way
then we have big problems anyway.  We do have to assume that no other
backend is inserting items into the new toast table concurrently, but
that's surely safe for CLUSTER and VACUUM FULL.

Per bug #6393 from Maxim Boguk.  Back-patch to 9.0, same as the previous
patch.
2012-01-12 16:40:14 -05:00
Heikki Linnakangas
1b9dea04b5 Remove useless 'needlock' argument from GetXLogInsertRecPtr. It was always
passed as 'true'.
2012-01-11 11:01:47 +02:00
Heikki Linnakangas
9c808f89c2 Refactor XLogInsert a bit. The rdata entries for backup blocks are now
constructed before acquiring WALInsertLock, which slightly reduces the time
the lock is held. Although I could not measure any benefit in benchmarks,
the code is more readable this way.
2012-01-11 11:01:47 +02:00
Peter Eisentraut
a9f2e31cf6 Support CREATE TABLE (LIKE ...) with foreign tables and views
Composite types are not yet supported, because parserOpenTable()
rejects them.
2012-01-10 21:46:29 +02:00
Peter Eisentraut
db49517c62 Rename the internal structures of the CREATE TABLE (LIKE ...) facility
The original implementation of this interpreted it as a kind of
"inheritance" facility and named all the internal structures
accordingly.  This turned out to be very confusing, because it has
nothing to do with the INHERITS feature.  So rename all the internal
parser infrastructure, update the comments, adjust the error messages,
and split up the regression tests.
2012-01-07 23:02:33 +02:00
Robert Haas
df970a0ac8 Fix backwards logic in previous commit.
I wrote this code before committing it, but managed not to include it in
the actual commit.
2012-01-06 22:54:43 -05:00
Robert Haas
1489e2f26a Improve behavior of concurrent ALTER TABLE, and do some refactoring.
ALTER TABLE (and ALTER VIEW, ALTER SEQUENCE, etc.) now use a
RangeVarGetRelid callback to check permissions before acquiring a table
lock.  We also now use the same callback for all forms of ALTER TABLE,
rather than having separate, almost-identical callbacks for ALTER TABLE
.. SET SCHEMA and ALTER TABLE .. RENAME, and no callback at all for
everything else.

I went ahead and changed the code so that no form of ALTER TABLE works
on foreign tables; you must use ALTER FOREIGN TABLE instead.  In 9.1,
it was possible to use ALTER TABLE .. SET SCHEMA or ALTER TABLE ..
RENAME on a foreign table, but not any other form of ALTER TABLE, which
did not seem terribly useful or consistent.

Patch by me; review by Noah Misch.
2012-01-06 22:42:26 -05:00
Robert Haas
33aaa139e6 Make the number of CLOG buffers adaptive, based on shared_buffers.
Previously, this was hardcoded: we always had 8.  Performance testing
shows that isn't enough, especially on big SMP systems, so we allow it
to scale up as high as 32 when there's adequate memory.  On the flip
side, when shared_buffers is very small, drop the number of CLOG buffers
down to as little as 4, so that we can start the postmaster even
when very little shared memory is available.

Per extensive discussion with Simon Riggs, Tom Lane, and others on
pgsql-hackers.
2012-01-06 14:32:18 -05:00
Robert Haas
7e4911b2ae Fix variable confusion in BufferSync().
As noted by Heikki Linnakangas, the previous coding confused the "flags"
variable with the "mask" variable.  The affect of this appears to be that
unlogged buffers would get written out at every checkpoint rather than
only at shutdown time.  Although that's arguably an acceptable failure
mode, I'm back-patching this change, since it seems like a poor idea to
rely on this happening to work.
2012-01-06 08:35:48 -05:00
Peter Eisentraut
104e7dac28 Improve ALTER DOMAIN / DROP CONSTRAINT with nonexistent constraint
ALTER DOMAIN / DROP CONSTRAINT on a nonexistent constraint name did
not report any error.  Now it reports an error.  The IF EXISTS option
was added to get the usual behavior of ignoring nonexistent objects to
drop.
2012-01-05 19:48:55 +02:00
Tom Lane
dfd26f9c5f Make executor's SELECT INTO code save and restore original tuple receiver.
As previously coded, the QueryDesc's dest pointer was left dangling
(pointing at an already-freed receiver object) after ExecutorEnd.  It's a
bit astonishing that it took us this long to notice, and I'm not sure that
the known problem case with SQL functions is the only one.  Fix it by
saving and restoring the original receiver pointer, which seems the most
bulletproof way of ensuring any related bugs are also covered.

Per bug #6379 from Paul Ramsey.  Back-patch to 8.4 where the current
handling of SELECT INTO was introduced.
2012-01-04 18:30:55 -05:00
Tom Lane
ac7a5a3f25 Fix coerce_to_target_type for coerce_type's klugy handling of COLLATE.
Because coerce_type recurses into the argument of a CollateExpr,
coerce_to_target_type's longstanding code for detecting whether coerce_type
had actually done anything (to wit, returned a different node than it
passed in) was broken in 9.1.  This resulted in unexpected failures in
hide_coercion_node; which was not the latter's fault, since it's critical
that we never call it on anything that wasn't inserted by coerce_type.
(Else we might decide to "hide" a user-written function call.)

Fix by removing and replacing the CollateExpr in coerce_to_target_type
itself.  This is all pretty ugly but I don't immediately see a way to make
it nicer.

Per report from Jean-Yves F. Barbier.
2012-01-02 14:43:45 -05:00
Bruce Momjian
e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Simon Riggs
64233902d2 Send new protocol keepalive messages to standby servers.
Allows streaming replication users to calculate transfer latency
and apply delay via internal functions. No external functions yet.
2011-12-31 13:30:26 +00:00
Tom Lane
2ae2e9c007 Revert "Remove troublesome Asserts in cost_mergejoin()."
This reverts commit ff68b256a5.
The recent change to use -fexcess-precision=standard should make those
Asserts safe, and does fix a test case that formerly crashed for me,
so I think there's no need to have a cross-version difference in the
code here.
2011-12-30 17:58:15 -05:00
Peter Eisentraut
037a82704c Standardize treatment of strcmp() return value
Always compare the return value to 0, don't use cute tricks like
if (!strcmp(...)).
2011-12-27 21:19:09 +02:00
Peter Eisentraut
d383c23f6f Remove support for on_exit()
All supported platforms support the C89 standard function atexit()
(SunOS 4 probably being the last one not to), and supporting both
makes the code clumsy.
2011-12-27 20:57:59 +02:00
Peter Eisentraut
9099d84374 Sort file list when creating gettext-files
That way, the created .pot file is more deterministic and not
dependent on the order in which the files are found.
2011-12-27 20:20:56 +02:00
Tom Lane
472d3935a2 Rethink representation of index clauses' mapping to index columns.
In commit e2c2c2e8b1 I made use of nested
list structures to show which clauses went with which index columns, but
on reflection that's a data structure that only an old-line Lisp hacker
could love.  Worse, it adds unnecessary complication to the many places
that don't much care which clauses go with which index columns.  Revert
to the previous arrangement of flat lists of clauses, and instead add a
parallel integer list of column numbers.  The places that care about the
pairing can chase both lists with forboth(), while the places that don't
care just examine one list the same as before.

The only real downside to this is that there are now two more lists that
need to be passed to amcostestimate functions in case they care about
column matching (which btcostestimate does, so not passing the info is not
an option).  Rather than deal with 11-argument amcostestimate functions,
pass just the IndexPath and expect the functions to extract fields from it.
That gets us down to 7 arguments which is better than 11, and it seems
more future-proof against likely additions to the information we keep
about an index path.
2011-12-24 19:03:21 -05:00
Tom Lane
e2c2c2e8b1 Improve planner's handling of duplicated index column expressions.
It's potentially useful for an index to repeat the same indexable column
or expression in multiple index columns, if the columns have different
opclasses.  (If they share opclasses too, the duplicate column is pretty
useless, but nonetheless we've allowed such cases since 9.0.)  However,
the planner failed to cope with this, because createplan.c was relying on
simple equal() matching to figure out which index column each index qual
is intended for.  We do have that information available upstream in
indxpath.c, though, so the fix is to not flatten the multi-level indexquals
list when putting it into an IndexPath.  Then we can rely on the sublist
structure to identify target index columns in createplan.c.  There's a
similar issue for index ORDER BYs (the KNNGIST feature), so introduce a
multi-level-list representation for that too.  This adds a bit more
representational overhead, but we might more or less buy that back by not
having to search for matching index columns anymore in createplan.c;
likewise btcostestimate saves some cycles.

Per bug #6351 from Christian Rudolph.  Likely symptoms include the "btree
index keys must be ordered by attribute" failure shown there, as well as
"operator MMMM is not a member of opfamily NNNN".

Although this is a pre-existing problem that can be demonstrated in 9.0 and
9.1, I'm not going to back-patch it, because the API changes in the planner
seem likely to break things such as index plugins.  The corner cases where
this matters seem too narrow to justify possibly breaking things in a minor
release.
2011-12-23 18:45:14 -05:00
Robert Haas
d5448c7d31 Add bytea_agg, parallel to string_agg.
Pavel Stehule
2011-12-23 08:40:25 -05:00
Robert Haas
0e4611c023 Add a security_barrier option for views.
When a view is marked as a security barrier, it will not be pulled up
into the containing query, and no quals will be pushed down into it,
so that no function or operator chosen by the user can be applied to
rows not exposed by the view.  Views not configured with this
option cannot provide robust row-level security, but will perform far
better.

Patch by KaiGai Kohei; original problem report by Heikki Linnakangas
(in October 2009!).  Review (in earlier versions) by Noah Misch and
others.  Design advice by Tom Lane and myself.  Further review and
cleanup by me.
2011-12-22 16:16:31 -05:00
Peter Eisentraut
f90dd28062 Add ALTER DOMAIN ... RENAME
You could already rename domains using ALTER TYPE, but with this new
command it is more consistent with how other commands treat domains as
a subcategory of types.
2011-12-22 22:43:56 +02:00
Tom Lane
c31224e257 Update per-column ACLs, not only per-table ACL, when changing table owner.
We forgot to modify column ACLs, so privileges were still shown as having
been granted by the old owner.  This meant that neither the new owner nor
a superuser could revoke the now-untraceable-to-table-owner permissions.
Per bug #6350 from Marc Balmer.

This has been wrong since column ACLs were added, so back-patch to 8.4.
2011-12-21 18:23:11 -05:00
Robert Haas
cbe24a6dd8 Improve behavior of concurrent CLUSTER.
In the previous coding, a user could queue up for an AccessExclusiveLock
on a table they did not have permission to cluster, thus potentially
interfering with access by authorized users who got stuck waiting behind
the AccessExclusiveLock.  This approach avoids that.  cluster() has the
same permissions-checking requirements as REINDEX TABLE, so this commit
moves the now-shared callback to tablecmds.c and renames it, per
discussion with Noah Misch.
2011-12-21 15:17:28 -05:00
Robert Haas
d573e239f0 Take fewer snapshots.
When a PORTAL_ONE_SELECT query is executed, we can opportunistically
reuse the parse/plan shot for the execution phase.  This cuts down the
number of snapshots per simple query from 2 to 1 for the simple
protocol, and 3 to 2 for the extended protocol.  Since we are only
reusing a snapshot taken early in the processing of the same protocol
message, the change shouldn't be user-visible, except that the remote
possibility of the planning and execution snapshots being different is
eliminated.

Note that this change does not make it safe to assume that the parse/plan
snapshot will certainly be reused; that will currently only happen if
PortalStart() decides to use the PORTAL_ONE_SELECT strategy.  It might
be worth trying to provide some stronger guarantees here in the future,
but for now we don't.

Patch by me; review by Dimitri Fontaine.
2011-12-21 09:16:55 -05:00
Robert Haas
7f0e4bb82e Shave a few cycles in string_agg().
Pavel Stehule
2011-12-21 08:53:50 -05:00
Tom Lane
1db5af2794 Fix gincostestimate to handle ScalarArrayOpExpr reasonably.
The original coding of this function overlooked the possibility that
it could be passed anything except simple OpExpr indexquals.  But
ScalarArrayOpExpr is possible too, and the code would probably crash
(and surely give ridiculous answers) in such a case.  Add logic to try
to estimate sanely for such cases.

In passing, fix the treatment of inner-indexscan cost estimation: it was
failing to scale up properly for multiple iterations of a nestloop.
(I think somebody might've thought that index_pages_fetched() is linear,
but of course it's not.)

Report, diagnosis, and preliminary patch by Marti Raudsepp; I refactored
it a bit and fixed the cost estimation.

Back-patch into 9.1 where the bogus code was introduced.
2011-12-20 19:57:34 -05:00
Tom Lane
d0024cd188 Avoid crashing when we have problems unlinking files post-commit.
smgrdounlink takes care to not throw an ERROR if it fails to unlink
something, but that caution was rendered useless by commit
3396000684, which put an smgrexists call in
front of it; smgrexists *does* throw error if anything looks funny, such
as getting a permissions error from trying to open the file.  If that
happens post-commit, you get a PANIC, and what's worse the same logic
appears in the WAL replay code, so the database even fails to restart.

Restore the intended behavior by removing the smgrexists call --- it isn't
accomplishing anything that we can't do better by adjusting mdunlink's
ideas of whether it ought to warn about ENOENT or not.

Per report from Joseph Shraibman of unrecoverable crash after trying to
drop a table whose FSM fork had somehow gotten chmod'd to 000 permissions.
Backpatch to 8.4, where the bogus coding was introduced.
2011-12-20 15:00:36 -05:00
Peter Eisentraut
729205571e Add support for privileges on types
This adds support for the more or less SQL-conforming USAGE privilege
on types and domains.  The intent is to be able restrict which users
can create dependencies on types, which restricts the way in which
owners can alter types.

reviewed by Yeb Havinga
2011-12-20 00:05:19 +02:00
Tom Lane
8f57b064fd Rename updateNodeLink to spgUpdateNodeLink.
On reflection, the original name seems way too generic for a global
symbol.  A quick check shows this is the only exported function name
in SP-GiST that doesn't begin with "spg" or contain "SpGist", so the
rest of them seem all right.
2011-12-19 15:38:32 -05:00
Alvaro Herrera
61d81bd28d Allow CHECK constraints to be declared ONLY
This makes them enforceable only on the parent table, not on children
tables.  This is useful in various situations, per discussion involving
people bitten by the restrictive behavior introduced in 8.4.

Message-Id:
8762mp93iw.fsf@comcast.net
CAFaPBrSMMpubkGf4zcRL_YL-AERUbYF_-ZNNYfb3CVwwEqc9TQ@mail.gmail.com

Authors: Nikhil Sontakke, Alex Hunsaker
Reviewed by Robert Haas and myself
2011-12-19 17:30:23 -03:00
Tom Lane
9220362493 Teach SP-GiST to do index-only scans.
Operator classes can specify whether or not they support this; this
preserves the flexibility to use lossy representations within an index.

In passing, move constant data about a given index into the rd_amcache
cache area, instead of doing fresh lookups each time we start an index
operation.  This is mainly to try to make sure that spgcanreturn() has
insignificant cost; I still don't have any proof that it matters for
actual index accesses.  Also, get rid of useless copying of FmgrInfo
pointers; we can perfectly well use the relcache's versions in-place.
2011-12-19 14:58:41 -05:00
Tom Lane
3695a55513 Replace simple constant pg_am.amcanreturn with an AM support function.
The need for this was debated when we put in the index-only-scan feature,
but at the time we had no near-term expectation of having AMs that could
support such scans for only some indexes; so we kept it simple.  However,
the SP-GiST AM forces the issue, so let's fix it.

This patch only installs the new API; no behavior actually changes.
2011-12-18 15:50:37 -05:00
Tom Lane
b7a0e8fb4d Defend against null scankeys in spgist searches.
Should've thought of that one earlier.
2011-12-17 19:08:28 -05:00
Tom Lane
dd45d3ad33 Fix some long-obsolete references to XLogOpenRelation.
These were missed in commit a213f1ee6c,
which removed that function.
2011-12-17 18:26:52 -05:00
Tom Lane
85df5dbf5a Fix compiler warning seen on 64-bit machine. 2011-12-17 16:51:36 -05:00
Tom Lane
8daeb5ddd6 Add SP-GiST (space-partitioned GiST) index access method.
SP-GiST is comparable to GiST in flexibility, but supports non-balanced
partitioned search structures rather than balanced trees.  As described at
PGCon 2011, this new indexing structure can beat GiST in both index build
time and query speed for search problems that it is well matched to.

There are a number of areas that could still use improvement, but at this
point the code seems committable.

Teodor Sigaev and Oleg Bartunov, with considerable revisions by Tom Lane
2011-12-17 16:42:30 -05:00
Robert Haas
0d76b60db4 Various micro-optimizations for GetSnapshopData().
Heikki Linnakangas had the idea of rearranging GetSnapshotData to
avoid checking for sub-XIDs when no top-level XID is present.  This
patch does that plus further a bit of further, related rearrangement.
Benchmarking show a significant improvement on unlogged tables at
higher concurrency levels, and mostly indifferent result on permanent
tables (which are presumably bottlenecked elsewhere).  Most of the
benefit seems to come from using the new NormalTransactionIdPrecedes()
macro rather than the function call TransactionIdPrecedes().
2011-12-16 21:48:47 -05:00
Andrew Dunstan
6d09b2105f include_if_exists facility for config file.
This works the same as include, except that an error is not thrown
if the file is missing. Instead the fact that it's missing is
logged.

Greg Smith, reviewed by Euler Taveira de Oliveira.
2011-12-15 19:40:58 -05:00
Robert Haas
1da5c11959 Improve behavior of concurrent ALTER <relation> .. SET SCHEMA.
If the referrent of a name changes while we're waiting for the lock,
we must recheck permissons.  We also now check the relkind before
locking, since it's easy to do that long the way.

Patch by me; review by Noah Misch.
2011-12-15 19:02:58 -05:00
Robert Haas
74a1d4fe7c Improve behavior of concurrent rename statements.
Previously, renaming a table, sequence, view, index, foreign table,
column, or trigger checked permissions before locking the object, which
meant that if permissions were revoked during the lock wait, we would
still allow the operation.  Similarly, if the original object is dropped
and a new one with the same name is created, the operation will be allowed
if we had permissions on the old object; the permissions on the new
object don't matter.  All this is now fixed.

Along the way, attempting to rename a trigger on a foreign table now gives
the same error message as trying to create one there in the first place
(i.e. that it's not a table or view) rather than simply stating that no
trigger by that name exists.

Patch by me; review by Noah Misch.
2011-12-15 19:02:38 -05:00
Tom Lane
2dd9322ba6 Move BKP_REMOVABLE bit from individual WAL records to WAL page headers.
Removing this bit from xl_info allows us to restore the old limit of four
(not three) separate pages touched by a WAL record, which is needed for the
upcoming SP-GiST feature, and will likely be useful elsewhere in future.

When we implemented XLR_BKP_REMOVABLE in 2007, we had to do it like that
because no special WAL-visible action was taken when starting a backup.
However, now we force a segment switch when starting a backup, so a
compressing WAL archiver (such as pglesslog) that uses the state shown in
the current page header will not be fooled as to removability of backup
blocks.  The only downside is that the archiver will not return to
compressing mode for up to one WAL page after the backup is over, which is
a small price to pay for getting back the extra xl_info bit.  In any case
the archiver could look for XLOG_BACKUP_END records if it thought it was
worth the trouble to do so.

Bump XLOG_PAGE_MAGIC since this is effectively a change in WAL format.
2011-12-12 16:22:14 -05:00
Heikki Linnakangas
8409b60476 Revert the behavior of inet/cidr functions to not unpack the arguments.
I forgot to change the functions to use the PG_GETARG_INET_PP() macro,
when I changed DatumGetInetP() to unpack the datum, like Datum*P macros
usually do. Also, I screwed up the definition of the PG_GETARG_INET_PP()
macro, and didn't notice because it wasn't used.

This fixes the memory leak when sorting inet values, as reported
by Jochen Erwied and debugged by Andres Freund. Backpatch to 8.3, like
the previous patch that broke it.
2011-12-12 10:10:53 +02:00
Andrew Dunstan
0f44335122 Miscellaneous cleanup to silence compiler warnings seen on Mingw.
Remove some dead code, conditionally declare some items or call
some code, and fix one or two declarations.
2011-12-10 18:15:15 -05:00
Peter Eisentraut
5bcf8ede45 Add ALTER FOREIGN DATA WRAPPER / RENAME and ALTER SERVER / RENAME 2011-12-09 20:42:30 +02:00
Heikki Linnakangas
9f0d2bdc88 Don't set reachedMinRecoveryPoint during crash recovery. In crash recovery,
we don't reach consistency before replaying all of the WAL. Rename the
variable to reachedConsistency, to make its intention clearer.

In master, that was an active bug because of the recent patch to
immediately PANIC if a reference to a missing page is found in WAL after
reaching consistency, as Tom Lane's test case demonstrated. In 9.1 and 9.0,
the only consequence was a misleading "consistent recovery state reached at
%X/%X" message in the log at the beginning of crash recovery (the database
is not consistent at that point yet). In 8.4, the log message was not
printed in crash recovery, even though there was a similar
reachedMinRecoveryPoint local variable that was also set early. So,
backpatch to 9.1 and 9.0.
2011-12-09 15:21:12 +02:00
Heikki Linnakangas
5d8a894e30 Cancel running query if it is detected that the connection to the client is
lost. The only way we detect that at the moment is when write() fails when
we try to write to the socket.

Florian Pflug with small changes by me, reviewed by Greg Jaskiewicz.
2011-12-09 14:21:36 +02:00
Peter Eisentraut
d5f23af6bf Add const qualifiers to node inspection functions
Thomas Munro
2011-12-07 21:46:56 +02:00
Tom Lane
0d0ec527af Fix corner cases in readlink() usage.
Make sure all calls are protected by HAVE_READLINK, and get the buffer
overflow tests right.  Be a bit more paranoid about string length in
_tarWriteHeader(), too.
2011-12-07 13:34:13 -05:00
Magnus Hagander
0d9b09282f Better error reporting if the link target is too long
This situation won't set errno, so using %m will give an incorrect
error message.
2011-12-07 12:19:20 +01:00
Magnus Hagander
1f422db663 Avoid using readlink() on platforms that don't support it
We don't have any such platforms now, but might in the future.

Also, detect cases when a tablespace symlink points to a path that
is longer than we can handle, and give a warning.
2011-12-07 12:09:05 +01:00
Magnus Hagander
16d8e594ac Remove spclocation field from pg_tablespace
Instead, add a function pg_tablespace_location(oid) used to return
the same information, and do this by reading the symbolic link.

Doing it this way makes it possible to relocate a tablespace when the
database is down by simply changing the symbolic link.
2011-12-07 10:37:33 +01:00
Tom Lane
c6e3ac11b6 Create a "sort support" interface API for faster sorting.
This patch creates an API whereby a btree index opclass can optionally
provide non-SQL-callable support functions for sorting.  In the initial
patch, we only use this to provide a directly-callable comparator function,
which can be invoked with a bit less overhead than the traditional
SQL-callable comparator.  While that should be of value in itself, the real
reason for doing this is to provide a datatype-extensible framework for
more aggressive optimizations, as in Peter Geoghegan's recent work.

Robert Haas and Tom Lane
2011-12-07 00:19:39 -05:00
Robert Haas
d2a662182e Typo fixes for commit 2ad36c4e44.
Noted during post-commit review by by Noah Misch.
2011-12-06 15:50:02 -05:00