Commit Graph

12707 Commits

Author SHA1 Message Date
Tom Lane b8347138e9 Fix DROP TABLESPACE to unlink symlink when directory is not there.
If the tablespace directory is missing entirely, we allow DROP TABLESPACE
to go through, on the grounds that it should be possible to clean up the
catalog entry in such a situation.  However, we forgot that the pg_tblspc
symlink might still be there.  We should try to remove the symlink too
(but not fail if it's no longer there), since not doing so can lead to
weird behavior subsequently, as per report from Michael Nolan.

There was some discussion of adding dependency links to prevent DROP
TABLESPACE when the catalogs still contain references to the tablespace.
That might be worth doing too, but it's an orthogonal question, and in
any case wouldn't be back-patchable.

Back-patch to 9.0, which is as far back as the logic looks like this.
We could possibly do something similar in 8.x, but given the lack of
reports I'm not sure it's worth the trouble, and anyway the case could
not arise in the form the logic is meant to cover (namely, a post-DROP
transaction rollback having resurrected the pg_tablespace entry after
some or all of the filesystem infrastructure is gone).
2012-05-13 18:06:52 -04:00
Tom Lane 966970ed63 Re-revert stats collector latch changes.
This reverts commit cb2f2873d6, restoring
the latch-ified stats collector logic.  We'll soon see if this works any
better on the Windows buildfarm machines.
2012-05-13 14:44:39 -04:00
Tom Lane b85427f227 Attempt to fix some issues in our Windows socket code.
Make sure WaitLatchOrSocket regards FD_CLOSE as a read-ready condition.
We might want to tweak this further, but it was surely wrong as-is.

Make pgwin32_waitforsinglesocket detach its private event object from the
passed socket before returning.  I suspect that failure to do so leads
to race conditions when other code (such as WaitLatchOrSocket) attaches
a different event object to the same socket.  Moreover, the existing
coding meant that repeated calls to pgwin32_waitforsinglesocket would
perform ResetEvent on an event actively connected to a socket, which
is rumored to be an unsafe practice; the WSAEventSelect documentation
appears to recommend against this, though it does not say not to do it
in so many words.

Also, uniformly use the coding pattern "WSAEventSelect(s, NULL, 0)" to
detach events from sockets, rather than passing the event in the second
parameter.  The WSAEventSelect documentation says that the second parameter
is ignored if the third is 0, so theoretically this should make no
difference.  However, elsewhere on the same reference page the use of NULL
in this context is recommended, and I have found suggestions on the net
that some versions of Windows have bugs with a non-NULL second parameter
in this usage.

Some other mostly-cosmetic cleanup, such as using the right one of
WSAGetLastError and GetLastError for reporting errors from these functions.
2012-05-13 14:35:40 -04:00
Tom Lane fd350ef395 Fix bogus declaration of local variable.
rc should be an int here, not a pgsocket.  Fairly harmless as long as
pgsocket is an integer type, but nonetheless wrong.  Error introduced
in commit 87091cb1f1.
2012-05-13 00:30:32 -04:00
Tom Lane 398b240151 Avoid unnecessary process wakeups in the log collector.
syslogger was coded to wake up once per second whether there was anything
useful to do or not.  As part of our campaign to reduce the server's idle
power consumption, change it to use a latch for waiting.  Now, in the
absence of any data to log or any signals to service, it will only wake up
at the programmed logfile rotation times (if any).
2012-05-12 19:21:54 -04:00
Tom Lane 31ad655364 Fix WaitLatchOrSocket to handle EOF on socket correctly.
When using poll(), EOF on a socket is reported with the POLLHUP not
POLLIN flag (at least on Linux).  WaitLatchOrSocket failed to check
this bit, causing it to go into a busy-wait loop if EOF occurs.
We earlier fixed the same mistake in the test for the state of the
postmaster_alive socket, but missed it for the caller-supplied socket.
Fortunately, this error is new in 9.2, since 9.1 only had a select()
based code path not a poll() based one.
2012-05-12 16:36:47 -04:00
Simon Riggs 867540b49c Ensure backwards compatibility for GetStableLatestTransactionId() 2012-05-12 13:26:10 +01:00
Peter Eisentraut afe86a9e73 Fix obsolescent C declaration syntax
gcc -Wextra/-Wold-style-declaration thinks that "inline" should go
before the function return type.
2012-05-12 12:52:02 +03:00
Tom Lane d0c231d132 Cosmetic adjustments for postmaster's handling of checkpointer.
Correct some comments, order some operations a bit more consistently.
No functional changes.
2012-05-11 17:46:37 -04:00
Robert Haas 1331cc6c1a Prevent loss of init fork when truncating an unlogged table.
Fixes bug #6635, reported by Akira Kurosawa.
2012-05-11 09:48:56 -04:00
Simon Riggs b762e8f50b Remove extraneous #include "storage/proc.h" 2012-05-11 14:46:46 +01:00
Simon Riggs b06679e012 Ensure age() returns a stable value rather than the latest value 2012-05-11 14:36:24 +01:00
Heikki Linnakangas 3652d72dd4 On GiST page split, release the locks on child pages before recursing up.
When inserting the downlinks for a split gist page, we used hold the locks
on the child pages until the insertion into the parent - and recursively its
parent if it had to be split too - were all completed. Change that so that
the locks on child pages are released after the insertion in the immediate
parent is done, before recursing further up the tree.

This reduces the number of lwlocks that are held simultaneously. Holding
many locks is bad for concurrency, and in extreme cases you can even hit
the limit of 100 simultaneously held lwlocks in a backend. If you're really
unlucky, you can hit the limit while in a critical section, which brings
down the whole system.

This fixes bug #6629 reported by Tom Forbes. Backpatch to 9.1. The page
splitting code was rewritten in 9.1, and the old code did not have this
problem.
2012-05-11 12:35:28 +03:00
Tom Lane cb2f2873d6 Temporarily revert stats collector latch changes so we can ship beta1.
This patch reverts commit 49340037ee and some
follow-on tweaking in pgstat.c.  While the basic scheme of latch-ifying the
stats collector seems sound enough, it's failing on most Windows buildfarm
members for unknown reasons, and there's no time left to debug that before
9.2beta1.  Better to ship a beta version without this improvement.  I hope
to re-revert this once beta1 is out, though.
2012-05-10 17:26:33 -04:00
Tom Lane f40022f1ad Make WaitLatch's WL_POSTMASTER_DEATH result trustworthy; simplify callers.
Per a suggestion from Peter Geoghegan, make WaitLatch responsible for
verifying that the WL_POSTMASTER_DEATH bit it returns is truthful (by
testing PostmasterIsAlive).  Then simplify its callers, who no longer
need to do that for themselves.  Remove weasel wording about falsely-set
result bits from WaitLatch's API contract.
2012-05-10 14:34:53 -04:00
Tom Lane ada8fa08fc Fix Windows implementation of PGSemaphoreLock.
The original coding failed to reset ImmediateInterruptOK before returning,
which would potentially allow a subsequent query-cancel interrupt to be
accepted at an unsafe point.  This is a really nasty bug since it's so hard
to predict the consequences, but they could be unpleasant.

Also, ensure that signal handlers are serviced before this function
returns, even if the semaphore is already set.  This should make the
behavior more like Unix.

Back-patch to all supported versions.
2012-05-10 13:36:14 -04:00
Tom Lane 8ebc908c57 Improve Windows implementation of WaitLatch/WaitLatchOrSocket.
Ensure that signal handlers are serviced before this function returns.
This should make the behavior more like Unix.  Also, add some more
error checking, and make some other cosmetic improvements.

No back-patch since it's not clear whether this is fixing any live bug
that would affect 9.1.  I'm more concerned about 9.2 anyway given our
considerable recent expansions in the usage of WaitLatch.
2012-05-10 13:26:47 -04:00
Tom Lane fd71421b01 Improve tests for postmaster death in auxiliary processes.
In checkpointer and walwriter, avoid calling PostmasterIsAlive unless
WaitLatch has reported WL_POSTMASTER_DEATH.  This saves a kernel call per
iteration of the process's outer loop, which is not all that much, but a
cycle shaved is a cycle earned.  I had already removed the unconditional
PostmasterIsAlive calls in bgwriter and pgstat in previous patches, but
forgot that WL_POSTMASTER_DEATH is supposed to be treated as untrustworthy
(per comment in unix_latch.c); so adjust those two cases to match.

There are a few other places where the same idea might be applied, but only
after substantial code rearrangement, so I didn't bother.
2012-05-10 00:54:56 -04:00
Tom Lane d3ae406f54 Further tweaking of nomenclature in checkpointer.c.
Get rid of some more naming choices that only make sense if you know that
this code used to be in the bgwriter, as well as some stray comments
referencing the bgwriter.
2012-05-10 00:01:10 -04:00
Tom Lane 6308ba05a7 Improve control logic for bgwriter hibernation mode.
Commit 6d90eaaa89 added a hibernation mode
to the bgwriter to reduce the server's idle-power consumption.  However,
its interaction with the detailed behavior of BgBufferSync's feedback
control loop wasn't very well thought out.  That control loop depends
primarily on the rate of buffer allocation, not the rate of buffer
dirtying, so the hibernation mode has to be designed to operate only when
no new buffer allocations are happening.  Also, the check for whether the
system is effectively idle was not quite right and would fail to detect
a constant low level of activity, thus allowing the bgwriter to go into
hibernation mode in a way that would let the cycle time vary quite a bit,
possibly further confusing the feedback loop.  To fix, move the wakeup
support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into
StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode
unless no buffer allocations have happened recently.

In addition, fix the delaying logic to remove the problem of possibly not
responding to signals promptly, which was basically caused by trying to use
the process latch's is_set flag for multiple purposes.  I can't prove it
but I'm suspicious that that hack was responsible for the intermittent
"postmaster does not shut down" failures we've been seeing in the buildfarm
lately.  In any case it did nothing to improve the readability or
robustness of the code.

In passing, express the hibernation sleep time as a multiplier on
BgWriterDelay, not a constant.  I'm not sure whether there's any value in
exposing the longer sleep time as an independently configurable setting,
but we can at least make it act like this for little extra code.
2012-05-09 23:37:10 -04:00
Peter Eisentraut 5d39807a00 Add make dependency so that postgres.bki is rebuilt in major version change
Every time since the current rule for postgres.bki was put in place
when we change the major version, people complain that their tests
fail in strange ways.  This is because the version number in
postgres.bki is not updated, because it has no dependency for that.
And you can't even force the rebuild manually if you don't happen to
know which file has the problem.  Fix that now before it will happen
again.

The only remaining problem with switching major versions, as far as
the regression tests are concerned, is that contrib needs to be
rebuilt.  But that's easily invoked, and in any case the failure modes
are more friendly if you forget that.
2012-05-09 20:45:56 +03:00
Simon Riggs 8f28789bff Rename BgWriterShmem/Request to CheckpointerShmem/Request 2012-05-09 14:23:45 +01:00
Simon Riggs bbd3ec9dce Rename BgWriterCommLock to CheckpointerCommLock 2012-05-09 14:11:48 +01:00
Simon Riggs 5829387381 Avoid xid error from age() function when run on Hot Standby 2012-05-09 13:56:24 +01:00
Tom Lane acd4c7d58b Fix an issue in recent walwriter hibernation patch.
Users of asynchronous-commit mode expect there to be a guaranteed maximum
delay before an async commit's WAL records get flushed to disk.  The
original version of the walwriter hibernation patch broke that.  Add an
extra shared-memory flag to allow async commits to kick the walwriter out
of hibernation mode, without adding any noticeable overhead in cases where
no action is needed.
2012-05-08 23:06:40 -04:00
Tom Lane 49340037ee Reduce idle power consumption of stats collector process.
Latch-ify the stats collector, so that it does not need an arbitrary wakeup
cycle to check for postmaster death.  The incremental savings in idle power
is pretty marginal, since we only had it waking every two seconds; but I
believe that this patch may also improve the collector's performance under
load, by reducing the number of kernel calls made per message when messages
are arriving constantly (we now avoid a select/poll call except when we
need to sleep).  The change also reduces the time needed for a normal
database shutdown on platforms where signals don't interrupt select().
2012-05-08 21:26:46 -04:00
Tom Lane 5461564a9d Reduce idle power consumption of walwriter and checkpointer processes.
This patch modifies the walwriter process so that, when it has not found
anything useful to do for many consecutive wakeup cycles, it extends its
sleep time to reduce the server's idle power consumption.  It reverts to
normal as soon as it's done any successful flushes.  It's still true that
during any async commit, backends check for completed, unflushed pages of
WAL and signal the walwriter if there are any; so that in practice the
walwriter can get awakened and returned to normal operation sooner than the
sleep time might suggest.

Also, improve the checkpointer so that it uses a latch and a computed delay
time to not wake up at all except when it has something to do, replacing a
previous hardcoded 0.5 sec wakeup cycle.  This also is primarily useful for
reducing the server's power consumption when idle.

In passing, get rid of the dedicated latch for signaling the walwriter in
favor of using its procLatch, since that comports better with possible
generic signal handlers using that latch.  Also, fix a pre-existing bug
with failure to save/restore errno in walwriter's signal handlers.

Peter Geoghegan, somewhat simplified by Tom
2012-05-08 20:03:26 -04:00
Magnus Hagander 916d589a10 Make "unexpected EOF" messages DEBUG1 unless in an open transaction
"Unexpected EOF on client connection" without an open transaction
is mostly noise, so turn it into DEBUG1. With an open transaction it's
still indicating a problem, so keep those as ERROR, and change the message
to indicate that it happened in a transaction.
2012-05-07 18:50:44 +02:00
Tom Lane 71b9549d05 Overdue code review for transaction-level advisory locks patch.
Commit 62c7bd31c8 had assorted problems, most
visibly that it broke PREPARE TRANSACTION in the presence of session-level
advisory locks (which should be ignored by PREPARE), as per a recent
complaint from Stephen Rees.  More abstractly, the patch made the
LockMethodData.transactional flag not merely useless but outright
dangerous, because in point of fact that flag no longer tells you anything
at all about whether a lock is held transactionally.  This fix therefore
removes that flag altogether.  We now rely entirely on the convention
already in use in lock.c that transactional lock holds must be owned by
some ResourceOwner, while session holds are never so owned.  Setting the
locallock struct's owner link to NULL thus denotes a session hold, and
there is no redundant marker for that.

PREPARE TRANSACTION now works again when there are session-level advisory
locks, and it is also able to transfer transactional advisory locks to the
prepared transaction, but for implementation reasons it throws an error if
we hold both types of lock on a single lockable object.  Perhaps it will be
worth improving that someday.

Assorted other minor cleanup and documentation editing, as well.

Back-patch to 9.1, except that in the 9.1 branch I did not remove the
LockMethodData.transactional flag for fear of causing an ABI break for
any external code that might be examining those structs.
2012-05-04 17:44:31 -04:00
Bruce Momjian ebcaa5fcde Remove BSD/OS (BSDi) port. There are no known users upgrading to
Postgres 9.2, and perhaps no existing users either.
2012-05-03 10:58:44 -04:00
Peter Eisentraut e9605a039b Even more duplicate word removal, in the spirit of the season 2012-05-02 20:56:03 +03:00
Robert Haas 0038110421 Avoid repeated CLOG access from heap_hot_search_buffer.
At the time we check whether the tuple is dead to all running
transactions, we've already verified that it isn't visible to our
scan, setting hint bits if appropriate.  So there's no need to
recheck CLOG for the all-dead test we do just a moment later.
So, add HeapTupleIsSurelyDead() to test the appropriate condition
under the assumption that all relevant hit bits are already set.

Review by Tom Lane.
2012-05-02 12:40:07 -04:00
Robert Haas 1b4998fd44 Further corrections from the department of redundancy department.
Thom Brown
2012-05-02 11:11:25 -04:00
Robert Haas e01e66f808 More duplicate word removal. 2012-05-02 09:28:16 -04:00
Heikki Linnakangas f291ccd43e Remove duplicate words in comments.
Found these with grep -r "for for ".
2012-05-02 10:20:27 +03:00
Tom Lane 50c2d6a1a6 Kill some remaining references to SVR4 and univel.
Both terms still appear in a few places, but I thought it best to leave
those alone in context.
2012-05-02 00:29:17 -04:00
Peter Eisentraut f2f9439fbf Remove dead ports
Remove the following ports:

- dgux
- nextstep
- sunos4
- svr4
- ultrix4
- univel

These are obsolete and not worth rescuing.  In most cases, there is
circumstantial evidence that they wouldn't work anymore anyway.
2012-05-01 22:11:12 +03:00
Tom Lane 809e7e21af Converge all SQL-level statistics timing values to float8 milliseconds.
This patch adjusts the core statistics views to match the decision already
taken for pg_stat_statements, that values representing elapsed time should
be represented as float8 and measured in milliseconds.  By using float8,
we are no longer tied to a specific maximum precision of timing data.
(Internally, it's still microseconds, but we could now change that without
needing changes at the SQL level.)

The columns affected are
pg_stat_bgwriter.checkpoint_write_time
pg_stat_bgwriter.checkpoint_sync_time
pg_stat_database.blk_read_time
pg_stat_database.blk_write_time
pg_stat_user_functions.total_time
pg_stat_user_functions.self_time
pg_stat_xact_user_functions.total_time
pg_stat_xact_user_functions.self_time

The first four of these are new in 9.2, so there is no compatibility issue
from changing them.  The others require a release note comment that they
are now double precision (and can show a fractional part) rather than
bigint as before; also their underlying statistics functions now match
the column definitions, instead of returning bigint microseconds.
2012-04-30 14:03:33 -04:00
Robert Haas 0d2235a25b Remove duplicate word in comment.
Noted by Peter Geoghegan.
2012-04-30 13:14:46 -04:00
Tom Lane 1dd89eadcd Rename I/O timing statistics columns to blk_read_time and blk_write_time.
This seems more consistent with the pre-existing choices for names of
other statistics columns.  Rename assorted internal identifiers to match.
2012-04-29 18:13:33 -04:00
Tom Lane 309c64745e Rename track_iotiming GUC to track_io_timing.
This spelling seems significantly more readable to me.
2012-04-29 16:23:54 -04:00
Peter Eisentraut 81107282a5 Change return type of ExceptionalCondition to void and mark it noreturn
In ancient times, it was thought that this wouldn't work because of
TrapMacro/AssertMacro, but changing those to use a comma operator
appears to work without compiler warnings.
2012-04-29 21:20:14 +03:00
Tom Lane cdbad241f4 Clear I/O timing counters after sending them to the stats collector.
This oversight caused the reported times to accumulate in an O(N^2)
fashion the longer a backend runs.
2012-04-28 15:11:13 -04:00
Tom Lane d6f7d4fdc5 Fix printing of whole-row Vars at top level of a SELECT targetlist.
Normally whole-row Vars are printed as "tabname.*".  However, that does not
work at top level of a targetlist, because per SQL standard the parser will
think that the "*" should result in column-by-column expansion; which is
not at all what a whole-row Var implies.  We used to just print the table
name in such cases, which works most of the time; but it fails if the table
name matches a column name available anywhere in the FROM clause.  This
could lead for instance to a view being interpreted differently after dump
and reload.  Adding parentheses doesn't fix it, but there is a reasonably
simple kluge we can use instead: attach a no-op cast, so that the "*" isn't
syntactically at top level anymore.  This makes the printing of such
whole-row Vars a lot more consistent with other Vars, and may indeed fix
more cases than just the reported one; I'm suspicious that cases involving
schema qualification probably didn't work properly before, either.

Per bug report and fix proposal from Abbas Butt, though this patch is quite
different in detail from his.

Back-patch to all supported versions.
2012-04-27 19:49:18 -04:00
Tom Lane 537b266953 Fix syslogger's rotation disable/re-enable logic.
If it fails to open a new log file, the syslogger assumes there's something
wrong with its parameters (such as log_directory), and stops attempting
automatic time-based or size-based log file rotations.  Sending it SIGHUP
is supposed to start that up again.  However, the original coding for that
was really bogus, involving clobbering a couple of GUC variables and hoping
that SIGHUP processing would restore them.  Get rid of that technique in
favor of maintaining a separate flag showing we've turned rotation off.
Per report from Mark Kirkwood.

Also, the syslogger will automatically attempt to create the log_directory
directory if it doesn't exist, but that was only happening at startup.
For consistency and ease of use, it should do the same whenever the value
of log_directory is changed by SIGHUP.

Back-patch to all supported branches.
2012-04-27 00:12:42 -04:00
Robert Haas 3424bff90f Prevent index-only scans from returning wrong answers under Hot Standby.
The alternative of disallowing index-only scans in HS operation was
discussed, but the consensus was that it was better to treat marking
a page all-visible as a recovery conflict for snapshots that could still
fail to see XIDs on that page.  We may in the future try to soften this,
so that we simply force index scans to do heap fetches in cases where
this may be an issue, rather than throwing a hard conflict.
2012-04-26 20:00:21 -04:00
Tom Lane 7c85aa39fc Fix oversight in recent parameterized-path patch.
bitmap_scan_cost_est() has to be able to cope with a BitmapOrPath, but
I'd taken a shortcut that didn't work for that case.  Noted by Heikki.
Add some regression tests since this area is evidently under-covered.
2012-04-26 14:17:44 -04:00
Tom Lane 9fa82c9809 Fix planner's handling of RETURNING lists in writable CTEs.
setrefs.c failed to do "rtoffset" adjustment of Vars in RETURNING lists,
which meant they were left with the wrong varnos when the RETURNING list
was in a subquery.  That was never possible before writable CTEs, of
course, but now it's broken.  The executor fails to notice any problem
because ExecEvalVar just references the ecxt_scantuple for any normal
varno; but EXPLAIN breaks when the varno is wrong, as illustrated in a
recent complaint from Bartosz Dmytrak.

Since the eventual rtoffset of the subquery is not known at the time
we are preparing its plan node, the previous scheme of executing
set_returning_clause_references() at that time cannot handle this
adjustment.  Fortunately, it turns out that we don't really need to do it
that way, because all the needed information is available during normal
setrefs.c execution; we just have to dig it out of the ModifyTable node.
So, do that, and get rid of the kluge of early setrefs processing of
RETURNING lists.  (This is a little bit of a cheat in the case of inherited
UPDATE/DELETE, because we are not passing a "root" struct that corresponds
exactly to what the subplan was built with.  But that doesn't matter, and
anyway this is less ugly than early setrefs processing was.)

Back-patch to 9.1, where the problem became possible to hit.
2012-04-25 20:20:33 -04:00
Tom Lane 9873001e6d Another trivial comment-typo fix. 2012-04-25 14:28:58 -04:00
Robert Haas 3ce7f18e92 Casts to or from a domain type are ignored; warn and document.
Prohibiting this outright would break dumps taken from older versions
that contain such casts, which would create far more pain than is
justified here.

Per report by Jaime Casanova and subsequent discussion.
2012-04-24 09:20:53 -04:00
Robert Haas 5d4b60f2f2 Lots of doc corrections.
Josh Kupershmidt
2012-04-23 22:43:09 -04:00
Robert Haas 7ab9b2f3b7 Rearrange lazy_scan_heap to avoid visibility map race conditions.
We must set the visibility map bit before releasing our exclusive lock
on the heap page; otherwise, someone might clear the heap page bit
before we set the visibility map bit, leading to a situation where the
visibility map thinks the page is all-visible but it's really not.

This problem has existed since 8.4, but it wasn't critical before we
had index-only scans, since the worst case scenario was that the page
wouldn't get vacuumed until the next scan_all vacuum.

Along the way, a couple of minor, related improvements: (1) if we
pause the heap scan to do an index vac cycle, release any visibility
map page we're holding, since really long-running pins are not good
for a variety of reasons; and (2) warn if we see a page that's marked
all-visible in the visibility map but not on the page level, since
that should never happen any more (it was allowed in previous
releases, but not in 9.2).
2012-04-23 22:08:06 -04:00
Robert Haas 85efd5f065 Reduce hash size for compute_array_stats, compute_tsvector_stats.
The size is only a hint, but a big hint chews up a lot of memory without
apparently improving performance much.

Analysis and patch by Noah Misch.
2012-04-23 22:05:41 -04:00
Peter Eisentraut 48658a1b81 Fix some typos
Josh Kupershmidt
2012-04-22 19:23:47 +03:00
Tom Lane 33e99153e9 Use fuzzy not exact cost comparison for the final tie-breaker in add_path.
Instead of an exact cost comparison, use a fuzzy comparison with 1e-10
delta after all other path metrics have proved equal.  This is to avoid
having platform-specific roundoff behaviors determine the choice when
two paths are really the same to our cost estimators.  Adjust the
recently-added test case that made it obvious we had a problem here.
2012-04-21 00:51:14 -04:00
Alvaro Herrera 09ff76fcdb Recast "ONLY" column CHECK constraints as NO INHERIT
The original syntax wasn't universally loved, and it didn't allow its
usage in CREATE TABLE, only ALTER TABLE.  It now works everywhere, and
it also allows using ALTER TABLE ONLY to add an uninherited CHECK
constraint, per discussion.

The pg_constraint column has accordingly been renamed connoinherit.

This commit partly reverts some of the changes in
61d81bd28d, particularly some pg_dump and
psql bits, because now pg_get_constraintdef includes the necessary NO
INHERIT within the constraint definition.

Author: Nikhil Sontakke
Some tweaks by me
2012-04-20 23:56:57 -03:00
Tom Lane 1f03630011 Adjust join_search_one_level's handling of clauseless joins.
For an initial relation that lacks any join clauses (that is, it has to be
cartesian-product-joined to the rest of the query), we considered only
cartesian joins with initial rels appearing later in the initial-relations
list.  This creates an undesirable dependency on FROM-list order.  We would
never fail to find a plan, but perhaps we might not find the best available
plan.  Noted while discussing the logic with Amit Kapila.

Improve the comments a bit in this area, too.

Arguably this is a bug fix, but given the lack of complaints from the
field I'll refrain from back-patching.
2012-04-20 20:10:46 -04:00
Tom Lane 5b7b5518d0 Revise parameterized-path mechanism to fix assorted issues.
This patch adjusts the treatment of parameterized paths so that all paths
with the same parameterization (same set of required outer rels) for the
same relation will have the same rowcount estimate.  We cache the rowcount
estimates to ensure that property, and hopefully save a few cycles too.
Doing this makes it practical for add_path_precheck to operate without
a rowcount estimate: it need only assume that paths with different
parameterizations never dominate each other, which is close enough to
true anyway for coarse filtering, because normally a more-parameterized
path should yield fewer rows thanks to having more join clauses to apply.

In add_path, we do the full nine yards of comparing rowcount estimates
along with everything else, so that we can discard parameterized paths that
don't actually have an advantage.  This fixes some issues I'd found with
add_path rejecting parameterized paths on the grounds that they were more
expensive than not-parameterized ones, even though they yielded many fewer
rows and hence would be cheaper once subsequent joining was considered.

To make the same-rowcounts assumption valid, we have to require that any
parameterized path enforce *all* join clauses that could be obtained from
the particular set of outer rels, even if not all of them are useful for
indexing.  This is required at both base scans and joins.  It's a good
thing anyway since the net impact is that join quals are checked at the
lowest practical level in the join tree.  Hence, discard the original
rather ad-hoc mechanism for choosing parameterization joinquals, and build
a better one that has a more principled rule for when clauses can be moved.
The original rule was actually buggy anyway for lack of knowledge about
which relations are part of an outer join's outer side; getting this right
requires adding an outer_relids field to RestrictInfo.
2012-04-19 15:53:47 -04:00
Robert Haas 293ec33c32 Remove bogus comment from HeapTupleSatisfiesNow.
This has been wrong for a really long time.  We don't use two-phase
locking to protect against serialization anomalies.

Per discussion on pgsql-hackers about 2011-03-07; original report
by Dan Ports.
2012-04-18 11:50:45 -04:00
Robert Haas 4a6fab03f2 Finish rename of FastPathStrongLocks to FastPathStrongRelationLocks.
Commit 8e5ac74c12 tried to do this renaming,
but I relied on gcc to tell me where I needed to make changes, instead of
grep.

Noted by Jeff Davis.
2012-04-18 11:29:34 -04:00
Robert Haas 53c5b869b4 Tighten up error recovery for fast-path locking.
The previous code could cause a backend crash after BEGIN; SAVEPOINT a;
LOCK TABLE foo (interrupted by ^C or statement timeout); ROLLBACK TO
SAVEPOINT a; LOCK TABLE foo, and might have leaked strong-lock counts
in other situations.

Report by Zoltán Böszörményi; patch review by Jeff Davis.
2012-04-18 11:17:30 -04:00
Robert Haas ab77b2da8b Fix incorrect comment in SetBufferCommitInfoNeedsSave().
Noah Misch spotted the fact that the old comment is in fact incorrect, due
to memory ordering hazards.
2012-04-18 10:55:40 -04:00
Robert Haas e93c0b820f After PageSetAllVisible, use MarkBufferDirty.
Previously, we used SetBufferCommitInfoNeedsSave, but that's really
intended for dirty-marks we can theoretically afford to lose, such as
hint bits.  As for 9.2, the PD_ALL_VISIBLE mustn't be lost in this
way, since we could then end up with a heap page that isn't
all-visible and a visibility map page that is all visible, causing
index-only scans to return wrong answers.
2012-04-18 10:49:37 -04:00
Robert Haas b5eccaef2c Fix copyfuncs/equalfuncs support for ReassignOwnedStmt.
Noah Misch
2012-04-18 10:45:18 -04:00
Robert Haas 53bbc681ca Fix various infelicities in node functions.
Mostly, this consists of adding support for fields which exist in the
structure but aren't handled by copy/equal/outfuncs; but the create
foreign table case can actually produce garbage output.

Noah Misch
2012-04-18 10:43:16 -04:00
Heikki Linnakangas fe546f3da6 Don't wait for the commit record to be replicated if we wrote no WAL.
When using synchronous replication, we waited for the commit record to be
replicated, but if we our transaction didn't write any other WAL records,
that's not required because we don't even flush the WAL locally to disk in
that case. This lead to long waits when committing a transaction that only
modified a temporary table. Bug spotted by Thom Brown.
2012-04-17 16:28:31 +03:00
Peter Eisentraut a33fcd7e79 Fix typo
Kyotaro HORIGUCHI
2012-04-16 15:36:40 +03:00
Robert Haas ea6a2d8d47 Rename synchronous_commit='write' to 'remote_write'.
Fujii Masao, per discussion on pgsql-hackers
2012-04-14 10:53:22 -04:00
Robert Haas 4a2d7ad76f pg_size_pretty(numeric)
The output of the new pg_xlog_location_diff function is of type numeric,
since it could theoretically overflow an int8 due to signedness; this
provides a convenient way to format such values.

Fujii Masao, with some beautification by me.
2012-04-14 08:07:25 -04:00
Tom Lane e54b10a62d Remove the "last ditch" code path in join_search_one_level().
So far as I can tell, it is no longer possible for this heuristic to do
anything useful, because the new weaker definition of
have_relevant_joinclause means that any relation with a joinclause must be
considered joinable to at least one other relation.  It would still be
possible for the code block to be entered, for example if there are join
order restrictions that prevent any join of the current level from being
formed; but in that case it's just a waste of cycles to attempt to form
cartesian joins, since the restrictions will still apply.

Furthermore, IMO the existence of this code path can mask bugs elsewhere;
we would have noticed the problem with cartesian joins a lot sooner if
this code hadn't compensated for it in the simplest case.

Accordingly, let's remove it and see what happens.  I'm committing this
separately from the prerequisite changes in have_relevant_joinclause,
just to make the question easier to revisit if there is some fault in
my logic.
2012-04-13 16:07:18 -04:00
Tom Lane e3ffd05b02 Weaken the planner's tests for relevant joinclauses.
We should be willing to cross-join two small relations if that allows us
to use an inner indexscan on a large relation (that is, the potential
indexqual for the large table requires both smaller relations).  This
worked in simple cases but fell apart as soon as there was a join clause
to a fourth relation, because the existence of any two-relation join clause
caused the planner to not consider clauseless joins between other base
relations.  The added regression test shows an example case adapted from
a recent complaint from Benoit Delbosc.

Adjust have_relevant_joinclause, have_relevant_eclass_joinclause, and
has_relevant_eclass_joinclause to consider that a join clause mentioning
three or more relations is sufficient grounds for joining any subset of
those relations, even if we have to do so via a cartesian join.  Since such
clauses are relatively uncommon, this shouldn't affect planning speed on
typical queries; in fact it should help a bit, because the latter two
functions in particular get significantly simpler.

Although this is arguably a bug fix, I'm not going to risk back-patching
it, since it might have currently-unforeseen consequences.
2012-04-13 16:07:17 -04:00
Peter Eisentraut c0cc526e8b Rename bytea_agg to string_agg and add delimiter argument
Per mailing list discussion, we would like to keep the bytea functions
parallel to the text functions, so rename bytea_agg to string_agg,
which already exists for text.

Also, to satisfy the rule that we don't want aggregate functions of
the same name with a different number of arguments, add a delimiter
argument, just like string_agg for text already has.
2012-04-13 21:36:59 +03:00
Peter Eisentraut 64e1309c76 Consistently quote encoding and locale names in messages 2012-04-13 20:37:07 +03:00
Robert Haas 61167bfaf2 Fix typo in comment. 2012-04-13 08:54:13 -04:00
Robert Haas 5630eddf1e Update lazy_scan_heap header comment.
The previous comment described how things worked in PostgreSQL 8.2
and prior.
2012-04-13 08:51:19 -04:00
Tom Lane 732bfa2448 Fix cost estimation for indexscan filter conditions.
cost_index's method for estimating per-tuple costs of evaluating filter
conditions (a/k/a qpquals) was completely wrong in the presence of derived
indexable conditions, such as range conditions derived from a LIKE clause.
This was largely masked in common cases as a result of all simple operator
clauses having about the same costs, but it could show up in a big way when
dealing with functional indexes containing expensive functions, as seen for
example in bug #6579 from Istvan Endredy.  Rejigger the calculation to give
sane answers when the indexquals aren't a subset of the baserestrictinfo
list.  As a side benefit, we now do the calculation properly for cases
involving join clauses (ie, parameterized indexscans), which we always
overestimated before.

There are still cases where this is an oversimplification, such as clauses
that can be dropped because they are implied by a partial index's
predicate.  But we've never accounted for that in cost estimates before,
and I'm not convinced it's worth the cycles to try to do so.
2012-04-11 20:24:17 -04:00
Tom Lane 880bfc3287 Silently ignore any nonexistent schemas that are listed in search_path.
Previously we attempted to throw an error or at least warning for missing
schemas, but this was done inconsistently because of implementation
restrictions (in many cases, GUC settings are applied outside transactions
so that we can't do system catalog lookups).  Furthermore, there were
exceptions to the rule even in the beginning, and we'd been poking more
and more holes in it as time went on, because it turns out that there are
lots of use-cases for having some irrelevant items in a common search_path
value.  It seems better to just adopt a philosophy similar to what's always
been done with Unix PATH settings, wherein nonexistent or unreadable
directories are silently ignored.

This commit also fixes the documentation to point out that schemas for
which the user lacks USAGE privilege are silently ignored.  That's always
been true but was previously not documented.

This is mostly in response to Robert Haas' complaint that 9.1 started to
throw errors or warnings for missing schemas in cases where prior releases
had not.  We won't adopt such a significant behavioral change in a back
branch, so something different will be needed in 9.1.
2012-04-11 12:02:50 -04:00
Tom Lane 3769fa5fc6 Make pg_tablespace_location(0) return the database's default tablespace.
This definition is convenient when applying the function to the
reltablespace column of pg_class, since that's what zero means there;
and it doesn't interfere with any other plausible use of the function.
Per gripe from Bruce Momjian.
2012-04-10 21:43:14 -04:00
Tom Lane 0d9819f7e3 Measure epoch of timestamp-without-time-zone from local not UTC midnight.
This patch reverts commit 191ef2b407
and thereby restores the pre-7.3 behavior of EXTRACT(EPOCH FROM
timestamp-without-tz).  Per discussion, the more recent behavior was
misguided on a couple of grounds: it makes it hard to get a
non-timezone-aware epoch value for a timestamp, and it makes this one
case dependent on the value of the timezone GUC, which is incompatible
with having timestamp_part() labeled as immutable.

The other behavior is still available (in all releases) by explicitly
casting the timestamp to timestamp with time zone before applying EXTRACT.

This will need to be called out as an incompatible change in the 9.2
release notes.  Although having mutable behavior in a function marked
immutable is clearly a bug, we're not going to back-patch such a change.
2012-04-10 12:04:42 -04:00
Tom Lane 65fd91333e Fix an Assert that turns out to be reachable after all.
estimate_num_groups() gets unhappy with
	create table empty();
	select * from empty except select * from empty e2;
I can't see any actual use-case for such a query (and the table is illegal
per SQL spec), but it seems like a good idea that it not cause an assert
failure.
2012-04-09 11:58:24 -04:00
Tom Lane d515365a61 Don't bother copying empty support arrays in a zero-column MergeJoin.
The case could not arise when this code was originally written, but it can
now (since we made zero-column MergeJoins work for the benefit of FULL JOIN
ON TRUE).  I don't think there is any actual bug here, but we might as well
treat it consistently with other uses of COPY_POINTER_FIELD().  Per comment
from Ashutosh Bapat.
2012-04-09 11:41:54 -04:00
Robert Haas 3ae5133b1c Teach SLRU code to avoid replacing I/O-busy pages.
Patch by me; review by Tom Lane and others.
2012-04-08 23:05:55 -04:00
Heikki Linnakangas 03529a3ff9 set_stack_base() no longer needs to be called in PostgresMain.
This was a thinko in previous commit. Now that stack base pointer is now set
in PostmasterMain and SubPostmasterMain, it doesn't need to be set in
PostgresMain anymore.
2012-04-08 19:39:12 +03:00
Heikki Linnakangas ef3883d130 Do stack-depth checking in all postmaster children.
We used to only initialize the stack base pointer when starting up a regular
backend, not in other processes. In particular, autovacuum workers can run
arbitrary user code, and without stack-depth checking, infinite recursion
in e.g an index expression will bring down the whole cluster.

The comment about PL/Java using set_stack_base() is not yet true. As the
code stands, PL/java still modifies the stack_base_ptr variable directly.
However, it's been discussed in the PL/Java mailing list that it should be
changed to use the function, because PL/Java is currently oblivious to the
register stack used on Itanium. There's another issues with PL/Java, namely
that the stack base pointer it sets is not really the base of the stack, it
could be something close to the bottom of the stack. That's a separate issue
that might need some further changes to this code, but that's a different
story.

Backpatch to all supported releases.
2012-04-08 19:07:55 +03:00
Tom Lane 7feecedcce Fix incorrect make maintainer-clean rule. 2012-04-07 18:16:50 -04:00
Tom Lane 95b9c333b2 Further adjustment of comment about qsort_tuple. 2012-04-07 17:48:40 -04:00
Tom Lane a25ef7a5f6 Remove useless variable to suppress compiler warning. 2012-04-07 16:44:43 -04:00
Tom Lane 0ab4db52c0 Fix misleading output from gin_desc().
XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed
with a list link field labeled as "blkno", which was confusing, especially
when the link was empty (InvalidBlockNumber).  Print the metapage block
number instead, since that's what's actually being updated.  We could
include the link values too as a separate field, but not clear it's worth
the trouble.

Back-patch to 8.4 where the dubious code was added.
2012-04-06 18:10:21 -04:00
Tom Lane 17b985b1a0 Fix broken comparetup_datum code.
Commit 337b6f5ecf contained the entirely
fanciful assumption that it had made comparetup_datum unreachable.
Reported and patched by Takashi Yamamoto.

Fix up some not terribly accurate/useful comments from that commit, too.
2012-04-06 16:58:50 -04:00
Tom Lane cea49fe82f Dept of second thoughts: improve the API for AnalyzeForeignTable.
If we make the initially-called function return the table physical-size
estimate, acquire_inherited_sample_rows will be able to use that to
allocate numbers of samples among child tables, when the day comes that
we want to support foreign tables in inheritance trees.
2012-04-06 16:04:10 -04:00
Tom Lane 263d9de66b Allow statistics to be collected for foreign tables.
ANALYZE now accepts foreign tables and allows the table's FDW to control
how the sample rows are collected.  (But only manual ANALYZEs will touch
foreign tables, for the moment, since among other things it's not very
clear how to handle remote permissions checks in an auto-analyze.)

contrib/file_fdw is extended to support this.

Etsuro Fujita, reviewed by Shigeru Hanada, some further tweaking by me.
2012-04-06 15:02:35 -04:00
Simon Riggs 8cb53654db Add DROP INDEX CONCURRENTLY [IF EXISTS], uses ShareUpdateExclusiveLock 2012-04-06 10:21:40 +01:00
Robert Haas 21cc529698 checkopint -> checkpoint
Report by Guillaume Lelarge.
2012-04-05 21:37:33 -04:00
Robert Haas b736aef2ec Publish checkpoint timing information to pg_stat_bgwriter.
Greg Smith, Peter Geoghegan, and Robert Haas
2012-04-05 14:04:37 -04:00
Robert Haas 644828908f Expose track_iotiming data via the statistics collector.
Ants Aasma's original patch to add timing information for buffer I/O
requests exposed this data at the relation level, which was judged too
costly.  I've here exposed it at the database level instead.
2012-04-05 11:40:24 -04:00
Tom Lane c17e863bc7 Fix syslogger to not lose log coherency under high load.
The original coding of the syslogger had an arbitrary limit of 20 large
messages concurrently in progress, after which it would just punt and dump
message fragments to the output file separately.  Our ambitions are a bit
higher than that now, so allow the data structure to expand as necessary.

Reported and patched by Andrew Dunstan; some editing by Tom
2012-04-04 15:05:10 -04:00
Peter Eisentraut 38b9693fd9 Add support for renaming domain constraints 2012-04-03 08:11:51 +03:00
Simon Riggs 68219aaf6b Correct epoch of txid_current() when executed on a Hot Standby server.
Initialise ckptXidEpoch from starting checkpoint and maintain the correct
value as we roll forwards. This allows GetNextXidAndEpoch() to return the
correct epoch when executed during recovery. Backpatch to 9.0 when the
problem is first observable by a user.

Bug report from Daniel Farina
2012-03-29 14:55:30 +01:00
Andrew Dunstan aeca650226 Unbreak Windows builds broken by pgpipe removal. 2012-03-29 04:11:57 -04:00
Heikki Linnakangas 5762a4d909 Inherit max_safe_fds to child processes in EXEC_BACKEND mode.
Postmaster sets max_safe_fds by testing how many open file descriptors it
can open, and that is normally inherited by all child processes at fork().
Not so on EXEC_BACKEND, ie. Windows, however. Because of that, we
effectively ignored max_files_per_process on Windows, and always assumed
a conservative default of 32 simultaneous open files. That could have an
impact on performance, if you need to access a lot of different files
in a query. After this patch, the value is passed to child processes by
save/restore_backend_variables() among many other global variables.

It has been like this forever, but given the lack of complaints about it,
I'm not backpatching this.
2012-03-29 08:19:11 +03:00