23506 Commits

Author SHA1 Message Date
Heikki Linnakangas 89911b3ab8 Fix GiST buffering build bug, which caused "failed to re-find parent" errors.
We use a hash table to track the parents of inner pages, but when inserting
to a leaf page, the caller of gistbufferinginserttuples() must pass a
correct block number of the leaf's parent page. Before gistProcessItup()
descends to a child page, it checks if the downlink needs to be adjusted to
accommodate the new tuple, and updates the downlink if necessary. However,
updating the downlink might require splitting the page, which might move the
downlink to a page to the right. gistProcessItup() doesn't realize that, so
when it descends to the leaf page, it might pass an out-of-date parent block
number as a result. Fix that by returning the block a tuple was inserted to
from gistbufferinginserttuples().

This fixes the bug reported by Zdeněk Jílovec.
2012-08-16 12:56:24 +03:00
Bruce Momjian d55f1b852d Add C comment about new \c parameter requirement for crashed connections. 2012-08-15 19:17:26 -04:00
Bruce Momjian 41fa3dfb0a Update C comment to NOTICE to reflect previous commit changing the error
level, per report from Tom.
2012-08-15 19:09:37 -04:00
Bruce Momjian fe21fcaf8d In psql, if there is no connection object, e.g. due to a server crash,
require all parameters for \c, rather than using the defaults, which
might be wrong.
2012-08-15 19:05:05 -04:00
Tom Lane 4c5316931f Fix rescan logic in nodeCtescan.
The previous coding essentially assumed that nodes would be rescanned in
the same order they were initialized in; or at least that the "leader" of
a group of CTEscans would be rescanned before any others were required to
execute.  Unfortunately, that isn't even a little bit true.  It's possible
to devise queries in which the leader isn't rescanned until other CTEscans
on the same CTE have run to completion, or even in which the leader never
gets a rescan call at all.

The fix makes the leader specially responsible only for initial creation
and final destruction of the tuplestore; rescan resets are now a
symmetrically shared responsibility.  This means that we might reset the
tuplestore multiple times when restarting a plan subtree containing
multiple CTEscans; but resetting an already-empty tuplestore is cheap
enough that that doesn't seem like a problem.

Per report from Adam Mackler; the new regression test cases are based on
his example query.

Back-patch to 8.4 where CTE scans were introduced.
2012-08-15 19:02:33 -04:00
Bruce Momjian 083b9133aa On second thought, explain why date_trunc("week") on interval values is
not supported in the error message, rather than the docs.
2012-08-15 16:48:05 -04:00
Bruce Momjian 1d9a6ae855 Add C comment that '=' is not documented for plpgsql assignment. 2012-08-15 12:00:56 -04:00
Tom Lane 4d642b5941 Disallow extensions from owning the schema they are assigned to.
This situation creates a dependency loop that confuses pg_dump and probably
other things.  Moreover, since the mental model is that the extension
"contains" schemas it owns, but "is contained in" its extschema (even
though neither is strictly true), having both true at once is confusing for
people too.  So prevent the situation from being set up.

Reported and patched by Thom Brown.  Back-patch to 9.1 where extensions
were added.
2012-08-15 11:28:03 -04:00
Bruce Momjian a973296598 Properly escape usernames in initdb, so names with single-quotes are
supported.  Also add assert to catch future breakage.

Also, improve documentation that "double"-quotes must be used in
pg_hba.conf (not single quotes).
2012-08-15 11:23:15 -04:00
Tom Lane eb919e8fde Resurrect the "last ditch" code path in join_search_one_level().
This essentially reverts commit e54b10a62d,
in which I'd decided that the "last ditch" join logic was useless.  The
folly of that is now exposed by a report from Pavel Stehule: although the
function should always find at least one join in a self-contained join
problem, it can still fail to do so in a sub-problem created by artificial
from_collapse_limit or join_collapse_limit constraints.  Adjust the
comments to describe this, and simplify the code a bit to match the new
coding of the earlier loop in the function.

I'm not terribly happy about this: I still subscribe to the opinion stated
in the previous commit message that the "last ditch" code can obscure logic
bugs elsewhere.  But the alternative seems to be to complicate the earlier
tests for does-this-relation-have-a-join-clause to the point where they can
tell whether the join clauses link outside the current join sub-problem.
And that looks messy, slow, and possibly a source of bugs in itself.
In any case, now is not the time to be inserting experimental code into
9.2, so let's just go back to the time-tested solution.
2012-08-15 00:08:13 -04:00
Tom Lane 17351fce4e Prevent access to external files/URLs via XML entity references.
xml_parse() would attempt to fetch external files or URLs as needed to
resolve DTD and entity references in an XML value, thus allowing
unprivileged database users to attempt to fetch data with the privileges
of the database server.  While the external data wouldn't get returned
directly to the user, portions of it could be exposed in error messages
if the data didn't parse as valid XML; and in any case the mere ability
to check existence of a file might be useful to an attacker.

The ideal solution to this would still allow fetching of references that
are listed in the host system's XML catalogs, so that documents can be
validated according to installed DTDs.  However, doing that with the
available libxml2 APIs appears complex and error-prone, so we're not going
to risk it in a security patch that necessarily hasn't gotten wide review.
So this patch merely shuts off all access, causing any external fetch to
silently expand to an empty string.  A future patch may improve this.

In HEAD and 9.2, also suppress warnings about undefined entities, which
would otherwise occur as a result of not loading referenced DTDs.  Previous
branches don't show such warnings anyway, due to different error handling
arrangements.

Credit to Noah Misch for first reporting the problem, and for much work
towards a solution, though this simplistic approach was not his preference.
Also thanks to Daniel Veillard for consultation.

Security: CVE-2012-3489
2012-08-14 18:31:16 -04:00
Bruce Momjian 03bda4535e Revert "commit_delay" change; just add comment that we don't have
a microsecond specification.
2012-08-14 16:26:08 -04:00
Bruce Momjian e74727440c Add pg_settings units display for "commit_delay" (ms).
Also remove unnecessary units designation in postgresql.conf.sample.
2012-08-14 16:16:45 -04:00
Tom Lane 51fd748e54 Update time zone data files to tzdata release 2012e.
DST law changes in Morocco; Tokelau has relocated to the other side of
the International Date Line; and apparently Olson had Tokelau's GMT
offset wrong by an hour even before that.

There are also a large number of non-significant changes in this update.
Upstream took the opportunity to remove trailing whitespace, and the
SCCS-style version numbers on the individual files are gone too.
2012-08-14 10:54:24 -04:00
Heikki Linnakangas f86e6ba40c Add runtime checks for number of query parameters passed to libpq functions.
The maximum number of parameters supported by the FE/BE protocol is 65535,
as it's transmitted as a 16-bit unsigned integer. However, the nParams
arguments to libpq functions are all of type 'int'. We can't change the
signature of libpq functions, but a simple bounds check is in order to make
it more clear what's going wrong if you try to pass more than 65535
parameters.

Per complaint from Jim Vanns.
2012-08-13 16:36:35 +03:00
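Illustration for f86e6ba40c: a minimal sketch of the kind of bounds check the message describes, written in Python rather than libpq's C. The function name and the error wording are hypothetical; only the 65535 limit (the largest 16-bit unsigned value) comes from the commit message.

```python
from typing import Optional

# Largest count that fits in the protocol's 16-bit unsigned field.
MAX_PROTOCOL_PARAMS = 65535


def param_count_error(n_params: int) -> Optional[str]:
    """Return an error message if n_params cannot be sent, else None.

    Hypothetical helper: the real check lives inside libpq and its
    wording is not reproduced here.
    """
    if n_params < 0 or n_params > MAX_PROTOCOL_PARAMS:
        return (
            f"number of parameters must be between 0 and "
            f"{MAX_PROTOCOL_PARAMS} (got {n_params})"
        )
    return None
```

Checking before sending lets the caller see a clear message instead of a confusing protocol-level failure.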
Tom Lane c1774d2c81 More fixes for planner's handling of LATERAL.
Re-allow subquery pullup for LATERAL subqueries, except when the subquery
is below an outer join and contains lateral references to relations outside
that outer join.  If we pull up in such a case, we risk introducing lateral
cross-references into outer joins' ON quals, which is something the code is
entirely unprepared to cope with right now; and I'm not sure it'll ever be
worth coping with.

Support lateral refs in VALUES (this seems to be the only additional path
type that needs such support as a consequence of re-allowing subquery
pullup).

Put in a slightly hacky fix for joinpath.c's refusal to consider
parameterized join paths even when there cannot be any unparameterized
ones.  This was causing "could not devise a query plan for the given query"
failures in queries involving more than two FROM items.

Put in an even more hacky fix for distribute_qual_to_rels() being unhappy
with join quals that contain references to rels outside their syntactic
scope; which is to say, disable that test altogether.  Need to think about
how to preserve some sort of debugging cross-check here, while not
expending more cycles than befits a debugging cross-check.
2012-08-12 16:01:26 -04:00
Tom Lane e76af54137 Fix some issues with LATERAL(SELECT UNION ALL SELECT).
The LATERAL marking has to be propagated down to the UNION leaf queries
when we pull them up.  Also, fix the formerly stubbed-off
set_append_rel_pathlist().  It does already have enough smarts to cope with
making a parameterized Append path at need; it just has to not assume that
there *must* be an unparameterized path.
2012-08-11 18:42:56 -04:00
Tom Lane b53800355f Fix dependencies generated during ALTER TABLE ADD CONSTRAINT USING INDEX.
This command generated new pg_depend entries linking the index to the
constraint and the constraint to the table, which match the entries made
when a unique or primary key constraint is built de novo.  However, it did
not bother to get rid of the entries linking the index directly to the
table.  We had considered the issue when the ADD CONSTRAINT USING INDEX
patch was written, and concluded that we didn't need to get rid of the
extra entries.  But this is wrong: ALTER COLUMN TYPE wasn't expecting such
redundant dependencies to exist, as reported by Hubert Depesz Lubaczewski.
On reflection it seems rather likely to break other things as well, since
there are many bits of code that crawl pg_depend for one purpose or
another, and most of them are pretty naive about what relationships they're
expecting to find.  Fortunately it's not that hard to get rid of the extra
dependency entries, so let's do that.

Back-patch to 9.1, where ALTER TABLE ADD CONSTRAINT USING INDEX was added.
2012-08-11 12:51:24 -04:00
Tom Lane a67d6d9a78 Update overlooked comment. 2012-08-10 17:36:54 -04:00
Tom Lane c9b0cbe98b Support having multiple Unix-domain sockets per postmaster.
Replace unix_socket_directory with unix_socket_directories, which is a list
of socket directories, and adjust postmaster's code to allow zero or more
Unix-domain sockets to be created.

This is mostly a straightforward change, but since the Unix sockets ought
to be created after the TCP/IP sockets for safety reasons (better chance
of detecting a port number conflict), AddToDataDirLockFile needs to be
fixed to support out-of-order updates of data directory lockfile lines.
That's a change that had been foreseen to be necessary someday anyway.

Honza Horak, reviewed and revised by Tom Lane
2012-08-10 17:27:15 -04:00
Bruce Momjian 914b1301cc Adjust pgtest coding to be less complex. 2012-08-10 16:46:02 -04:00
Bruce Momjian 99ed473acb Fix pgtest to return proper error code based on 'make' return code. 2012-08-10 14:10:59 -04:00
Tom Lane eaccfded98 Centralize the logic for detecting misplaced aggregates, window funcs, etc.
Formerly we relied on checking after-the-fact to see if an expression
contained aggregates, window functions, or sub-selects when it shouldn't.
This is grotty, easily forgotten (indeed, we had forgotten to teach
DefineIndex about rejecting window functions), and none too efficient
since it requires extra traversals of the parse tree.  To improve matters,
define an enum type that classifies all SQL sub-expressions, store it in
ParseState to show what kind of expression we are currently parsing, and
make transformAggregateCall, transformWindowFuncCall, and transformSubLink
check the expression type and throw error if the type indicates the
construct is disallowed.  This allows removal of a large number of ad-hoc
checks scattered around the code base.  The enum type is sufficiently
fine-grained that we can still produce error messages of at least the
same specificity as before.

Bringing these error checks together revealed that we'd been none too
consistent about phrasing of the error messages, so standardize the wording
a bit.

Also, rewrite checking of aggregate arguments so that it requires only one
traversal of the arguments, rather than up to three as before.

In passing, clean up some more comments left over from add_missing_from
support, and annotate some tests that I think are dead code now that that's
gone.  (I didn't risk actually removing said dead code, though.)
2012-08-10 11:36:15 -04:00
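Illustration for eaccfded98: a rough Python sketch of the design described above, where the parse state records what kind of expression is being parsed and the construct-specific check consults it. The enum members, the table of allowed contexts, and the message wording are simplified assumptions, not the parser's actual definitions; a real parser would raise an error where this sketch returns a message.

```python
from enum import Enum
from typing import Optional


class ExprKind(Enum):
    # Hypothetical, much-reduced set of expression contexts.
    SELECT_TARGET = "SELECT target list"
    WHERE = "WHERE"
    INDEX_EXPRESSION = "index expressions"
    CHECK_CONSTRAINT = "check constraints"


# Simplified assumption about which contexts accept which constructs.
ALLOWED = {
    "aggregate functions": {ExprKind.SELECT_TARGET},
    "window functions": {ExprKind.SELECT_TARGET},
    "subqueries": {ExprKind.SELECT_TARGET, ExprKind.WHERE},
}


class ParseState:
    """Carries the kind of expression currently being parsed."""

    def __init__(self, expr_kind: ExprKind) -> None:
        self.expr_kind = expr_kind


def misplaced_error(pstate: ParseState, construct: str) -> Optional[str]:
    """Return an error message if the construct is disallowed here."""
    if pstate.expr_kind in ALLOWED[construct]:
        return None
    return f"{construct} are not allowed in {pstate.expr_kind.value}"
```

Because the check happens at the point where the construct is transformed, no later traversal of the parse tree is needed to find misplaced constructs.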
Magnus Hagander b3055ab4fb Fix upper limit of superuser_reserved_connections, add limit for wal_senders
Should be limited to the maximum number of connections excluding
autovacuum workers, not including.

Add similar check for max_wal_senders, which should never be higher than
max_connections.
2012-08-10 14:50:45 +02:00
Simon Riggs da4efa13d8 Turn off WalSender keepalives by default; users can enable them if desired 2012-08-09 17:07:03 +01:00
Simon Riggs 87d8bd7c9f Ensure all replication message info is available and correct via WalRcv 2012-08-09 17:03:59 +01:00
Robert Haas be690e291d Make psql -1 < file behave as expected.
Previously, the -1 option was silently ignored.

Also, emit an error if -1 is used in a context where it won't be
respected, to avoid user confusion.

Original patch by Fabien COELHO, but this version is quite different
from the original submission.
2012-08-09 10:02:50 -04:00
Alvaro Herrera 92ec0370eb Fix typo in comment 2012-08-08 17:42:38 -04:00
Tom Lane f630157496 Merge parser's p_relnamespace and p_varnamespace lists into a single list.
Now that we are storing structs in these lists, the distinction between
the two lists can be represented with a couple of extra flags while using
only a single list.  This simplifies the code and should save a little
bit of palloc traffic, since the majority of RTEs are represented in both
lists anyway.
2012-08-08 16:41:31 -04:00
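Illustration for f630157496: a small Python sketch of replacing two membership lists with one list whose entries carry two flags. The class and field names are hypothetical stand-ins, not the parser's actual struct members.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NamespaceItem:
    # Hypothetical stand-in for a struct wrapping a range table entry.
    rte: str
    rel_visible: bool   # was: membership in the relation-name list
    cols_visible: bool  # was: membership in the column-name list


def visible_relations(namespace: List[NamespaceItem]) -> List[str]:
    """Entries whose relation name can be referenced."""
    return [item.rte for item in namespace if item.rel_visible]


def visible_column_sources(namespace: List[NamespaceItem]) -> List[str]:
    """Entries whose columns can be referenced unqualified."""
    return [item.rte for item in namespace if item.cols_visible]
```

An entry that used to appear in both lists now costs one allocation instead of two, which is the saving the message refers to.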
Simon Riggs 8143a56854 Fix minor bug in XLogFileRead() that accidentally worked.
Cascading replication copied the incoming file into pg_xlog but
didn't set the path correctly, so the first attempt to open the file failed,
causing it to loop around and look for the file in pg_xlog. So the
earlier coding worked, but accidentally rather than by design.

Spotted by Fujii Masao, fix by Fujii Masao and Simon Riggs
2012-08-08 21:25:23 +01:00
Robert Haas 21786db81f Fix cache flush hazard in event trigger cache.
Bug spotted by Jeff Davis using -DCLOBBER_CACHE_ALWAYS.
2012-08-08 16:38:37 -04:00
Bruce Momjian 2751740ab5 Add additional C comments for to_date/to_char() fixes. 2012-08-08 13:27:01 -04:00
Tom Lane 633f2fbd88 Update isolation tests' README file.
The directions explaining about running the prepared-transactions test
were not updated in commit ae55d9fbe3.
2012-08-08 12:02:07 -04:00
Tom Lane db108349bf Fix TwoPhaseGetDummyBackendId().
This was broken in commit ed0b409d22,
which revised the GlobalTransactionData struct to not include the
associated PGPROC as its first member, but overlooked one place where
a cast was used in reliance on that equivalence.

The most effective way of fixing this seems to be to create a new function
that looks up the GlobalTransactionData struct given the XID, and make
both TwoPhaseGetDummyBackendId and TwoPhaseGetDummyProc rely on that.

Per report from Robert Ross.
2012-08-08 11:52:02 -04:00
Tom Lane 5ebaaa4944 Implement SQL-standard LATERAL subqueries.
This patch implements the standard syntax of LATERAL attached to a
sub-SELECT in FROM, and also allows LATERAL attached to a function in FROM,
since set-returning function calls are expected to be one of the principal
use-cases.

The main change here is a rewrite of the mechanism for keeping track of
which relations are visible for column references while the FROM clause is
being scanned.  The parser "namespace" lists are no longer lists of bare
RTEs, but are lists of ParseNamespaceItem structs, which carry an RTE
pointer as well as some visibility-controlling flags.  Aside from
supporting LATERAL correctly, this lets us get rid of the ancient hacks
that required rechecking subqueries and JOIN/ON and function-in-FROM
expressions for invalid references after they were initially parsed.
Invalid column references are now always correctly detected on sight.

In passing, remove assorted parser error checks that are now dead code by
virtue of our having gotten rid of add_missing_from, as well as some
comments that are obsolete for the same reason.  (It was mainly
add_missing_from that caused so much fudging here in the first place.)

The planner support for this feature is very minimal, and will be improved
in future patches.  It works well enough for testing purposes, though.

catversion bump forced due to new field in RangeTblEntry.
2012-08-07 19:02:54 -04:00
Tom Lane 5078be4804 Tweak new Perl pgindent for compatibility with middle-aged Perls.
We seem to have a rough policy that our Perl scripts should work with
Perl 5.8, so make this one do so.  Main change is to not use the newfangled
\h character class in regexes; "[ \t]" is a serviceable replacement.
2012-08-07 17:52:53 -04:00
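Illustration for 5078be4804: the explicit class "[ \t]" is a portable way to match a space or a tab where "\h" is unavailable. The sketch below shows the same class in Python; the trailing-blank-stripping use case is a hypothetical example, not taken from pgindent. Note that Perl's "\h" also covers other horizontal whitespace characters, so "[ \t]" is a narrower match.

```python
import re

# Explicit space-or-tab class, standing in for Perl's \h.
TRAILING_BLANKS = re.compile(r"[ \t]+$")


def strip_trailing_blanks(line: str) -> str:
    """Remove spaces and tabs at the end of a line, keeping any newline."""
    return TRAILING_BLANKS.sub("", line)
```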
Robert Haas eea65943c6 Fix memory leaks in event trigger code.
Spotted by Jeff Davis.
2012-08-07 17:00:16 -04:00
Bruce Momjian ac78c4178b Fix to_char(), to_date(), and to_timestamp() to handle negative/BC
century specifications just like positive/AD centuries.  Previously the
behavior was either wrong or inconsistent with positive/AD handling.

Centuries without years now always assume the first year of the century,
which is now documented.
2012-08-07 13:34:44 -04:00
Alvaro Herrera 3a42a3ffd8 Fix redundant wording 2012-08-07 11:43:51 -04:00
Simon Riggs 0f04fc67f7 fsync backup_label after pg_start_backup()
Dave Kerr
2012-08-07 16:19:13 +01:00
Alvaro Herrera f5f8e7169f Make strings identical 2012-08-06 12:45:08 -04:00
Magnus Hagander 254316f5a2 Complain with proper error message if streaming stops prematurely
In particular, with a controlled shutdown of the master, pg_basebackup
with streaming log could terminate without an error message, even though
the backup is not consistent.

In passing, fix a few cases where walfile wasn't properly set to -1 after
closing.

Fujii Masao
2012-08-06 13:53:46 +02:00
Heikki Linnakangas 3ff15883b1 Perform conversion from Python unicode to string/bytes object via UTF-8.
We used to convert the unicode object directly to a string in the server
encoding by calling Python's PyUnicode_AsEncodedString function. In other
words, we used Python's routines to do the encoding. However, that has a
few problems. First of all, it required keeping a mapping table of Python
encoding names and PostgreSQL encodings. But the real killer was that Python
doesn't support EUC_TW and MULE_INTERNAL encodings at all.

Instead, convert the Python unicode object to UTF-8, and use PostgreSQL's
encoding conversion functions to convert from UTF-8 to server encoding. We
were already doing the same in the other direction in PLyUnicode_FromString,
so this is more consistent, too.

Note: This makes SQL_ASCII behave more leniently. We used to map
SQL_ASCII to Python's 'ascii', which in Python means strict 7-bit ASCII
only, so you got an error if the Python string contained anything but pure
ASCII. You no longer get an error; you get the UTF-8 representation of the
string instead.

Backpatch to 9.0, where these conversions were introduced.

Jan Urbański
2012-08-06 14:09:50 +03:00
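Illustration for 3ff15883b1: a sketch of the two-step conversion the message describes, first to UTF-8 and then from UTF-8 to the server encoding. The function names are hypothetical, and Python's own codecs stand in for PostgreSQL's conversion routines purely so the example runs; the real code calls the server's conversion functions instead.

```python
def convert_from_utf8(data: bytes, server_encoding: str) -> bytes:
    """Stand-in (assumption) for the server-side UTF-8 conversion step."""
    if server_encoding == "SQL_ASCII":
        # Per the commit message: no error, the UTF-8 bytes pass through.
        return data
    return data.decode("utf-8").encode(server_encoding)


def unicode_to_server_bytes(text: str, server_encoding: str) -> bytes:
    # Step 1: always produce UTF-8, regardless of the server encoding.
    utf8_bytes = text.encode("utf-8")
    # Step 2: let the (stand-in) server routine handle the target encoding.
    return convert_from_utf8(utf8_bytes, server_encoding)
```

Routing everything through UTF-8 means only one side needs to know every server encoding, so no table mapping Python codec names to PostgreSQL encodings is required.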
Bruce Momjian 149ac7d455 Replace pgindent shell script with Perl script. Update perltidy
instructions to perltidy Perl files that lack Perl file extensions.

pgindent Perl coding by Andrew Dunstan, restructured by me.
2012-08-04 12:41:21 -04:00
Tom Lane 3152bf722f Fix bugs with parsing signed hh:mm and hh:mm:ss fields in interval input.
DecodeInterval() failed to honor the "range" parameter (the special SQL
syntax for indicating which fields appear in the literal string) if the
time was signed.  This seems inappropriate, so make it work like the
not-signed case.  The inconsistency was introduced in my commit
f867339c01, which as noted in its log message
was only really focused on making SQL-compliant literals work per spec.
Including a sign here is not per spec, but if we're going to allow it
then it's reasonable to expect it to work like the not-signed case.

Also, remove bogus setting of tmask, which caused subsequent processing to
think that what had been given was a timezone and not an hh:mm(:ss) field,
thus confusing checks for redundant fields.  This seems to be an aboriginal
mistake in Lockhart's commit 2cf1642461.

Add regression test cases to illustrate the changed behaviors.

Back-patch as far as 8.4, where support for spec-compliant interval
literals was added.

Range problem reported and diagnosed by Amit Kapila, tmask problem by me.
2012-08-03 17:40:43 -04:00
Tom Lane f786e91a75 Improve underdocumented btree_xlog_delete_get_latestRemovedXid() code.
As noted by Noah Misch, btree_xlog_delete_get_latestRemovedXid is
critically dependent on the assumption that it's examining a consistent
state of the database.  This was undocumented though, so the
seemingly-unrelated check for no active HS sessions might be thought to be
merely an optional optimization.  Improve comments, and add an explicit
check of reachedConsistency just to be sure.

This function returns InvalidTransactionId (thereby killing all HS
transactions) in several cases that are not nearly unlikely enough for my
taste.  This commit doesn't attempt to fix those deficiencies, just
document them.

Back-patch to 9.2, not from any real functional need but just to keep the
branches more closely synced to simplify possible future back-patching.
2012-08-03 15:41:18 -04:00
Tom Lane c1793f2e0c In SPGiST replay, do conflict resolution before modifying the page.
In yesterday's commit 962e0cc71e, I added the
ResolveRecoveryConflictWithSnapshot call in the wrong place.  I correctly
put it before spgRedoVacuumRedirect itself would modify the index page ---
but not before RestoreBkpBlocks, so replay of a record with a full-page
image would modify the page before kicking off any conflicting HS
transactions.  Oops.
2012-08-03 15:23:14 -04:00
Tom Lane 962e0cc71e Fix race conditions associated with SPGiST redirection tuples.
The correct test for whether a redirection tuple is removable is whether
tuple's xid < RecentGlobalXmin, not OldestXmin; the previous coding
failed to protect index searches being done in concurrent transactions that
have no XID.  This mirrors the recent fix in btree's page recycling logic
made in commit d3abbbebe5.

Also, WAL-log the newest XID of any removed redirection tuple on an index
page, and apply ResolveRecoveryConflictWithSnapshot during InHotStandby WAL
replay.  This protects against concurrent Hot Standby transactions possibly
needing to see the redirection tuple(s).

Per my query of 2012-03-12 and subsequent discussion.
2012-08-02 15:34:14 -04:00
Tom Lane 41b9c8452b Replace libpq's "row processor" API with a "single row" mode.
After taking a while to digest the row-processor feature that was added to
libpq in commit 92785dac2e, we've concluded
it is over-complicated and too hard to use.  Leave the core infrastructure
changes in place (that is, there's still a row processor function inside
libpq), but remove the exposed API pieces, and instead provide a "single
row" mode switch that causes PQgetResult to return one row at a time in
separate PGresult objects.

This approach incurs more overhead than proper use of a row processor
callback would, since construction of a PGresult per row adds extra cycles.
However, it is far easier to use and harder to break.  The single-row mode
still affords applications the primary benefit that the row processor API
was meant to provide, namely not having to accumulate large result sets in
memory before processing them.  Preliminary testing suggests that we can
probably buy back most of the extra cycles by micro-optimizing construction
of the extra results, but that task will be left for another day.

Marko Kreen
2012-08-02 13:10:30 -04:00
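Illustration for 41b9c8452b: an analogy in Python for the memory trade-off described above. It does not call libpq; the function names are hypothetical. One function accumulates the whole result set, the other hands back one single-row result object at a time, which is the behavior the message attributes to PQgetResult in single-row mode.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]


def fetch_all(rows: Iterable[Row]) -> List[Row]:
    """Default behavior: the entire result set is held in memory."""
    return list(rows)


def fetch_single_row_mode(rows: Iterable[Row]) -> Iterator[List[Row]]:
    """Yield a separate one-row result per row, so nothing accumulates."""
    for row in rows:
        yield [row]
```

Building a fresh result object per row is the extra overhead the message mentions; the benefit is that the caller can process and discard each row immediately.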
Tom Lane f6ce81f55a Fix WITH attached to a nested set operation (UNION/INTERSECT/EXCEPT).
Parse analysis neglected to cover the case of a WITH clause attached to an
intermediate-level set operation; it only handled WITH at the top level
or WITH attached to a leaf-level SELECT.  Per report from Adam Mackler.

In HEAD, I rearranged the order of SelectStmt's fields to put withClause
with the other fields that can appear on non-leaf SelectStmts.  In back
branches, leave it alone to avoid a possible ABI break for third-party
code.

Back-patch to 8.4 where WITH support was added.
2012-07-31 17:56:21 -04:00
Tom Lane b76356ac22 Fix syslogger so that log_truncate_on_rotation works in the first rotation.
In the original coding of the log rotation stuff, we did not bother to make
the truncation logic work for the very first rotation after postmaster
start (or after a syslogger crash and restart).  It just always appended
in that case.  It did not seem terribly important at the time, but we've
recently had two separate complaints from people who expected it to work
unsurprisingly.  (Both users tend to restart the postmaster about as often
as a log rotation is configured to happen, which is maybe not typical use,
but still...)  Since the initial log file is opened in the postmaster,
fixing this requires passing down some more state to the syslogger child
process.

It's always been like this, so back-patch to all supported branches.
2012-07-31 14:36:54 -04:00
Alvaro Herrera 2f29f011c8 pg_basebackup: stylistic adjustments
The most user-visible part of this is to change the long options
--statusint and --noloop to --status-interval and --no-loop,
respectively, per discussion.

Also, consistently enclose file names in double quotes, per our
conventions; and consistently use the term "transaction log file" to
talk about WAL segments.  (Someday we may need to go over this
terminology and make it consistent across the whole source code.)

Finally, reflow the code to better fit in 80 columns, and have pgindent
fix it up some more.
2012-07-31 11:02:39 -04:00
Tom Lane 9ae8ebe0b2 Improve reporting of error situations in find_other_exec().
This function suppressed any stderr output from the called program, which
is unnecessary in the normal case and unhelpful in error cases.  It also
gave a rather opaque message along the lines of "fgets failure: Success"
in case the called program failed to return anything on stdout.  Since
we've seen multiple reports of people not understanding what's wrong when
pg_ctl reports this, improve the message.

Back-patch to all active branches.
2012-07-27 19:31:13 -04:00
Tom Lane 26b438694c Only allow autovacuum to be auto-canceled by a directly blocked process.
In the original coding of the autovacuum cancel feature, commit
acac68b2bc, an autovacuum process was
considered a target for cancellation if it was found to hard-block any
process examined in the deadlock search.  This patch tightens the test so
that the autovacuum must directly hard-block the current process.  This
should make the behavior more predictable in general, and in particular
it ensures that an autovacuum will not be canceled with less than
deadlock_timeout grace period.  In the old coding, it was possible for an
autovacuum to be canceled almost instantly, given unfortunate timing of two
or more other processes' lock attempts.

This also justifies the logging methodology in the recent commit
d7318d43d891bd63e82dcfc27948113ed7b1db80; without this restriction, that
patch isn't providing enough information to see the connection of the
canceling process to the autovacuum.  Like that one, patch all the way
back.
2012-07-26 14:29:22 -04:00
Robert Haas d20cdd31c0 Tab complete table names after ALTER TABLE x [NO] INHERIT.
Jeff Janes
2012-07-26 10:16:55 -04:00
Robert Haas d7318d43d8 Log a better message when canceling autovacuum.
The old message was at DEBUG2, so typically it didn't show up in the
log at all.  As a result, in most cases where autovacuum was canceled,
the only information that was logged was the table being vacuumed,
with no indication as to what problem caused the cancel.  Crank up
the level to LOG and add some more details to assist with debugging.

Back-patch all the way, per discussion on pgsql-hackers.
2012-07-26 09:19:03 -04:00
Tom Lane af026b5d9b Fix longstanding crash-safety bug with newly-created-or-reset sequences.
If a crash occurred immediately after the first nextval() call for a serial
column, WAL replay would restore the sequence to a state in which it
appeared that no nextval() had been done, thus allowing the first sequence
value to be returned again by the next nextval() call; as reported in
bug #6748 from Xiangming Mei.

More generally, the problem would occur if an ALTER SEQUENCE was executed
on a freshly created or reset sequence.  (The manifestation with serial
columns was introduced in 8.2 when we added an ALTER SEQUENCE OWNED BY step
to serial column creation.)  The cause is that sequence creation attempted
to save one WAL entry by writing out a WAL record that made it appear that
the first nextval() had already happened (viz, with is_called = true),
while marking the sequence's in-database state with log_cnt = 1 to show
that the first nextval() need not emit a WAL record.  However, ALTER
SEQUENCE would emit a new WAL entry reflecting the actual in-database state
(with is_called = false).  Then, nextval would allocate the first sequence
value and set is_called = true, but it would trust the log_cnt value and
not emit any WAL record.  A crash at this point would thus restore the
sequence to its post-ALTER state, causing the next nextval() call to return
the first sequence value again.

To fix, get rid of the idea of logging an is_called status different from
reality.  This means that the first nextval-driven WAL record will happen
at the first nextval call not the second, but the marginal cost of that is
pretty negligible.  In addition, make sure that ALTER SEQUENCE resets
log_cnt to zero in any case where it touches sequence parameters that
affect future nextval results.  This will result in some user-visible
changes in the contents of a sequence's log_cnt column, as reflected in the
patch's regression test changes; but no application should be depending on
that anyway, since it was already true that log_cnt changes rather
unpredictably depending on checkpoint timing.

In addition, make some basically-cosmetic improvements to get rid of
sequence.c's undesirable intimacy with page layout details.  It was always
really trying to WAL-log the contents of the sequence tuple, so we should
have it do that directly using a HeapTuple's t_data and t_len, rather than
backing into it with some magic assumptions about where the tuple would be
on the sequence's page.

Back-patch to all supported branches.
2012-07-25 17:42:23 -04:00
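Illustration for af026b5d9b: a toy Python model of the failure sequence described above (create, ALTER SEQUENCE, nextval, crash, nextval). It is a deliberate simplification: each WAL record is a snapshot of (last_value, is_called), the sequence starts at 1, and replayed state gets log_cnt = 0. The class and method names are hypothetical and nothing here mirrors sequence.c.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass
class SeqState:
    last_value: int
    is_called: bool
    log_cnt: int


class ToySequence:
    def __init__(self, fixed: bool) -> None:
        self.fixed = fixed
        if fixed:
            # New behavior: log the real state, no pre-paid nextval.
            self.state = SeqState(1, False, 0)
            self.wal: List[SeqState] = [SeqState(1, False, 0)]
        else:
            # Old behavior: WAL pretends nextval already happened,
            # and log_cnt = 1 lets the first nextval skip its WAL record.
            self.state = SeqState(1, False, 1)
            self.wal = [SeqState(1, True, 0)]

    def alter(self) -> None:
        if self.fixed:
            self.state.log_cnt = 0  # the fix: force the next nextval to log
        # Both versions log the actual in-database state.
        self.wal.append(replace(self.state, log_cnt=0))

    def nextval(self) -> int:
        s = self.state
        if s.is_called:
            s.last_value += 1
        else:
            s.is_called = True
        if s.log_cnt > 0:
            s.log_cnt -= 1  # trust log_cnt: emit no WAL record
        else:
            self.wal.append(replace(s, log_cnt=0))
        return s.last_value

    def crash_and_replay(self) -> None:
        # Recovery restores the most recent WAL snapshot.
        self.state = replace(self.wal[-1])


def values_around_crash(fixed: bool) -> Tuple[int, int]:
    seq = ToySequence(fixed)
    seq.alter()
    before = seq.nextval()
    seq.crash_and_replay()
    return before, seq.nextval()
```

In this model the old behavior hands out the first value twice across the crash, while the fixed behavior logs the first nextval and continues with the next value.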
Alvaro Herrera 58f17dcf83 Add translator comments to module names 2012-07-25 00:02:49 -04:00
Alvaro Herrera d7b47e5155 Change syntax of new CHECK NO INHERIT constraints
The initially implemented syntax, "CHECK NO INHERIT (expr)" was not
deemed very good, so switch to "CHECK (expr) NO INHERIT" instead.  This
way it looks similar to SQL-standard constraint attributes.

Backport to 9.2 where the new syntax and feature were introduced.

Per discussion.
2012-07-24 16:01:32 -04:00
Peter Eisentraut d61d9aa750 Update information schema to SQL:2011
This is just a section renumbering for now.  Some details might be
filled in later.
2012-07-23 22:32:56 +03:00
Tom Lane b71258af56 Fix name collision between concurrent regression tests.
Commit f5bcd398ad introduced a test using
a table named "circles" in inherit.sql.  Unfortunately, the concurrently
executed constraints test was already using that table name, so the
parallel regression tests would sometimes fail.  Rename table to dodge
the problem.  Per buildfarm.
2012-07-22 00:01:19 -04:00
Tom Lane 2d46a57ddc Improve copydir() code for the case that fsync is off.
We should avoid calling sync_file_range or posix_fadvise in this case,
since (a) we don't really care if the data gets synced, and might as
well save the kernel calls; (b) at least on Linux we know that the
kernel might block us until it's scheduled the write.

Also, avoid making a useless second traversal of the directory tree
if we're not actually going to call fsync(2) after all.
2012-07-21 20:10:29 -04:00
Tom Lane 2c4f5b4bc5 Use --nosync during make check's initdb call.
We left this out of commit b966dd6c42
so as to get some more buildfarm testing of the new fsync code in initdb.
But since no problems have turned up, it's probably time to save the
cycles.
2012-07-21 19:56:22 -04:00
Tom Lane 1f115d98b9 Suppress volatile-related warning seen in some compilers.
Antique versions of gcc complain about vars that are initialized outside
PG_TRY and then modified within it.  Rather than marking the var volatile,
expend one more line of code.
2012-07-21 19:39:03 -04:00
Tom Lane 31c7c642b6 Account for SRFs in targetlists in planner rowcount estimates.
We made use of the ROWS estimate for set-returning functions used in FROM,
but not for those used in SELECT targetlists; which is a bit of an
oversight considering there are common usages that require the latter
approach.  Improve that.  (I had initially thought it might be worth
folding this into cost_qual_eval, but after investigation concluded that
that wouldn't be very helpful, so just do it separately.)  Per complaint
from David Johnston.

Back-patch to 9.2, but not further, for fear of destabilizing plan choices
in existing releases.
2012-07-21 17:45:07 -04:00
Robert Haas ed0af33247 Revert temporary patch to debug Windows breakage.
This reverts commit 0a248208a0.
2012-07-20 22:31:19 -04:00
Robert Haas 0635c0b524 Repair plpgsql_validator breakage.
Commit 3a0e4d36eb arranged to
reference stack-allocated variables after they were out of scope.
That's no good, so let's arrange to not do that after all.
2012-07-20 21:28:26 -04:00
Andrew Dunstan a1e5705c9f Remove now unneeded results file for disabled prepared transactions case. 2012-07-20 16:30:34 -04:00
Robert Haas 0a248208a0 Temporary patch to try to debug why event trigger patch broke Windows.
Apologies for the ugliness.
2012-07-20 16:22:11 -04:00
Andrew Dunstan ae55d9fbe3 Remove prepared transactions from main isolation test schedule.
There is no point in running this test when prepared transactions are disabled,
which is the default. New make targets that include the test are provided. This
will avoid wasting cycles on buildfarm machines.

Backpatch to 9.1 where these tests were introduced.
2012-07-20 15:51:40 -04:00
Peter Eisentraut 8ca03aa414 pg_dump: Simplify mkdir() error checking
mkdir() can check for errors itself.  We don't need to code that
ourselves again.
2012-07-20 22:34:11 +03:00
Alvaro Herrera f5bcd398ad connoinherit may be true only for CHECK constraints
The code was setting it true for other constraints, which is
bogus.  Doing so caused bogus catalog entries for such constraints, and
in particular caused an error to be raised when trying to drop a
constraint of types other than CHECK from a table that has children,
such as reported in bug #6712.

In 9.2, additionally ignore connoinherit=true for other constraint
types, to avoid having to force initdb; existing databases might already
contain bogus catalog entries.

Includes a catversion bump (in HEAD only).

Bug report from Miroslav Šulc
Analysis from Amit Kapila and Noah Misch; Amit also contributed the patch.
2012-07-20 14:08:07 -04:00
Tom Lane 8e617e29aa Fix whole-row Var evaluation to cope with resjunk columns (again).
When a whole-row Var is reading the result of a subquery, we need it to
ignore any "resjunk" columns that the subquery might have evaluated for
GROUP BY or ORDER BY purposes.  We've hacked this area before, in commit
68e40998d0, but that fix only covered
whole-row Vars of named composite types, not those of RECORD type; and it
was mighty klugy anyway, since it just assumed without checking that any
extra columns in the result must be resjunk.  A proper fix requires getting
hold of the subquery's targetlist so we can actually see which columns are
resjunk (whereupon we can use a JunkFilter to get rid of them).  So bite
the bullet and add some infrastructure to make that possible.

Per report from Andrew Dunstan and additional testing by Merlin Moncure.
Back-patch to all supported branches.  In 8.3, also back-patch commit
292176a118, which for some reason I had
not done at the time, but it's a prerequisite for this change.
2012-07-20 13:10:58 -04:00
Robert Haas 3a0e4d36eb Make new event trigger facility actually do something.
Commit 3855968f32 added syntax, pg_dump,
psql support, and documentation, but the triggers didn't actually fire.
With this commit, they now do.  This is still a pretty basic facility
overall because event triggers do not get a whole lot of information
about what the user is trying to do unless you write them in C; and
there's still no option to fire them anywhere except at the very
beginning of the execution sequence, but it's better than nothing,
and a good building block for future work.

Along the way, add a regression test for ALTER LARGE OBJECT, since
testing of event triggers reveals that we haven't got one.

Dimitri Fontaine and Robert Haas
2012-07-20 11:39:01 -04:00
Tom Lane be86e3dd5b Rethink checkpointer's fsync-request table representation.
Instead of having one hash table entry per relation/fork/segment, just have
one per relation, and use bitmapsets to represent which specific segments
need to be fsync'd.  This eliminates the need to scan the whole hash table
to implement FORGET_RELATION_FSYNC, which fixes the O(N^2) behavior
recently demonstrated by Jeff Janes for cases involving lots of TRUNCATE or
DROP TABLE operations during a single checkpoint cycle.  Per an idea from
Robert Haas.

(FORGET_DATABASE_FSYNC still sucks, but since dropping a database is a
pretty expensive operation anyway, we'll live with that.)

In passing, improve the delayed-unlink code: remove the pass over the list
in mdpreckpt, since it wasn't doing anything for us except supporting a
useless Assert in mdpostckpt, and fix mdpostckpt so that it will absorb
fsync requests every so often when clearing a large backlog of deletion
requests.
2012-07-19 19:28:22 -04:00
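As a rough illustration of the representation described in commit be86e3dd5b above, here is a toy Python model with one table entry per relation and a per-fork set of segment numbers standing in for the bitmapsets. The class and method names, and the use of a (database, relation) tuple as the key, are assumptions made for this sketch; they are not the checkpointer's actual data structures.

```python
class PendingFsyncTable:
    """Toy model: one entry per relation, with per-fork sets of segments."""

    def __init__(self):
        # (database, relation) -> fork number -> set of segment numbers
        self._pending = {}

    def remember(self, relation, fork, segment):
        """Record that one segment of one fork needs an fsync."""
        forks = self._pending.setdefault(relation, {})
        forks.setdefault(fork, set()).add(segment)

    def forget_relation(self, relation):
        """Drop all requests for a relation with a single lookup."""
        self._pending.pop(relation, None)

    def forget_database(self, database):
        """Still a scan over every entry, as the commit message concedes."""
        for relation in [r for r in self._pending if r[0] == database]:
            del self._pending[relation]

    def pending_segments(self, relation, fork):
        """Return the sorted segment numbers still awaiting fsync."""
        return sorted(self._pending.get(relation, {}).get(fork, ()))
```

Because each relation owns exactly one entry, forgetting it no longer requires visiting the entries of other relations, which is the property the commit relies on to avoid the O(N^2) behavior.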
Tom Lane 3072b7bade Send only one FORGET_RELATION_FSYNC request when dropping a relation.
We were sending one per fork, but a little bit of refactoring allows us
to send just one request with forknum == InvalidForkNumber.  This not only
reduces pressure on the shared-memory request queue, but saves repeated
traversals of the checkpointer's hash table.
2012-07-19 13:07:33 -04:00
Heikki Linnakangas a7a4add6c4 Refactor the way code is shared between some range type functions.
Functions like range_eq, range_before etc. are exposed at the SQL-level, but
they're also used internally by the GiST consistent support function. The
code sharing was done by a hack, TrickFunctionCall2, which relied on the
knowledge that all the functions used fn_extra the same way. This commit
splits the functions into internal versions that take a TypeCacheEntry as
argument, and thin wrappers to expose the functions at the SQL-level. The
internal versions can then be called directly and in a less hacky way from
the GiST consistent function.

This is just cosmetic, but backpatch to 9.2 anyway, to avoid having a
different version of this code in the 9.2 branch. That would make
backpatching fixes in this area more difficult.

Alexander Korotkov
2012-07-18 23:14:56 +03:00
Tom Lane 80e373c3a8 Fix statistics breakage from bgwriter/checkpointer process split.
ForwardFsyncRequest() supposed that it could only be called in regular
backends, which used to be true; but since the splitup of bgwriter and
checkpointer, it is also called in the bgwriter.  We do not want to count
such calls in pg_stat_bgwriter.buffers_backend statistics, so fix things
so that they aren't.

(It's worth noting here that this implies an alarmingly large increase in
the expected amount of cross-process fsync request traffic, which may well
mean that the process splitup was not such a hot idea.)
2012-07-18 15:40:31 -04:00
Tom Lane 4a9c30a8a1 Fix management of pendingOpsTable in auxiliary processes.
mdinit() was misusing IsBootstrapProcessingMode() to decide whether to
create an fsync pending-operations table in the current process.  This led
to creating a table not only in the startup and checkpointer processes as
intended, but also in the bgwriter process, not to mention other auxiliary
processes such as walwriter and walreceiver.  Creation of the table in the
bgwriter is fatal, because it absorbs fsync requests that should have gone
to the checkpointer; instead they just sit in bgwriter local memory and are
never acted on.  So writes performed by the bgwriter were not being fsync'd
which could result in data loss after an OS crash.  I think there is no
live bug with respect to walwriter and walreceiver because those never
perform any writes of shared buffers; but the potential is there for
future breakage in those processes too.

To fix, make AuxiliaryProcessMain() export the current process's
AuxProcType as a global variable, and then make mdinit() test directly for
the types of aux process that should have a pendingOpsTable.  Having done
that, we might as well also get rid of the random bool flags such as
am_walreceiver that some of the aux processes had grown.  (Note that we
could not have fixed the bug by examining those variables in mdinit(),
because it's called from BaseInit() which is run by AuxiliaryProcessMain()
before entering any of the process-type-specific code.)

Back-patch to 9.2, where the problem was introduced by the split-up of
bgwriter and checkpointer processes.  The bogus pendingOpsTable exists
in walwriter and walreceiver processes in earlier branches, but absent
any evidence that it causes actual problems there, I'll leave the older
branches alone.
2012-07-18 15:28:10 -04:00
Robert Haas 3855968f32 Syntax support and documentation for event triggers.
They don't actually do anything yet; that will get fixed in a
follow-on commit.  But this gets the basic infrastructure in place,
including CREATE/ALTER/DROP EVENT TRIGGER; support for COMMENT,
SECURITY LABEL, and ALTER EXTENSION .. ADD/DROP EVENT TRIGGER;
pg_dump and psql support; and documentation for the anticipated
initial feature set.

Dimitri Fontaine, with review and a bunch of additional hacking by me.
Thom Brown extensively reviewed earlier versions of this patch set,
but there's not a whole lot of that code left in this commit, as it
turns out.
2012-07-18 10:16:16 -04:00
Tom Lane 73b796a52c Improve coding around the fsync request queue.
In all branches back to 8.3, this patch fixes a questionable assumption in
CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there are
no uninitialized pad bytes in the request queue structs.  This would only
cause trouble if (a) there were such pad bytes, which could happen in 8.4
and up if the compiler makes enum ForkNumber narrower than 32 bits, but
otherwise would require not-currently-planned changes in the widths of
other typedefs; and (b) the kernel has not uniformly initialized the
contents of shared memory to zeroes.  Still, it seems a tad risky, and we
can easily remove any risk by pre-zeroing the request array for ourselves.
In addition to that, we need to establish a coding rule that struct
RelFileNode can't contain any padding bytes, since such structs are copied
into the request array verbatim.  (There are other places that are assuming
this anyway, it turns out.)

In 9.1 and up, the risk was a bit larger because we were also effectively
assuming that struct RelFileNodeBackend contained no pad bytes, and with
fields of different types in there, that would be much easier to break.
However, there is no good reason to ever transmit fsync or delete requests
for temp files to the bgwriter/checkpointer, so we can revert the request
structs to plain RelFileNode, getting rid of the padding risk and saving
some marginal number of bytes and cycles in fsync queue manipulation while
we are at it.  The savings might be more than marginal during deletion of
a temp relation, because the old code transmitted an entirely useless but
nonetheless expensive-to-process ForgetRelationFsync request to the
background process, and also had the background process perform the file
deletion even though that can safely be done immediately.

In addition, make some cleanup of nearby comments and small improvements to
the code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.
2012-07-17 16:56:54 -04:00
Peter Eisentraut 71f2dd2321 PL/Python: Remove PLy_result_ass_item
It is apparently no longer used after the new slicing support was
implemented (a97207b690), so let's
remove the dead code and see if anything cares.
2012-07-17 23:26:49 +03:00
Alvaro Herrera 65558995a2 Remove recently added PL/Perl encoding tests
These only pass cleanly on UTF8 and SQL_ASCII encodings, besides the
Japanese encoding in which they were originally written, which is clearly
not good enough.  Since the functionality they test has not ever been
tested from PL/Perl, the best answer seems to be to remove the new tests
completely.

Per buildfarm results and ensuing discussion.
2012-07-17 13:26:25 -04:00
Tom Lane 57b9bdda39 Put back storage/proc.h in postmaster.c.
I took this out thinking it wasn't needed anymore, but the EXEC_BACKEND
code still needs it.  Per buildfarm.
2012-07-17 10:14:06 -04:00
Alvaro Herrera f34c68f096 Introduce timeout handling framework
Management of timeouts was getting a little cumbersome; what we
originally had was more than enough back when we were only concerned
about deadlocks and query cancel; however, when we added timeouts for
standby processes, the code got considerably messier.  Since there are
plans to add more complex timeouts, this seems a good time to introduce
a central timeout handling module.

External modules register their timeout handlers during process
initialization, and later enable and disable them as they see fit using
a simple API; timeout.c is in charge of keeping track of which timeouts
are in effect at any time, installing a common SIGALRM signal handler,
and calling setitimer() as appropriate to ensure timely firing of
external handlers.

timeout.c additionally supports pluggable modules to add their own
timeouts, though this capability isn't exercised anywhere yet.

Additionally, as of this commit, walsender processes are aware of
timeouts; we had a preexisting bug there that made those ignore SIGALRM,
thus being subject to unhandled deadlocks, particularly during the
authentication phase.  This has already been fixed in back branches in
commit 0bf8eb2a, which see for more details.

Main author: Zoltán Böszörményi
Some review and cleanup by Álvaro Herrera
Extensive reworking by Tom Lane
2012-07-16 22:55:33 -04:00
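The register/enable/disable flow described in commit f34c68f096 above can be sketched as a small simulation. This Python model uses no real signals or setitimer(); it only tracks which timeouts are active and which deadline a single timer would need to be armed for. All names here are hypothetical and are not the API of timeout.c.

```python
class TimeoutManager:
    """Toy model of a central timeout module driven by one shared timer."""

    def __init__(self):
        self._handlers = {}  # timeout id -> callback
        self._active = {}    # timeout id -> absolute deadline

    def register(self, timeout_id, handler):
        """Called once, during initialization, for each kind of timeout."""
        self._handlers[timeout_id] = handler

    def enable_after(self, timeout_id, now, delay):
        """Activate a registered timeout to fire `delay` after `now`."""
        if timeout_id not in self._handlers:
            raise KeyError(timeout_id)
        self._active[timeout_id] = now + delay

    def disable(self, timeout_id):
        """Deactivate a timeout; harmless if it is not active."""
        self._active.pop(timeout_id, None)

    def next_deadline(self):
        """The deadline a real implementation would arm its timer for."""
        return min(self._active.values(), default=None)

    def on_alarm(self, now):
        """Stand-in for the common signal handler: fire everything due."""
        due = sorted((d, t) for t, d in self._active.items() if d <= now)
        for _, timeout_id in due:
            del self._active[timeout_id]
            self._handlers[timeout_id]()
        return [timeout_id for _, timeout_id in due]
```

The point of the design is that callers never touch the timer directly; they only enable and disable their own timeouts, and the manager works out the earliest deadline.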
Peter Eisentraut dd16f9480a Remove unreachable code
The Solaris Studio compiler warns about these instances, unlike more
mainstream compilers such as gcc.  But manual inspection showed that
the code is clearly not reachable, and we hope no worthy compiler will
complain about removing this code.
2012-07-16 22:15:03 +03:00
Peter Eisentraut a76c857eba Add comment why seemingly dead code is necessary 2012-07-16 22:08:04 +03:00
Tom Lane c92be3c059 Avoid pre-determining index names during CREATE TABLE LIKE parsing.
Formerly, when trying to copy both indexes and comments, CREATE TABLE LIKE
had to pre-assign names to indexes that had comments, because it made up an
explicit CommentStmt command to apply the comment and so it had to know the
name for the index.  This creates bad interactions with other indexes, as
shown in bug #6734 from Daniele Varrazzo: the preassignment logic couldn't
take any other indexes into account so it could choose a conflicting name.

To fix, add a field to IndexStmt that allows it to carry a comment to be
assigned to the new index.  (This isn't a user-exposed feature of CREATE
INDEX, only an internal option.)  Now we don't need preassignment of index
names in any situation.

I also took the opportunity to refactor DefineIndex to accept the IndexStmt
as such, rather than passing all its fields individually in a mile-long
parameter list.

Back-patch to 9.2, but no further, because it seems too dangerous to change
IndexStmt or DefineIndex's API in released branches.  The bug exists back
to 9.0 where CREATE TABLE LIKE grew the ability to copy comments, but given
the lack of prior complaints we'll just let it go unfixed before 9.2.
2012-07-16 13:25:18 -04:00
Tom Lane 54fd196ffc Prevent corner-case core dump in rfree().
rfree() failed to cope with the case that pg_regcomp() had initialized the
regex_t struct but then failed to allocate any memory for re->re_guts (ie,
the first malloc call in pg_regcomp() failed).  It would try to touch the
guts struct anyway, and thus dump core.  This is a sufficiently narrow
corner case that it's not surprising it's never been seen in the field;
but still a bug is a bug, so patch all active branches.

Noted while investigating whether we need to call pg_regfree after a
failure return from pg_regcomp.  Other than this bug, it turns out we
don't, so adjust comments appropriately.
2012-07-15 13:27:54 -04:00
Heikki Linnakangas 2686da9db2 Don't initialize TLI variable to -1, as TimeLineID is unsigned.
This was causing a compiler warning with Solaris compiler. Use 0 instead.
The variable is initialized just for the sake of tidiness and/or debugging,
it's not used for anything before setting it to a real value.

Per report and suggestion from Peter Eisentraut.
2012-07-14 21:04:53 +03:00
Heikki Linnakangas 6c349a565a Print the name of the WAL file containing latest REDO ptr in pg_controldata.
This makes it easier to determine how far back you need to keep archived WAL
files, to restore from a backup.

Fujii Masao
2012-07-14 14:22:57 +03:00
Tom Lane b966dd6c42 Add fsync capability to initdb, and use sync_file_range() if available.
Historically we have not worried about fsync'ing anything during initdb
(in fact, initdb intentionally passes -F to each backend launch to prevent
it from fsync'ing).  But with filesystems getting more aggressive about
caching data, that's not such a good plan anymore.  Make initdb do a pass
over the finished data directory tree to fsync everything.  For testing
purposes, the -N/--nosync flag can be used to restore the old behavior.

Also, testing shows that on Linux, sync_file_range() is much faster than
posix_fadvise() for hinting to the kernel that an fsync is coming,
apparently because the latter blocks on a rather small request queue while
the former doesn't.  So use this function if available in initdb, and also
in the backend's pg_flush_data() (where it currently will affect only the
speed of CREATE DATABASE's cloning step).

We will later make pg_regress invoke initdb with the --nosync flag
to avoid slowing down cases such as "make check" in contrib.  But
let's not do so until we've shaken out any portability issues in this
patch.

Jeff Davis, reviewed by Andres Freund
2012-07-13 17:16:58 -04:00
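The "pass over the finished data directory tree to fsync everything" described in commit b966dd6c42 above can be approximated in a few lines. This Python sketch is not initdb's code: it assumes a POSIX-like system where fsync() is accepted on a read-only descriptor, it syncs regular files only (not directories), and it omits the sync_file_range() pre-hinting step entirely. The function name is hypothetical.

```python
import os


def fsync_tree(root):
    """Walk `root` and fsync every file found; return how many were synced."""
    synced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            fd = os.open(os.path.join(dirpath, name), os.O_RDONLY)
            try:
                os.fsync(fd)  # force this file's data out to stable storage
                synced += 1
            finally:
                os.close(fd)
    return synced
```

A --nosync style switch would simply skip the call to this function.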
Tom Lane 1a9405d265 Cosmetic cleanup of ginInsertValue().
Make it clearer that the passed stack mustn't be empty, and that we
are not supposed to fall off the end of the stack in the main loop.
Tighten the loop that extracts the root block number, too.

Markus Wanner and Tom Lane
2012-07-13 11:37:39 -04:00
Peter Eisentraut a84bf4922e Avoid extra newlines in XML mapping in table forest mode
found by P. Broennimann
2012-07-12 23:52:50 +03:00
Tom Lane a36088bcfa Skip text->binary conversion of unnecessary columns in contrib/file_fdw.
When reading from a text- or CSV-format file in file_fdw, the datatype
input routines can consume a significant fraction of the runtime.
Often, the query does not need all the columns, so we can get a useful
speed boost by skipping I/O conversion for unnecessary columns.

To support this, add a "convert_selectively" option to the core COPY code.
This is undocumented and not accessible from SQL (for now, anyway).

Etsuro Fujita, reviewed by KaiGai Kohei
2012-07-12 16:26:59 -04:00
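The idea behind selective conversion in commit a36088bcfa above can be shown with a toy row parser. In this Python sketch, `converters` plays the role of the datatype input routines and `needed` is the set of column positions the query actually uses. These names and the use of None for skipped columns are assumptions of the sketch, not the COPY code's interface.

```python
def parse_row(fields, converters, needed):
    """Convert only the columns listed in `needed`; leave the rest as None."""
    row = [None] * len(fields)
    for i in needed:
        # The (possibly expensive) input conversion runs only when required.
        row[i] = converters[i](fields[i])
    return row
```

In this toy model a skipped column is never passed to its converter at all, so its cost is avoided entirely.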
Bruce Momjian 76720bdf1a Remove 'x =- 1' check for pgindent, not needed, per report from Andrew
Dunstan.
2012-07-12 14:37:47 -04:00
Magnus Hagander 058a050ec7 Fix memory and file descriptor leaks in pg_receivexlog/pg_basebackup
When the internal loop mode was added, freeing memory and closing
file descriptors before returning became important, and a few cases
in the code missed that.

Fujii Masao
2012-07-12 13:33:58 +02:00
Tom Lane 84a42560c8 Add array_remove() and array_replace() functions.
These functions support removing or replacing array element value(s)
matching a given search value.  Although intended mainly to support a
future array-foreign-key feature, they seem useful in their own right.

Marco Nenciarini and Gabriele Bartolini, reviewed by Alex Hunsaker
2012-07-11 13:59:35 -04:00
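For readers unfamiliar with the intended behavior of the functions added in commit 84a42560c8 above, the following Python sketch models it on one-dimensional lists. It is only an illustration of "removing or replacing element value(s) matching a given search value". Treating None as a stand-in for NULL that matches itself is an assumption of this sketch.

```python
def array_remove(arr, value):
    """Return a copy of `arr` without any element equal to `value`."""
    return [x for x in arr if x != value]


def array_replace(arr, old, new):
    """Return a copy of `arr` with every `old` element replaced by `new`."""
    return [new if x == old else x for x in arr]
```

Both functions act on every matching element, not just the first, and leave the input list unmodified.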
Tom Lane 01215d61a7 Fix bogus macro definition.
Per buildfarm complaints.
2012-07-10 22:36:11 -04:00
Tatsuo Ishii 1c7a7faa5b Add comments about additional mule-internal charsets from emacs's
source code (lisp/international/mule-conf.el).  These charsets have not
been supported up to now anyway, so this is just for adding
commentary.  Also add mention that we follow emacs's implementation,
not xemacs's.
2012-07-11 08:10:50 +09:00