Commit Graph

1090 Commits

Author SHA1 Message Date
Simon Riggs 8cb53654db Add DROP INDEX CONCURRENTLY [IF EXISTS], uses ShareUpdateExclusiveLock 2012-04-06 10:21:40 +01:00
Tom Lane a40fa613b5 Add some infrastructure for contrib/pg_stat_statements.
Add a queryId field to Query and PlannedStmt.  This is not used by the
core backend, except for being copied around at appropriate times.
It's meant to allow plug-ins to track a particular query forward from
parse analysis to execution.

The queryId is intentionally not dumped into stored rules (and hence this
commit doesn't bump catversion).  You could argue that choice either way,
but it seems better that stored rule strings not have any dependency
on plug-ins that might or might not be present.

Also, add a post_parse_analyze_hook that gets invoked at the end of
parse analysis (but only for top-level analysis of complete queries,
not cases such as analyzing a domain's default-value expression).
This is mainly meant to be used to compute and assign a queryId,
but it could have other applications.

Peter Geoghegan
2012-03-27 15:17:40 -04:00
Tom Lane 9dbf2b7d75 Restructure SELECT INTO's parsetree representation into CreateTableAsStmt.
Making this operation look like a utility statement seems generally a good
idea, and particularly so in light of the desire to provide command
triggers for utility statements.  The original choice of representing it as
SELECT with an IntoClause appendage had metastasized into rather a lot of
places, unfortunately, so that this patch is a great deal more complicated
than one might at first expect.

In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS
subcommands required restructuring some EXPLAIN-related APIs.  Add-on code
that calls ExplainOnePlan or ExplainOneUtility, or uses
ExplainOneQuery_hook, will need adjustment.

Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO,
which formerly were accepted though undocumented, are no longer accepted.
The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE.
The CREATE RULE case doesn't seem to have much real-world use (since the
rule would work only once before failing with "table already exists"),
so we'll not bother with that one.

Both SELECT INTO and CREATE TABLE AS still return a command tag of
"SELECT nnnn".  There was some discussion of returning "CREATE TABLE nnnn",
but for the moment backwards compatibility wins the day.

Andres Freund and Tom Lane
2012-03-19 21:38:12 -04:00
Peter Eisentraut 86947e666d Add more detail to error message for invalid arguments for server process
It now prints the argument that was at fault.

Also fix a small misbehavior where the error message issued by
getopt() would complain about a program named "--single", because
that's what argv[0] is in the server process.
2012-03-11 02:03:52 +02:00
Tom Lane 4bfe68dfab Run a portal's cleanup hook immediately when pushing it to FAILED state.
This extends the changes of commit 6252c4f9e2
so that we run the cleanup hook earlier for failure cases as well as
success cases.  As before, the point is to avoid an assertion failure from
an Assert I added in commit a874fe7b4c, which
was meant to check that no user-written code can be called during portal
cleanup.  This fixes a case reported by Pavan Deolasee in which the Assert
could be triggered during backend exit (see the new regression test case),
and also prevents the possibility that the cleanup hook is run after
portions of the portal's state have already been recycled.  That doesn't
really matter in current usage, but it foreseeably could matter in the
future.

Back-patch to 9.1 where the Assert in question was added.
2012-02-15 16:19:01 -05:00
Simon Riggs b8a91d9d1c ALTER <thing> [IF EXISTS] ... allows silent DDL if required,
e.g. ALTER FOREIGN TABLE IF EXISTS foo RENAME TO bar

Pavel Stehule
2012-01-23 23:25:04 +00:00
Magnus Hagander 4f42b546fd Separate state from query string in pg_stat_activity
This separates the state (running/idle/idleintransaction etc) into
it's own field ("state"), and leaves the query field containing just
query text.

The query text will now mean "current query" when a query is running
and "last query" in other states. Accordingly,the field has been
renamed from current_query to query.

Since backwards compatibility was broken anyway to make that, the procpid
field has also been renamed to pid - along with the same field in
pg_stat_replication for consistency.

Scott Mead and Magnus Hagander, review work from Greg Smith
2012-01-19 14:19:20 +01:00
Robert Haas 1489e2f26a Improve behavior of concurrent ALTER TABLE, and do some refactoring.
ALTER TABLE (and ALTER VIEW, ALTER SEQUENCE, etc.) now use a
RangeVarGetRelid callback to check permissions before acquiring a table
lock.  We also now use the same callback for all forms of ALTER TABLE,
rather than having separate, almost-identical callbacks for ALTER TABLE
.. SET SCHEMA and ALTER TABLE .. RENAME, and no callback at all for
everything else.

I went ahead and changed the code so that no form of ALTER TABLE works
on foreign tables; you must use ALTER FOREIGN TABLE instead.  In 9.1,
it was possible to use ALTER TABLE .. SET SCHEMA or ALTER TABLE ..
RENAME on a foreign table, but not any other form of ALTER TABLE, which
did not seem terribly useful or consistent.

Patch by me; review by Noah Misch.
2012-01-06 22:42:26 -05:00
Peter Eisentraut 104e7dac28 Improve ALTER DOMAIN / DROP CONSTRAINT with nonexistent constraint
ALTER DOMAIN / DROP CONSTRAINT on a nonexistent constraint name did
not report any error.  Now it reports an error.  The IF EXISTS option
was added to get the usual behavior of ignoring nonexistent objects to
drop.
2012-01-05 19:48:55 +02:00
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Robert Haas d573e239f0 Take fewer snapshots.
When a PORTAL_ONE_SELECT query is executed, we can opportunistically
reuse the parse/plan shot for the execution phase.  This cuts down the
number of snapshots per simple query from 2 to 1 for the simple
protocol, and 3 to 2 for the extended protocol.  Since we are only
reusing a snapshot taken early in the processing of the same protocol
message, the change shouldn't be user-visible, except that the remote
possibility of the planning and execution snapshots being different is
eliminated.

Note that this change does not make it safe to assume that the parse/plan
snapshot will certainly be reused; that will currently only happen if
PortalStart() decides to use the PORTAL_ONE_SELECT strategy.  It might
be worth trying to provide some stronger guarantees here in the future,
but for now we don't.

Patch by me; review by Dimitri Fontaine.
2011-12-21 09:16:55 -05:00
Heikki Linnakangas 5d8a894e30 Cancel running query if it is detected that the connection to the client is
lost. The only way we detect that at the moment is when write() fails when
we try to write to the socket.

Florian Pflug with small changes by me, reviewed by Greg Jaskiewicz.
2011-12-09 14:21:36 +02:00
Robert Haas 2ad36c4e44 Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).

To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock.  If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds.  RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro.  Since the only one caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well.  The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.

There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.

Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 10:27:00 -05:00
Tom Lane b985d48779 Further code review for range types patch.
Fix some bugs in coercion logic and pg_dump; more comment cleanup;
minor cosmetic improvements.
2011-11-20 23:50:27 -05:00
Robert Haas fc6d1006bd Further consolidation of DROP statement handling.
This gets rid of an impressive amount of duplicative code, with only
minimal behavior changes.  DROP FOREIGN DATA WRAPPER now requires object
ownership rather than superuser privileges, matching the documentation
we already have.  We also eliminate the historical warning about dropping
a built-in function as unuseful.  All operations are now performed in the
same order for all object types handled by dropcmds.c.

KaiGai Kohei, with minor revisions by me
2011-11-17 21:32:34 -05:00
Heikki Linnakangas 4429f6a9e3 Support range data types.
Selectivity estimation functions are missing for some range type operators,
which is a TODO.

Jeff Davis
2011-11-03 13:42:15 +02:00
Robert Haas 82a4a777d9 Consolidate DROP handling for some object types.
This gets rid of a significant amount of duplicative code.

KaiGai Kohei, reviewed in earlier versions by Dimitri Fontaine, with
further review and cleanup by me.
2011-10-19 23:27:19 -04:00
Tom Lane a2822fb933 Support index-only scans using the visibility map to avoid heap fetches.
When a btree index contains all columns required by the query, and the
visibility map shows that all tuples on a target heap page are
visible-to-all, we don't need to fetch that heap page.  This patch depends
on the previous patches that made the visibility map reliable.

There's a fair amount left to do here, notably trying to figure out a less
chintzy way of estimating the cost of an index-only scan, but the core
functionality seems ready to commit.

Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
2011-10-07 20:14:13 -04:00
Bruce Momjian aaa6e1def2 Add postmaster -C option to query configuration parameters, and have
pg_ctl use that to query the data directory for config-only installs.
This fixes awkward or impossible pg_ctl operation for config-only
installs.
2011-10-06 09:38:39 -04:00
Tom Lane e6faf910d7 Redesign the plancache mechanism for more flexibility and efficiency.
Rewrite plancache.c so that a "cached plan" (which is rather a misnomer
at this point) can support generation of custom, parameter-value-dependent
plans, and can make an intelligent choice between using custom plans and
the traditional generic-plan approach.  The specific choice algorithm
implemented here can probably be improved in future, but this commit is
all about getting the mechanism in place, not the policy.

In addition, restructure the API to greatly reduce the amount of extraneous
data copying needed.  The main compromise needed to make that possible was
to split the initial creation of a CachedPlanSource into two steps.  It's
worth noting in particular that SPI_saveplan is now deprecated in favor of
SPI_keepplan, which accomplishes the same end result with zero data
copying, and no need to then spend even more cycles throwing away the
original SPIPlan.  The risk of long-term memory leaks while manipulating
SPIPlans has also been greatly reduced.  Most of this improvement is based
on use of the recently-added MemoryContextSetParent primitive.
2011-09-16 00:43:52 -04:00
Tom Lane ca4af308c3 Simplify handling of the timezone GUC by making initdb choose the default.
We were doing some amazingly complicated things in order to avoid running
the very expensive identify_system_timezone() procedure during GUC
initialization.  But there is an obvious fix for that, which is to do it
once during initdb and have initdb install the system-specific default into
postgresql.conf, as it already does for most other GUC variables that need
system-environment-dependent defaults.  This means that the timezone (and
log_timezone) settings no longer have any magic behavior in the server.
Per discussion.
2011-09-09 17:59:11 -04:00
Tom Lane a7801b62f2 Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h.
As per my recent proposal, this refactors things so that these typedefs and
macros are available in a header that can be included in frontend-ish code.
I also changed various headers that were undesirably including
utils/timestamp.h to include datatype/timestamp.h instead.  Unsurprisingly,
this showed that half the system was getting utils/timestamp.h by way of
xlog.h.

No actual code changes here, just header refactoring.
2011-09-09 13:23:41 -04:00
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Tom Lane 4dab3d5ae1 Change the autovacuum launcher to use WaitLatch instead of a poll loop.
In pursuit of this (and with the expectation that WaitLatch will be needed
in more places), convert the latch field that was already added to PGPROC
for sync rep into a generic latch that is activated for all PGPROC-owning
processes, and change many of the standard backend signal handlers to set
that latch when a signal happens.  This will allow WaitLatch callers to be
wakened properly by these signals.

In passing, fix a whole bunch of signal handlers that had been hacked to do
things that might change errno, without adding the necessary save/restore
logic for errno.  Also make some minor fixes in unix_latch.c, and clean
up bizarre and unsafe scheme for disowning the process's latch.  Much of
this has to be back-patched into 9.1.

Peter Geoghegan, with additional work by Tom
2011-08-10 12:22:21 -04:00
Robert Haas 367bc426a1 Avoid index rebuild for no-rewrite ALTER TABLE .. ALTER TYPE.
Noah Misch.  Review and minor cosmetic changes by me.
2011-07-18 11:04:43 -04:00
Tom Lane 1af37ec96d Replace errdetail("%s", ...) with errdetail_internal("%s", ...).
There may be some other places where we should use errdetail_internal,
but they'll have to be evaluated case-by-case.  This commit just hits
a bunch of places where invoking gettext is obviously a waste of cycles.
2011-07-16 14:22:18 -04:00
Robert Haas 4240e429d0 Try to acquire relation locks in RangeVarGetRelid.
In the previous coding, we would look up a relation in RangeVarGetRelid,
lock the resulting OID, and then AcceptInvalidationMessages().  While
this was sufficient to ensure that we noticed any changes to the
relation definition before building the relcache entry, it didn't
handle the possibility that the name we looked up no longer referenced
the same OID.  This was particularly problematic in the case where a
table had been dropped and recreated: we'd latch on to the entry for
the old relation and fail later on.  Now, we acquire the relation lock
inside RangeVarGetRelid, and retry the name lookup if we notice that
invalidation messages have been processed meanwhile.  Many operations
that would previously have failed with an error in the presence of
concurrent DDL will now succeed.

There is a good deal of work remaining to be done here: many callers
of RangeVarGetRelid still pass NoLock for one reason or another.  In
addition, nothing in this patch guards against the possibility that
the meaning of an unqualified name might change due to the creation
of a relation in a schema earlier in the user's search path than the
one where it was previously found.  Furthermore, there's nothing at
all here to guard against similar race conditions for non-relations.
For all that, it's a start.

Noah Misch and Robert Haas
2011-07-08 22:19:30 -04:00
Alvaro Herrera 897795240c Enable CHECK constraints to be declared NOT VALID
This means that they can initially be added to a large existing table
without checking its initial contents, but new tuples must comply to
them; a separate pass invoked by ALTER TABLE / VALIDATE can verify
existing data and ensure it complies with the constraint, at which point
it is marked validated and becomes a normal part of the table ecosystem.

An non-validated CHECK constraint is ignored in the planner for
constraint_exclusion purposes; when validated, cached plans are
recomputed so that partitioning starts working right away.

This patch also enables domains to have unvalidated CHECK constraints
attached to them as well by way of ALTER DOMAIN / ADD CONSTRAINT / NOT
VALID, which can later be validated with ALTER DOMAIN / VALIDATE
CONSTRAINT.

Thanks to Thom Brown, Dean Rasheed and Jaime Casanova for the various
reviews, and Robert Hass for documentation wording improvement
suggestions.

This patch was sponsored by Enova Financial.
2011-06-30 11:24:31 -04:00
Peter Eisentraut 21f1e15aaf Unify spelling of "canceled", "canceling", "cancellation"
We had previously (af26857a27)
established the U.S. spellings as standard.
2011-06-29 09:28:46 +03:00
Robert Haas 68ef051f5c Refactor broken CREATE TABLE IF NOT EXISTS support.
Per bug #5988, reported by Marko Tiikkaja, and further analyzed by Tom
Lane, the previous coding was broken in several respects: even if the
target table already existed, a subsequent CREATE TABLE IF NOT EXISTS
might try to add additional constraints or sequences-for-serial
specified in the new CREATE TABLE statement.

In passing, this also fixes a minor information leak: it's no longer
possible to figure out whether a schema to which you don't have CREATE
access contains a sequence named like "x_y_seq" by attempting to create a
table in that schema called "x" with a serial column called "y".

Some more refactoring of this code in the future might be warranted,
but that will need to wait for a later major release.
2011-04-25 16:55:11 -04:00
Bruce Momjian 76dd09bbec Add postmaster/postgres undocumented -b option for binary upgrades.
This option turns off autovacuum, prevents non-super-user connections,
and enables oid setting hooks in the backend.  The code continues to use
the old autoavacuum disable settings for servers with earlier catalog
versions.

This includes a catalog version bump to identify servers that support
the -b option.
2011-04-25 12:00:21 -04:00
Heikki Linnakangas b5bb040da6 On IA64 architecture, we check the depth of the register stack in addition
to the regular stack. The code to do that is platform and compiler specific,
add support for the HP-UX native compiler.
2011-04-13 11:50:55 +03:00
Tom Lane d64713df7e Pass collations to functions in FunctionCallInfoData, not FmgrInfo.
Since collation is effectively an argument, not a property of the function,
FmgrInfo is really the wrong place for it; and this becomes critical in
cases where a cached FmgrInfo is used for varying purposes that might need
different collation settings.  Fix by passing it in FunctionCallInfoData
instead.  In particular this allows a clean fix for bug #5970 (record_cmp
not working).  This requires touching a bit more code than the original
method, but nobody ever thought that collations would not be an invasive
patch...
2011-04-12 19:19:24 -04:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane 2594cf0e8c Revise the API for GUC variable assign hooks.
The previous functions of assign hooks are now split between check hooks
and assign hooks, where the former can fail but the latter shouldn't.
Aside from being conceptually clearer, this approach exposes the
"canonicalized" form of the variable value to guc.c without having to do
an actual assignment.  And that lets us fix the problem recently noted by
Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus
log messages about "parameter "wal_buffers" cannot be changed without
restarting the server".  There may be some speed advantage too, because
this design lets hook functions avoid re-parsing variable values when
restoring a previous state after a rollback (they can store a pre-parsed
representation of the value instead).  This patch also resolves a
longstanding annoyance about custom error messages from variable assign
hooks: they should modify, not appear separately from, guc.c's own message
about "invalid parameter value".
2011-04-07 00:12:02 -04:00
Robert Haas 9a56dc3389 Fix various possible problems with synchronous replication.
1. Don't ignore query cancel interrupts.  Instead, if the user asks to
cancel the query after we've already committed it, but before it's on
the standby, just emit a warning and let the COMMIT finish.

2. Don't ignore die interrupts (pg_terminate_backend or fast shutdown).
Instead, emit a warning message and close the connection without
acknowledging the commit.  Other backends will still see the effect of
the commit, but there's no getting around that; it's too late to abort
at this point, and ignoring die interrupts altogether doesn't seem like
a good idea.

3. If synchronous_standby_names becomes empty, wake up all backends
waiting for synchronous replication to complete.  Without this, someone
attempting to shut synchronous replication off could easily wedge the
entire system instead.

4. Avoid depending on the assumption that if a walsender updates
MyProc->syncRepState, we'll see the change even if we read it without
holding the lock.  The window for this appears to be quite narrow (and
probably doesn't exist at all on machines with strong memory ordering)
but protecting against it is practically free, so do that.

5. Remove useless state SYNC_REP_MUST_DISCONNECT, which isn't needed and
doesn't actually do anything.

There's still some further work needed here to make the behavior of fast
shutdown plausible, but that looks complex, so I'm leaving it for a
separate commit.  Review by Fujii Masao.
2011-03-17 13:12:21 -04:00
Tom Lane 6252c4f9e2 Run a portal's cleanup hook immediately when pushing it to DONE state.
This works around the problem noted by Yamamoto Takashi in bug #5906,
that there were code paths whereby we could reach AtCleanup_Portals
with a portal's cleanup hook still unexecuted.  The changes I made
a few days ago were intended to prevent that from happening, and
I think that on balance it's still a good thing to avoid, so I don't
want to remove the Assert in AtCleanup_Portals.  Hence do this instead.
2011-03-03 13:04:06 -05:00
Tom Lane c0b0076036 Rearrange snapshot handling to make rule expansion more consistent.
With this patch, portals, SQL functions, and SPI all agree that there
should be only a CommandCounterIncrement between the queries that are
generated from a single SQL command by rule expansion.  Fetching a whole
new snapshot now happens only between original queries.  This is equivalent
to the existing behavior of EXPLAIN ANALYZE, and it was judged to be the
best choice since it eliminates one source of concurrency hazards for
rules.  The patch should also make things marginally faster by reducing the
number of snapshot push/pop operations.

The patch removes pg_parse_and_rewrite(), which is no longer used anywhere.
There was considerable discussion about more aggressive refactoring of the
query-processing functions exported by postgres.c, but for the moment
nothing more has been done there.

I also took the opportunity to refactor snapmgr.c's API slightly: the
former PushUpdatedSnapshot() has been split into two functions.

Marko Tiikkaja, reviewed by Steve Singer and Tom Lane
2011-02-28 23:28:06 -05:00
Tom Lane a874fe7b4c Refactor the executor's API to support data-modifying CTEs better.
The originally committed patch for modifying CTEs didn't interact well
with EXPLAIN, as noted by myself, and also had corner-case problems with
triggers, as noted by Dean Rasheed.  Those problems show it is really not
practical for ExecutorEnd to call any user-defined code; so split the
cleanup duties out into a new function ExecutorFinish, which must be called
between the last ExecutorRun call and ExecutorEnd.  Some Asserts have been
added to these functions to help verify correct usage.

It is no longer necessary for callers of the executor to call
AfterTriggerBeginQuery/AfterTriggerEndQuery for themselves, as this is now
done by ExecutorStart/ExecutorFinish respectively.  If you really need to
suppress that and do it for yourself, pass EXEC_FLAG_SKIP_TRIGGERS to
ExecutorStart.

Also, refactor portal commit processing to allow for the possibility that
PortalDrop will invoke user-defined code.  I think this is not actually
necessary just yet, since the portal-execution-strategy logic forces any
non-pure-SELECT query to be run to completion before we will consider
committing.  But it seems like good future-proofing.
2011-02-27 13:44:12 -05:00
Tom Lane 389af95155 Support data-modifying commands (INSERT/UPDATE/DELETE) in WITH.
This patch implements data-modifying WITH queries according to the
semantics that the updates all happen with the same command counter value,
and in an unspecified order.  Therefore one WITH clause can't see the
effects of another, nor can the outer query see the effects other than
through the RETURNING values.  And attempts to do conflicting updates will
have unpredictable results.  We'll need to document all that.

This commit just fixes the code; documentation updates are waiting on
author.

Marko Tiikkaja and Hitoshi Harada
2011-02-25 18:58:02 -05:00
Peter Eisentraut b313bca0af DDL support for collations
- collowner field
- CREATE COLLATION
- ALTER COLLATION
- DROP COLLATION
- COMMENT ON COLLATION
- integration with extensions
- pg_dump support for the above
- dependency management
- psql tab completion
- psql \dO command
2011-02-12 15:55:18 +02:00
Tom Lane 1214749901 Add support for multiple versions of an extension and ALTER EXTENSION UPDATE.
This follows recent discussions, so it's quite a bit different from
Dimitri's original.  There will probably be more changes once we get a bit
of experience with it, but let's get it in and start playing with it.

This is still just core code.  I'll start converting contrib modules
shortly.

Dimitri Fontaine and Tom Lane
2011-02-11 21:25:57 -05:00
Tom Lane 01467d3e4f Extend "ALTER EXTENSION ADD object" to permit "DROP object" as well.
Per discussion, this is something we should have sooner rather than later,
and it doesn't take much additional code to support it.
2011-02-10 17:37:22 -05:00
Tom Lane 5bc178b89f Implement "ALTER EXTENSION ADD object".
This is an essential component of making the extension feature usable;
first because it's needed in the process of converting an existing
installation containing "loose" objects of an old contrib module into
the extension-based world, and second because we'll have to use it
in pg_dump --binary-upgrade, as per recent discussion.

Loosely based on part of Dimitri Fontaine's ALTER EXTENSION UPGRADE
patch.
2011-02-09 11:56:37 -05:00
Tom Lane d9572c4e3b Core support for "extensions", which are packages of SQL objects.
This patch adds the server infrastructure to support extensions.
There is still one significant loose end, namely how to make it play nice
with pg_upgrade, so I am not yet committing the changes that would make
all the contrib modules depend on this feature.

In passing, fix a disturbingly large amount of breakage in
AlterObjectNamespace() and callers.

Dimitri Fontaine, reviewed by Anssi Kääriäinen,
Itagaki Takahiro, Tom Lane, and numerous others
2011-02-08 16:13:22 -05:00
Heikki Linnakangas dafaa3efb7 Implement genuine serializable isolation level.
Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, ie. it sometimes aborts transactions even
though there is no anomaly.

To track reads we implement predicate locking, see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.

A predicate lock doesn't conflict with any regular locks or with another
predicate locks in the normal sense. They're only used by the predicate lock
manager to detect the danger of anomalies. Only serializable transactions
participate in predicate locking, so there should be no extra overhead for
for other transactions.

Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with it have completed. That means that
we need to remember an unbounded amount of predicate locks, so we apply a
lossy but conservative method of tracking locks for committed transactions.
If we run short of shared memory, we overflow to a new "pg_serial" SLRU
pool.

We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.

Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.

Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
2011-02-08 00:09:08 +02:00
Simon Riggs 56b21b7ae3 Re-classify ERRCODE_DATABASE_DROPPED to 57P04 2011-02-01 08:44:01 +00:00
Simon Riggs 9e95c9ad55 Create new errcode for recovery conflict caused by db drop on master.
Previously reported as ERRCODE_ADMIN_SHUTDOWN, this case is now
reported as ERRCODE_T_R_DATABASE_DROPPED. No message text change.
Unlikely to happen on most servers, so low impact change to allow
session poolers to correctly handle this situation.

Tatsuo Ishii, edits by me, review by Robert Haas
2011-02-01 00:20:53 +00:00
Tom Lane 0ac8c8df85 Don't include <asm/ia64regs.h> unnecessarily.
We only need that header when compiling with icc, since the gcc variant of
ia64_get_bsp() uses in-line assembly code.  Per report from Frank Brendel,
the header doesn't exist on all IA64 platforms; so don't include it unless
we need it.
2011-01-27 16:27:27 -05:00
Magnus Hagander 40d9e94bd7 Add views and functions to monitor hot standby query conflicts
Add the view pg_stat_database_conflicts and a column to pg_stat_database,
and the underlying functions to provide the information.
2011-01-03 12:46:03 +01:00
Robert Haas 0d692a0dc9 Basic foreign table support.
Foreign tables are a core component of SQL/MED.  This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried.  Support for foreign table scans will need to
be added in a future patch.  However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.

Shigeru Hanada, heavily revised by Robert Haas
2011-01-01 23:48:11 -05:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Alvaro Herrera 3026027ec3 set_ps_display when calling functions via fastpath
This improves tag output by log_line_prefix
2010-12-17 18:51:22 -03:00
Tom Lane 61b53695fb Remove optreset from src/port/ implementations of getopt and getopt_long.
We don't actually need optreset, because we can easily fix the code to
ensure that it's cleanly restartable after having completed a scan over the
argv array; which is the only case we need to restart in.  Getting rid of
it avoids a class of interactions with the system libraries and allows
reversion of my change of yesterday in postmaster.c and postgres.c.

Back-patch to 8.4.  Before that the getopt code was a bit different anyway.
2010-12-16 16:23:05 -05:00
Tom Lane 5cdd65f324 Fix up getopt() reset management so it works on recent mingw.
The mingw people don't appear to care about compatibility with non-GNU
versions of getopt, so force use of our own copy of getopt on Windows.
Also, ensure that we make use of optreset when using our own copy.

Per report from Andrew Dunstan.  Back-patch to all versions supported
on Windows.
2010-12-15 23:50:41 -05:00
Robert Haas 55109313f9 Add more ALTER <object> .. SET SCHEMA commands.
This adds support for changing the schema of a conversion, operator,
operator class, operator family, text search configuration, text search
dictionary, text search parser, or text search template.

Dimitri Fontaine, with assorted corrections and other kibitzing.
2010-11-26 17:31:54 -05:00
Tom Lane d7a2ce4905 Add support for detecting register-stack overrun on IA64.
Per recent investigation, the register stack can grow faster than the
regular stack depending on compiler and choice of options.  To avoid
crashes we must check both stacks in check_stack_depth().

Since this is poorly-tested code, committing only to HEAD for the
moment ... but we might want to consider back-patching later.
2010-11-06 19:36:29 -04:00
Tom Lane dd1c781903 Make get_stack_depth_rlimit() handle RLIM_INFINITY more sanely.
Rather than considering this result as meaning "unknown", report LONG_MAX.
This won't change what superusers can set max_stack_depth to, but it will
cause InitializeGUCOptions() to set the built-in default to 2MB not 100kB.
The latter seems like a fairly unreasonable interpretation of "infinity".
Per my investigation of odd buildfarm results as well as an old complaint
from Heikki.

Since this should persuade all the buildfarm animals to use a reasonable
stack depth setting during "make check", revert previous patch that dumbed
down a recursive regression test to only 5 levels.
2010-11-06 16:50:18 -04:00
Tom Lane 6736916f5f Include the current value of max_stack_depth in stack depth complaints.
I'm mainly interested in finding out what it is on buildfarm machines,
but including the active value in the message seems like good practice
in any case.  Add the info to the HINT, not the ERROR string, so as not
to change the regression tests' expected output.
2010-11-04 17:15:38 -04:00
Tom Lane 84c123be1d Allow new values to be added to an existing enum type.
After much expenditure of effort, we've got this to the point where the
performance penalty is pretty minimal in typical cases.

Andrew Dunstan, reviewed by Brendan Jurd, Dean Rasheed, and Tom Lane
2010-10-24 23:05:41 -04:00
Robert Haas 4d355a8336 Add a SECURITY LABEL command.
This is intended as infrastructure to support integration with label-based
mandatory access control systems such as SE-Linux. Further changes (mostly
hooks) will be needed, but this is a big chunk of it.

KaiGai Kohei and Robert Haas
2010-09-27 20:55:27 -04:00
Peter Eisentraut e440e12c56 Add ALTER TYPE ... ADD/DROP/ALTER/RENAME ATTRIBUTE
Like with tables, this also requires allowing the existence of
composite types with zero attributes.

reviewed by KaiGai Kohei
2010-09-26 14:41:03 +03:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Joe Conway 5eb15c9942 SERIALIZABLE transactions are actually implemented beneath the covers with
transaction snapshots, i.e. a snapshot registered at the beginning of
a transaction. Change variable naming and comments to reflect this reality
in preparation for a future, truly serializable mode, e.g.
Serializable Snapshot Isolation (SSI).

For the moment transaction snapshots are still used to implement
SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin
Grittner and Dan Ports with review and some minor wording changes by me.
2010-09-11 18:38:58 +00:00
Tom Lane 8ab6a6b456 In HEAD only, revert kluge solution for preventing misuse of pg_get_expr().
A data-type-based solution, which is much cleaner and more bulletproof,
will follow shortly.  It seemed best to make this a separate commit though.
2010-09-03 01:26:52 +00:00
Tom Lane b5565bca11 Fix failure of "ALTER TABLE t ADD COLUMN c serial" when done by non-owner.
The implicitly created sequence was created as owned by the current user,
who could be different from the table owner, eg if current user is a
superuser or some member of the table's owning role.  This caused sanity
checks in the SEQUENCE OWNED BY code to spit up.  Although possibly we
don't need those sanity checks, the safest fix seems to be to make sure
the implicit sequence is assigned the same owner role as the table has.
(We still do all permissions checks as the current user, however.)
Per report from Josh Berkus.

Back-patch to 9.0.  The bug goes back to the invention of SEQUENCE OWNED BY
in 8.2, but the fix requires an API change for DefineRelation(), which seems
to have potential for breaking third-party code if done in a minor release.
Given the lack of prior complaints, it's probably not worth fixing in the
stable branches.
2010-08-18 18:35:21 +00:00
Robert Haas 30c22eb8fc Correct sundry errors in Hot Standby-related comments.
Fujii Masao
2010-08-12 23:24:54 +00:00
Robert Haas a3b012b560 CREATE TABLE IF NOT EXISTS.
Reviewed by Bernd Helmle.
2010-07-25 23:21:22 +00:00
Bruce Momjian 239d769e7e pgindent run for 9.0, second run 2010-07-06 19:19:02 +00:00
Heikki Linnakangas 350ab443be stringToNode() and deparse_expression_pretty() crash on invalid input,
but we have nevertheless exposed them to users via pg_get_expr(). It would
be too much maintenance effort to rigorously check the input, so put a hack
in place instead to restrict pg_get_expr() so that the argument must come
from one of the system catalog columns known to contain valid expressions.

Per report from Rushabh Lathia. Backpatch to 7.4 which is the oldest
supported version at the moment.
2010-06-30 18:10:23 +00:00
Simon Riggs 66035734ec Give most recovery conflict errors a retryable error code. From recent
requests and discussions with Yeb Havinga and Kevin Grittner.
2010-05-12 19:45:02 +00:00
Tom Lane c670410e7f Move the responsibility for calling StartupXLOG into InitPostgres, for
those process types that go through InitPostgres; in particular, bootstrap
and standalone-backend cases.  This ensures that we have set up a PGPROC
and done some other basic initialization steps (corresponding to the
if (IsUnderPostmaster) block in AuxiliaryProcessMain) before we attempt to
run WAL recovery in a standalone backend.  As was discovered last September,
this is necessary for some corner-case code paths during WAL recovery,
particularly end-of-WAL cleanup.

Moving the bootstrap case here too is not necessary for correctness, but it
seems like a good idea since it reduces the number of distinct code paths.
2010-04-20 01:38:52 +00:00
Peter Eisentraut c248d17120 Message tuning 2010-03-21 00:17:59 +00:00
Bruce Momjian 65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Tom Lane 05d8a561ff Clean up handling of XactReadOnly and RecoveryInProgress checks.
Add some checks that seem logically necessary, in particular let's make
real sure that HS slave sessions cannot create temp tables.  (If they did
they would think that temp tables belonging to the master's session with
the same BackendId were theirs.  We *must* not allow myTempNamespace to
become set in a slave session.)

Change setval() and nextval() so that they are only allowed on temp sequences
in a read-only transaction.  This seems consistent with what we allow for
table modifications in read-only transactions.  Since an HS slave can't have a
temp sequence, this also provides a nicer cure for the setval PANIC reported
by Erik Rijkers.

Make the error messages more uniform, and have them mention the specific
command being complained of.  This seems worth the trifling amount of extra
code, since people are likely to see such messages a lot more than before.
2010-02-20 21:24:02 +00:00
Tom Lane d1e027221d Replace the pg_listener-based LISTEN/NOTIFY mechanism with an in-memory queue.
In addition, add support for a "payload" string to be passed along with
each notify event.

This implementation should be significantly more efficient than the old one,
and is also more compatible with Hot Standby usage.  There is not yet any
facility for HS slaves to receive notifications generated on the master,
although such a thing is possible in future.

Joachim Wieland, reviewed by Jeff Davis; also hacked on by me.
2010-02-16 22:34:57 +00:00
Bruce Momjian aa7e7ae9a6 Have SELECT and CREATE TABLE AS queries return a row count. While this
is invisible in psql, other interfaces, like libpq, make this value
visible.

Boszormenyi Zoltan
2010-02-16 20:58:14 +00:00
Bruce Momjian 93a57c3b57 Clarify documentation on the behavior of unnamed bind queries. 2010-02-16 20:15:14 +00:00
Robert Haas e26c539e9f Wrap calls to SearchSysCache and related functions using macros.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4).  This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.

Design and review by Tom Lane.
2010-02-14 18:42:19 +00:00
Bruce Momjian bbdf72b095 Improve C comment about why we return "0 0" for some tags. 2010-02-13 22:45:41 +00:00
Simon Riggs b95a720a48 Re-enable max_standby_delay = -1 using deadlock detection on startup
process. If startup waits on a buffer pin we send a request to all
backends to cancel themselves if they are holding the buffer pin
required and they are also waiting on a lock. If not, startup waits
until max_standby_delay before cancelling any backend waiting for
the requested buffer pin.
2010-02-13 01:32:20 +00:00
Tom Lane eb88926625 Avoid performing encoding conversion on command tag strings during EndCommand.
Since all current and foreseeable future command tags will be pure ASCII,
there is no need to do conversion on them.  This saves a few cycles and also
avoids polluting otherwise-pristine subtransaction memory contexts, which
is the cause of the backend memory leak exhibited in bug #5302.  (Someday
we'll probably want to have a better method of determining whether
subtransaction contexts need to be kept around, but today is not that day.)

Backpatch to 8.0.  The cycle-shaving aspect of this would work in 7.4
too, but without subtransactions the memory-leak aspect doesn't apply,
so it doesn't seem worth touching 7.4.
2010-01-30 20:09:53 +00:00
Itagaki Takahiro 7efd71f843 Fix command tag for ALTER LARGE OBJECT. 2010-01-29 06:03:15 +00:00
Simon Riggs a06ea6f532 Add explanatory detail to Hot Standby cancelation error messages
with errdetail(). Add errhint() to suggest retry in certain cases.
2010-01-23 17:04:05 +00:00
Simon Riggs 959ac58c04 In HS, Startup process sets SIGALRM when waiting for buffer pin. If
woken by alarm we send SIGUSR1 to all backends requesting that they
check to see if they are blocking Startup process. If so, they throw
ERROR/FATAL as for other conflict resolutions. Deadlock stop gap
removed. max_standby_delay = -1 option removed to prevent deadlock.
2010-01-23 16:37:12 +00:00
Simon Riggs ed1d3f5ecf Add missing flag reset to ensure subsequent manual cancelation gives correct reason. 2010-01-21 09:30:36 +00:00
Tom Lane 9a915e596f Improve the handling of SET CONSTRAINTS commands by having them search
pg_constraint before searching pg_trigger.  This allows saner handling of
corner cases; in particular we now say "constraint is not deferrable"
rather than "constraint does not exist" when the command is applied to
a constraint that's inherently non-deferrable.  Per a gripe several months
ago from hubert depesz lubaczewski.

To make this work without breaking user-defined constraint triggers,
we have to add entries for them to pg_constraint.  However, in return
we can remove the pgconstrname column from pg_constraint, which represents
a fairly sizable space savings.  I also replaced the tgisconstraint column
with tgisinternal; the old meaning of tgisconstraint can now be had by
testing for nonzero tgconstraint, while there is no other way to get
the old meaning of nonzero tgconstraint, namely that the trigger was
internally generated rather than being user-created.

In passing, fix an old misstatement in the docs and comments, namely that
pg_trigger.tgdeferrable is exactly redundant with pg_constraint.condeferrable.
Actually, we mark RI action triggers as nondeferrable even when they belong to
a nominally deferrable FK constraint.  The SET CONSTRAINTS code now relies on
that instead of hard-coding a list of exception OIDs.
2010-01-17 22:56:23 +00:00
Tom Lane 0ae19c11f4 Remove unnecessary, inconsistent flag resets in ProcessInterrupts. 2010-01-17 04:27:54 +00:00
Simon Riggs a8ce974cdd Teach standby conflict resolution to use SIGUSR1
Conflict reason is passed through directly to the backend, so we can
take decisions about the effect of the conflict based upon the local
state. No specific changes, as yet, though this prepares for later work.
CancelVirtualTransaction() sends signals while holding ProcArrayLock.
Introduce errdetail_abort() to give message detail explaining that the
abort was caused by conflict processing. Remove CONFLICT_MODE states
in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity.
2010-01-16 10:05:59 +00:00
Tom Lane 08f8d478eb Do parse analysis of an EXPLAIN's contained statement during the normal
parse analysis phase, rather than at execution time.  This makes parameter
handling work the same as it does in ordinary plannable queries, and in
particular fixes the incompatibility that Pavel pointed out with plpgsql's
new handling of variable references.  plancache.c gets a little bit
grottier, but the alternatives seem worse.
2010-01-15 22:36:35 +00:00
Heikki Linnakangas 40f908bdcd Introduce Streaming Replication.
This includes two new kinds of postmaster processes, walsenders and
walreceiver. Walreceiver is responsible for connecting to the primary server
and streaming WAL to disk, while walsender runs in the primary server and
streams WAL from disk to the client.

Documentation still needs work, but the basics are there. We will probably
pull the replication section to a new chapter later on, as well as the
sections describing file-based replication. But let's do that as a separate
patch, so that it's easier to see what has been added/changed. This patch
also adds a new section to the chapter about FE/BE protocol, documenting the
protocol used by walsender/walreceivxer.

Bump catalog version because of two new functions,
pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
monitoring the progress of replication.

Fujii Masao, with additional hacking by me
2010-01-15 09:19:10 +00:00
Tom Lane 82170c747b Fix (some of the) breakage introduced into query-cancel processing by HS.
It is absolutely not okay to throw an ereport(ERROR) in any random place in
the code just because DoingCommandRead is set; interrupting, say, OpenSSL
in the midst of its activities is guaranteed to result in heartache.

Instead of that, undo the original optimizations that threw away
QueryCancelPending anytime we were starting or finishing a command read, and
instead discard the cancel request within ProcessInterrupts if we find that
there is no HS reason for forcing a cancel and we are DoingCommandRead.

In passing, may I once again condemn the practice of changing the code
and not fixing the adjacent comment that you just turned into a lie?
2010-01-07 16:29:58 +00:00
Bruce Momjian f98fbc78c3 Preserve relfilenodes:
Add support to pg_dump --binary-upgrade to preserve all relfilenodes,
for use by pg_migrator.
2010-01-06 03:04:03 +00:00
Robert Haas d86d51a958 Support ALTER TABLESPACE name SET/RESET ( tablespace_options ).
This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.

Thanks to Tom Lane for design help and Alvaro Herrera for the review.
2010-01-05 21:54:00 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Simon Riggs efc16ea520 Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.

New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.

This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.

Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.

Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 01:32:45 +00:00
Peter Eisentraut d6de43099a Don't unblock SIGQUIT in the SIGQUIT handler
This was possibly linked to a deadlock-like situation in glibc syslog code
invoked by the ereport call in quickdie().  In any case, a signal handler
should not unblock its own signal unless there is a specific reason to.
2009-12-16 23:05:00 +00:00
Peter Eisentraut b63b967a7e If there is no sigdelset(), define it as a macro.
This removes some duplicate code that recreated the identical workaround
when the newer signal API is missing.
2009-12-16 22:55:34 +00:00
Tom Lane a5495cd841 Add a hook to let loadable modules get control at ProcessUtility execution,
and use it to extend contrib/pg_stat_statements to track utility commands.

Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
2009-12-15 20:04:49 +00:00
Robert Haas cddca5ec13 Add an EXPLAIN (BUFFERS) option to show buffer-usage statistics.
This patch also removes buffer-usage statistics from the track_counts
output, since this (or the global server statistics) is deemed to be a better
interface to this information.

Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
2009-12-15 04:57:48 +00:00