Commit Graph

1178 Commits

Author SHA1 Message Date
Joe Conway 5eb15c9942 SERIALIZABLE transactions are actually implemented beneath the covers with
transaction snapshots, i.e. a snapshot registered at the beginning of
a transaction. Change variable naming and comments to reflect this reality
in preparation for a future, truly serializable mode, e.g.
Serializable Snapshot Isolation (SSI).

For the moment transaction snapshots are still used to implement
SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin
Grittner and Dan Ports with review and some minor wording changes by me.
2010-09-11 18:38:58 +00:00
Tom Lane db2d9c602c Fix ExecMakeTableFunctionResult to verify that all rows returned by a SRF
returning "record" actually do have the same rowtype.  This is needed because
the parser can't realistically enforce that they will all have the same typmod,
as seen in a recent example from David Wheeler.

Back-patch to 8.0, which is as far back as we have the notion of RECORD
subtypes being distinguished by typmod.  Wheeler's example depends on
8.4-and-up features, but I suspect there may be ways to provoke similar
failures before 8.4.
2010-08-26 18:54:37 +00:00
Tom Lane 3573c8346d Reset the per-output-tuple exprcontext each time through the main loop in
ExecModifyTable().  This avoids memory leakage when trigger functions leave
junk behind in that context (as they more or less must).  Problem and solution
identified by Dean Rasheed.

I'm a bit concerned about the longevity of this solution --- once a plan can
have multiple ModifyTable nodes, we are very possibly going to have to do
something different.  But it should hold up for 9.0.
2010-08-18 21:52:24 +00:00
Robert Haas 2a6ef3445c Standardize get_whatever_oid functions for object types with
unqualified names.

- Add a missing_ok parameter to get_tablespace_oid.
- Avoid duplicating get_tablespace_od guts in objectNamesToOids.
- Add a missing_ok parameter to get_database_oid.
- Replace get_roleid and get_role_checked with get_role_oid.
- Add get_namespace_oid, get_language_oid, get_am_oid.
- Refactor existing code to use new interfaces.

Thanks to KaiGai Kohei for the review.
2010-08-05 14:45:09 +00:00
Tom Lane 77c75076f3 Fix oversight in new EvalPlanQual logic: the second loop over the ExecRowMark
list in ExecLockRows() forgot to allow for the possibility that some of the
rowmarks are for child tables that aren't relevant to the current row.
Per report from Kenichiro Tanaka.
2010-07-28 17:21:56 +00:00
Tom Lane 133924e13e Fix potential failure when hashing the output of a subplan that produces
a pass-by-reference datatype with a nontrivial projection step.
We were using the same memory context for the projection operation as for
the temporary context used by the hashtable routines in execGrouping.c.
However, the hashtable routines feel free to reset their temp context at
any time, which'd lead to destroying input data that was still needed.
Report and diagnosis by Tao Ma.

Back-patch to 8.1, where the problem was introduced by the changes that
allowed us to work with "virtual" tuples instead of materializing intermediate
tuple values everywhere.  The earlier code looks quite similar, but it doesn't
suffer the problem because the data gets copied into another context as a
result of having to materialize ExecProject's output tuple.
2010-07-28 04:50:50 +00:00
Robert Haas a3b012b560 CREATE TABLE IF NOT EXISTS.
Reviewed by Bernd Helmle.
2010-07-25 23:21:22 +00:00
Robert Haas b8c6c71d1c Centralize DML permissions-checking logic.
Remove bespoke code in DoCopy and RI_Initial_Check, which now instead
fabricate call ExecCheckRTPerms with a manufactured RangeTblEntry.
This is intended to make it feasible for an enhanced security provider
to actually make use of ExecutorCheckPerms_hook, but also has the
advantage that RI_Initial_Check can allow use of the fast-path when
column-level but not table-level permissions are present.

KaiGai Kohei.  Reviewed (in an earlier version) by Stephen Frost, and by me.
Some further changes to the comments by me.
2010-07-22 00:47:59 +00:00
Tom Lane e11cfa87be Remove a sanity check in the exclusion-constraint code that prevented users
from defining non-self-conflicting constraints.

Jeff Davis

Note: I (tgl) objected to removing this check in 9.0 on the grounds that it
was an important sanity check in new, poorly tested code.  However, it should
be all right to remove it for 9.1, since we'll get field testing from the
9.0 branch.
2010-07-16 00:45:30 +00:00
Tom Lane 53e757689c Make NestLoop plan nodes pass outer-relation variables into their inner
relation using the general PARAM_EXEC executor parameter mechanism, rather
than the ad-hoc kluge of passing the outer tuple down through ExecReScan.
The previous method was hard to understand and could never be extended to
handle parameters coming from multiple join levels.  This patch doesn't
change the set of possible plans nor have any significant performance effect,
but it's necessary infrastructure for future generalization of the concept
of an inner indexscan plan.

ExecReScan's second parameter is now unused, so it's removed.
2010-07-12 17:01:06 +00:00
Robert Haas f4122a8d50 Add a hook in ExecCheckRTPerms().
This hook allows a loadable module to gain control when table permissions
are checked.  It is expected to be used by an eventual SE-PostgreSQL
implementation, but there are other possible applications as well.  A
sample contrib module can be found in the archives at:

http://archives.postgresql.org/pgsql-hackers/2010-05/msg01095.php

Robert Haas and Stephen Frost
2010-07-09 14:06:01 +00:00
Bruce Momjian 239d769e7e pgindent run for 9.0, second run 2010-07-06 19:19:02 +00:00
Bruce Momjian 7190220555 Add C comment that we will have to remove an exclusion constraint check
if we ever implement '<>' index opclasses.

Jeff Davis
2010-05-29 02:32:08 +00:00
Tom Lane f39d57b83c Rejigger mergejoin logic so that a tuple with a null in the first merge column
is treated like end-of-input, if nulls sort last in that column and we are not
doing outer-join filling for that input.  In such a case, the tuple cannot
join to anything from the other input (because we assume mergejoinable
operators are strict), and neither can any tuple following it in the sort
order.  If we're not interested in doing outer-join filling we can just
pretend the tuple and its successors aren't there at all.  This can save a
great deal of time in situations where there are many nulls in the join
column, as in a recent example from Scott Marlowe.  Also, since the planner
tends to not count nulls in its mergejoin scan selectivity estimates, this
is an important fix to make the runtime behavior more like the estimate.

I regard this as an omission in the patch I wrote years ago to teach mergejoin
that tuples containing nulls aren't joinable, so I'm back-patching it.  But
only to 8.3 --- in older versions, we didn't have a solid notion of whether
nulls sort high or low, so attempting to apply this optimization could break
things.
2010-05-28 01:14:03 +00:00
Heikki Linnakangas 9b8a73326e Introduce wal_level GUC to explicitly control if information needed for
archival or hot standby should be WAL-logged, instead of deducing that from
other options like archive_mode. This replaces recovery_connections GUC in
the primary, where it now has no effect, but it's still used in the standby
to enable/disable hot standby.

Remove the WAL-logging of "unlogged operations", like creating an index
without WAL-logging and fsyncing it at the end. Instead, we keep a copy of
the wal_mode setting and the settings that affect how much shared memory a
hot standby server needs to track master transactions (max_connections,
max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings
change, at server restart, write a WAL record noting the new settings and
update pg_control. This allows us to notice the change in those settings in
the standby at the right moment, they used to be included in checkpoint
records, but that meant that a changed value was not reflected in the
standby until the first checkpoint after the change.

Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to
the sequence it used to follow, before hot standby and subsequent patches
changed it to 0x9003.
2010-04-28 16:10:43 +00:00
Peter Eisentraut c248d17120 Message tuning 2010-03-21 00:17:59 +00:00
Tom Lane a836abe9f6 Modify error context callback functions to not assume that they can fetch
catalog entries via SearchSysCache and related operations.  Although, at the
time that these callbacks are called by elog.c, we have not officially aborted
the current transaction, it still seems rather risky to initiate any new
catalog fetches.  In all these cases the needed information is readily
available in the caller and so it's just a matter of a bit of extra notation
to pass it to the callback.

Per crash report from Dennis Koegel.  I've concluded that the real fix for
his problem is to clear the error context stack at entry to proc_exit, but
it still seems like a good idea to make the callbacks a bit less fragile
for other cases.

Backpatch to 8.4.  We could go further back, but the patch doesn't apply
cleanly.  In the absence of proof that this fixes something and isn't just
paranoia, I'm not going to expend the effort.
2010-03-19 22:54:41 +00:00
Bruce Momjian 65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Tom Lane 05d8a561ff Clean up handling of XactReadOnly and RecoveryInProgress checks.
Add some checks that seem logically necessary, in particular let's make
real sure that HS slave sessions cannot create temp tables.  (If they did
they would think that temp tables belonging to the master's session with
the same BackendId were theirs.  We *must* not allow myTempNamespace to
become set in a slave session.)

Change setval() and nextval() so that they are only allowed on temp sequences
in a read-only transaction.  This seems consistent with what we allow for
table modifications in read-only transactions.  Since an HS slave can't have a
temp sequence, this also provides a nicer cure for the setval PANIC reported
by Erik Rijkers.

Make the error messages more uniform, and have them mention the specific
command being complained of.  This seems worth the trifling amount of extra
code, since people are likely to see such messages a lot more than before.
2010-02-20 21:24:02 +00:00
Tom Lane 11d5ba97f8 Fix ExecEvalArrayRef to pass down the old value of the array element or slice
being assigned to, in case the expression to be assigned is a FieldStore that
would need to modify that value.  The need for this was foreseen some time
ago, but not implemented then because we did not have arrays of composites.
Now we do, but the point evidently got overlooked in that patch.  Net result
is that updating a field of an array element doesn't work right, as
illustrated if you try the new regression test on an unpatched backend.
Noted while experimenting with EXPLAIN VERBOSE, which has also got some issues
in this area.

Backpatch to 8.3, where arrays of composites were introduced.
2010-02-18 18:41:47 +00:00
Robert Haas e26c539e9f Wrap calls to SearchSysCache and related functions using macros.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4).  This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.

Design and review by Tom Lane.
2010-02-14 18:42:19 +00:00
Tom Lane ec4be2ee68 Extend the set of frame options supported for window functions.
This patch allows the frame to start from CURRENT ROW (in either RANGE or
ROWS mode), and it also adds support for ROWS n PRECEDING and ROWS n FOLLOWING
start and end points.  (RANGE value PRECEDING/FOLLOWING isn't there yet ---
the grammar works, but that's all.)

Hitoshi Harada, reviewed by Pavel Stehule
2010-02-12 17:33:21 +00:00
Tom Lane cbe9d6beb4 Fix up rickety handling of relation-truncation interlocks.
Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr
relation entries, so that they will get reset to InvalidBlockNumber whenever
an smgr-level flush happens.  Because we now send smgr invalidation messages
immediately (not at end of transaction) when a relation truncation occurs,
this ensures that other backends will reset their values before they next
access the relation.  We no longer need the unreliable assumption that a
VACUUM that's doing a truncation will hold its AccessExclusive lock until
commit --- in fact, we can intentionally release that lock as soon as we've
completed the truncation.  This patch therefore reverts (most of) Alvaro's
patch of 2009-11-10, as well as my marginal hacking on it yesterday.  We can
also get rid of assorted no-longer-needed relcache flushes, which are far more
expensive than an smgr flush because they kill a lot more state.

In passing this patch fixes smgr_redo's failure to perform visibility-map
truncation, and cleans up some rather dubious assumptions in freespace.c and
visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of
date.
2010-02-09 21:43:30 +00:00
Tom Lane d5768dce10 Create an official API function for C functions to use to check if they are
being called as aggregates, and to get the aggregate transition state memory
context if needed.  Use it instead of poking directly into AggState and
WindowAggState in places that shouldn't know so much.

We should have done this in 8.4, probably, but better late than never.

Revised version of a patch by Hitoshi Harada.
2010-02-08 20:39:52 +00:00
Tom Lane 0a469c8769 Remove old-style VACUUM FULL (which was known for a little while as
VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
Per discussion, the use case for this method of vacuuming is no longer large
enough to justify maintaining it; not to mention that we don't wish to invest
the work that would be needed to make it play nicely with Hot Standby.

Aside from the code directly related to old-style VACUUM FULL, this commit
removes support for certain WAL record types that could only be generated
within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
nontransactional generation of cache invalidation sinval messages (the last
being the sticking point for Hot Standby).

We still have to retain all code that copes with finding HEAP_MOVED_OFF and
HEAP_MOVED_IN flag bits on existing tuples.  This can't be removed as long
as we want to support in-place update from pre-9.0 databases.
2010-02-08 04:33:55 +00:00
Tom Lane b9b8831ad6 Create a "relation mapping" infrastructure to support changing the relfilenodes
of shared or nailed system catalogs.  This has two key benefits:

* The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.

* We no longer have to use an unsafe reindex-in-place approach for reindexing
  shared catalogs.

CLUSTER on nailed catalogs now works too, although I left it disabled on
shared catalogs because the resulting pg_index.indisclustered update would
only be visible in one database.

Since reindexing shared system catalogs is now fully transactional and
crash-safe, the former special cases in REINDEX behavior have been removed;
shared catalogs are treated the same as non-shared.

This commit does not do anything about the recently-discussed problem of
deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
concurrent queries; will address that in a separate patch.  As a stopgap,
parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
such failures during the regression tests.
2010-02-07 20:48:13 +00:00
Heikki Linnakangas 9de778b24b Move the responsibility of writing a "unlogged WAL operation" record from
heap_sync() to the callers, because heap_sync() is sometimes called even
if the operation itself is WAL-logged. This eliminates the bogus unlogged
records from CLUSTER that Simon Riggs reported, patch by Fujii Masao.
2010-02-03 10:01:30 +00:00
Robert Haas 42a8ab0a14 Augment EXPLAIN output with more details on Hash nodes.
We show the number of buckets, the number of batches (and also the original
number if it has changed), and the peak space used by the hash table.  Minor
executor changes to track peak space used.
2010-02-01 15:43:36 +00:00
Tom Lane 034fffbf31 Fix memory leak created by deferrable-index-constraints patches.
We need to free the OID list returned by ExecInsertIndexTuples to avoid
a query-lifespan memory leak.  When many rows require rechecking, this
can be a significant leak --- it's even more than the space used for the
queued trigger events.

Dean Rasheed
2010-01-31 18:15:39 +00:00
Peter Eisentraut e7b3349a8a Type table feature
This adds the CREATE TABLE name OF type command, per SQL standard.
2010-01-28 23:21:13 +00:00
Heikki Linnakangas 40f908bdcd Introduce Streaming Replication.
This includes two new kinds of postmaster processes, walsenders and
walreceiver. Walreceiver is responsible for connecting to the primary server
and streaming WAL to disk, while walsender runs in the primary server and
streams WAL from disk to the client.

Documentation still needs work, but the basics are there. We will probably
pull the replication section to a new chapter later on, as well as the
sections describing file-based replication. But let's do that as a separate
patch, so that it's easier to see what has been added/changed. This patch
also adds a new section to the chapter about FE/BE protocol, documenting the
protocol used by walsender/walreceivxer.

Bump catalog version because of two new functions,
pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
monitoring the progress of replication.

Fujii Masao, with additional hacking by me
2010-01-15 09:19:10 +00:00
Tom Lane 292176a118 Improve ExecEvalVar's handling of whole-row variables in cases where the
rowtype contains dropped columns.  Sometimes the input tuple will be formed
from a select targetlist in which dropped columns are filled with a NULL
of an arbitrary type (the planner typically uses INT4, since it can't tell
what type the dropped column really was).  So we need to relax the rowtype
compatibility check to not insist on physical compatibility if the actual
column value is NULL.

In principle we might need to do this for functions returning composite
types, too (see tupledesc_match()).  In practice there doesn't seem to be
a bug there, probably because the function will be using the same cached
rowtype descriptor as the caller.  Fixing that code path would require
significant rearrangement, so I left it alone for now.

Per complaint from Filip Rembialkowski.
2010-01-11 15:31:04 +00:00
Tom Lane 85113bcf5a Make ExecEvalFieldSelect throw a more intelligible error if it's asked to
extract a system column, and remove a couple of lines that are useless
in light of the fact that we aren't ever going to support this case.  There
isn't much point in trying to make this work because a tuple Datum does
not carry many of the system columns.  Per experimentation with a case
reported by Dean Rasheed; we'll have to fix his problem somewhere else.
2010-01-09 20:46:19 +00:00
Tom Lane 217dc525c0 Fix oversight in EvalPlanQualFetch: after failing to lock a tuple because
someone else has just updated it, we have to set priorXmax to that tuple's
xmax (ie, the XID of the other xact that updated it) before looping back to
examine the next tuple.  Obviously, the next tuple in the update chain should
have that XID as its xmin, not the same xmin as the preceding tuple that we
had been trying to lock.  The mismatch would cause the EvalPlanQual logic to
decide that the tuple chain ended in a deletion, when actually there was a
live tuple that should have been found.

I inserted this error when recently adding logic to EvalPlanQual to make it
lock tuples before returning them (as opposed to the old method in which the
lock would occur much later, causing a great deal of work to be wasted if we
only then discover someone else updated it).  Sigh.  Per today's report from
Takahiro Itagaki of inconsistent results during pgbench runs.
2010-01-08 02:44:00 +00:00
Bruce Momjian f98fbc78c3 Preserve relfilenodes:
Add support to pg_dump --binary-upgrade to preserve all relfilenodes,
for use by pg_migrator.
2010-01-06 03:04:03 +00:00
Tom Lane 90f4c2d960 Add support for doing FULL JOIN ON FALSE. While this is really a rather
peculiar variant of UNION ALL, and so wouldn't likely get written directly
as-is, it's possible for it to arise as a result of simplification of
less-obviously-silly queries.  In particular, now that we can do flattening
of subqueries that have constant outputs and are underneath an outer join,
it's possible for the case to result from simplification of queries of the
type exhibited in bug #5263.  Back-patch to 8.4 to avoid a functionality
regression for this type of query.
2010-01-05 23:25:36 +00:00
Tom Lane 40608e7f94 When estimating the selectivity of an inequality "column > constant" or
"column < constant", and the comparison value is in the first or last
histogram bin or outside the histogram entirely, try to fetch the actual
column min or max value using an index scan (if there is an index on the
column).  If successful, replace the lower or upper histogram bound with
that value before carrying on with the estimate.  This limits the
estimation error caused by moving min/max values when the comparison
value is close to the min or max.  Per a complaint from Josh Berkus.

It is tempting to consider using this mechanism for mergejoinscansel as well,
but that would inject index fetches into main-line join estimation not just
endpoint cases.  I'm refraining from that until we can get a better handle
on the costs of doing this type of lookup.
2010-01-04 02:44:40 +00:00
Tom Lane 2b59274c09 check_exclusion_constraint didn't actually work correctly for index
expressions: FormIndexDatum requires the estate's scantuple to already point
at the tuple the values are supposedly being extracted from.  Adjust test
case so that this type of confusion will be exposed.
Per report from hubert depesz lubaczewski.
2010-01-02 17:53:57 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Tom Lane 7839d35991 Add an "argisrow" field to NullTest nodes, following a plan made way back in
8.2beta but never carried out.  This avoids repetitive tests of whether the
argument is of scalar or composite type.  Also, be a bit more paranoid about
composite arguments in some places where we previously weren't checking.
2010-01-01 23:03:10 +00:00
Tom Lane 29c4ad9829 Support "x IS NOT NULL" clauses as indexscan conditions. This turns out
to be just a minor extension of the previous patch that made "x IS NULL"
indexable, because we can treat the IS NOT NULL condition as if it were
"x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option),
just like IS NULL is treated like "x = NULL".  Aside from any possible
usefulness in its own right, this is an important improvement for
index-optimized MAX/MIN aggregates: it is now reliably possible to get
a column's min or max value cheaply, even when there are a lot of nulls
cluttering the interesting end of the index.
2010-01-01 21:53:49 +00:00
Tom Lane 649b5ec7c8 Add the ability to store inheritance-tree statistics in pg_statistic,
and teach ANALYZE to compute such stats for tables that have subclasses.
Per my proposal of yesterday.

autovacuum still needs to be taught about running ANALYZE on parent tables
when their subclasses change, but the feature is useful even without that.
2009-12-29 20:11:45 +00:00
Heikki Linnakangas 84d723b6ce Previous fix for temporary file management broke returning a set from
PL/pgSQL function within an exception handler. Make sure we use the right
resource owner when we create the tuplestore to hold returned tuples.

Simplify tuplestore API so that the caller doesn't need to be in the right
memory context when calling tuplestore_put* functions. tuplestore.c
automatically switches to the memory context used when the tuplestore was
created. Tuplesort was already modified like this earlier. This patch also
removes the now useless MemoryContextSwitch calls from callers.

Report by Aleksei on pgsql-bugs on Dec 22 2009. Backpatch to 8.1, like
the previous patch that broke this.
2009-12-29 17:40:59 +00:00
Tom Lane 34d26872ed Support ORDER BY within aggregate function calls, at long last providing a
non-kluge method for controlling the order in which values are fed to an
aggregate function.  At the same time eliminate the old implementation
restriction that DISTINCT was only supported for single-argument aggregates.

Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
dropped null values of x unconditionally.  Now, it does so only if the
agg transition function is strict; otherwise nulls are treated as DISTINCT
normally would, ie, you get one copy.

Andrew Gierth, reviewed by Hitoshi Harada
2009-12-15 17:57:48 +00:00
Robert Haas cddca5ec13 Add an EXPLAIN (BUFFERS) option to show buffer-usage statistics.
This patch also removes buffer-usage statistics from the track_counts
output, since this (or the global server statistics) is deemed to be a better
interface to this information.

Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
2009-12-15 04:57:48 +00:00
Tom Lane a620d5005d Fix a bug introduced when set-returning SQL functions were made inline-able:
we have to cope with the possibility that the declared result rowtype contains
dropped columns.  This fails in 8.4, as per bug #5240.

While at it, be more paranoid about inserting binary coercions when inlining.
The pre-8.4 code did not really need to worry about that because it could not
inline at all in any case where an added coercion could change the behavior
of the function's statement.  However, when inlining a SRF we allow sorting,
grouping, and set-ops such as UNION.  In these cases, modifying one of the
targetlist entries that the sort/group/setop depends on could conceivably
change the behavior of the function's statement --- so don't inline when
such a case applies.
2009-12-14 02:15:54 +00:00
Tom Lane d8e511fabb Ensure that the result tuple of an EvalPlanQual cycle gets materialized
before we zap the input tuple.  Otherwise, pass-by-reference columns of
the result slot are likely to contain just references to the input
tuple, leading to big trouble if the pfree'd space is reused.  Per
trouble report from Jaime Casanova.  This is a new bug in the recent
rewrite of EvalPlanQual, so nothing to back-patch.
2009-12-11 18:14:43 +00:00
Tom Lane 62aba76568 Prevent indirect security attacks via changing session-local state within
an allegedly immutable index function.  It was previously recognized that
we had to prevent such a function from executing SET/RESET ROLE/SESSION
AUTHORIZATION, or it could trivially obtain the privileges of the session
user.  However, since there is in general no privilege checking for changes
of session-local state, it is also possible for such a function to change
settings in a way that might subvert later operations in the same session.
Examples include changing search_path to cause an unexpected function to
be called, or replacing an existing prepared statement with another one
that will execute a function of the attacker's choosing.

The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against
these threats, which are the same places previously deemed to need protection
against the SET ROLE issue.  GUC changes are still allowed, since there are
many useful cases for that, but we prevent security problems by forcing a
rollback of any GUC change after completing the operation.  Other cases are
handled by throwing an error if any change is attempted; these include temp
table creation, closing a cursor, and creating or deleting a prepared
statement.  (In 7.4, the infrastructure to roll back GUC changes doesn't
exist, so we settle for rejecting changes of "search_path" in these contexts.)

Original report and patch by Gurjeet Singh, additional analysis by
Tom Lane.

Security: CVE-2009-4136
2009-12-09 21:57:51 +00:00
Tom Lane 0cb65564e5 Add exclusion constraints, which generalize the concept of uniqueness to
support any indexable commutative operator, not just equality.  Two rows
violate the exclusion constraint if "row1.col OP row2.col" is TRUE for
each of the columns in the constraint.

Jeff Davis, reviewed by Robert Haas
2009-12-07 05:22:23 +00:00
Tom Lane 7fc0f06221 Add a WHEN clause to CREATE TRIGGER, allowing a boolean expression to be
checked to determine whether the trigger should be fired.

For BEFORE triggers this is mostly a matter of spec compliance; but for AFTER
triggers it can provide a noticeable performance improvement, since queuing of
a deferred trigger event and re-fetching of the row(s) at end of statement can
be short-circuited if the trigger does not need to be fired.

Takahiro Itagaki, reviewed by KaiGai Kohei.
2009-11-20 20:38:12 +00:00
Tom Lane 2ace38d226 Fix WHERE CURRENT OF to work as designed within plpgsql. The argument
can be the name of a plpgsql cursor variable, which formerly was converted
to $N before the core parser saw it, but that's no longer the case.
Deal with plain name references to plpgsql variables, and add a regression
test case that exposes the failure.
2009-11-09 02:36:59 +00:00
Tom Lane 9bedd128d6 Add support for invoking parser callback hooks via SPI and in cached plans.
As proof of concept, modify plpgsql to use the hooks.  plpgsql is still
inserting $n symbols textually, but the "back end" of the parsing process now
goes through the ParamRef hook instead of using a fixed parameter-type array,
and then execution only fetches actually-referenced parameters, using a hook
added to ParamListInfo.

Although there's a lot left to be done in plpgsql, this already cures the
"if (TG_OP = 'INSERT' and NEW.foo ...)"  problem, as illustrated by the
changed regression test.
2009-11-04 22:26:08 +00:00
Tom Lane 8442317beb Make the overflow guards in ExecChooseHashTableSize be more protective.
The original coding ensured nbuckets and nbatch didn't exceed INT_MAX,
which while not insane on its own terms did nothing to protect subsequent
code like "palloc(nbatch * sizeof(BufFile *))".  Since enormous join size
estimates might well be planner error rather than reality, it seems best
to constrain the initial sizes to be not more than work_mem/sizeof(pointer),
thus ensuring the allocated arrays don't exceed work_mem.  We will allow
nbatch to get bigger than that during subsequent ExecHashIncreaseNumBatches
calls, but we should still guard against integer overflow in those palloc
requests.  Per bug #5145 from Bernt Marius Johnsen.

Although the given test case only seems to fail back to 8.2, previous
releases have variants of this issue, so patch all supported branches.
2009-10-30 20:58:45 +00:00
Tom Lane 9f2ee8f287 Re-implement EvalPlanQual processing to improve its performance and eliminate
a lot of strange behaviors that occurred in join cases.  We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries.  If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row.  The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.

Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested.  To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param.  Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.

This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE.  This is needed to avoid the
duplicate-output-tuple problem.  It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
2009-10-26 02:26:45 +00:00
Tom Lane 0adaf4cb31 Move the handling of SELECT FOR UPDATE locking and rechecking out of
execMain.c and into a new plan node type LockRows.  Like the recent change
to put table updating into a ModifyTable plan node, this increases planning
flexibility by allowing the operations to occur below the top level of the
plan tree.  It's necessary in any case to restore the previous behavior of
having FOR UPDATE locking occur before ModifyTable does.

This partially refactors EvalPlanQual to allow multiple rows-under-test
to be inserted into the EPQ machinery before starting an EPQ test query.
That isn't sufficient to fix EPQ's general bogosity in the face of plans
that return multiple rows per test row, though.  Since this patch is
mostly about getting some plan node infrastructure in place and not about
fixing ten-year-old bugs, I will leave EPQ improvements for another day.

Another behavioral change that we could now think about is doing FOR UPDATE
before LIMIT, but that too seems like it should be treated as a followon
patch.
2009-10-12 18:10:51 +00:00
Tom Lane 8a5849b7ff Split the processing of INSERT/UPDATE/DELETE operations out of execMain.c.
They are now handled by a new plan node type called ModifyTable, which is
placed at the top of the plan tree.  In itself this change doesn't do much,
except perhaps make the handling of RETURNING lists and inherited UPDATEs a
tad less klugy.  But it is necessary preparation for the intended extension of
allowing RETURNING queries inside WITH.

Marko Tiikkaja
2009-10-10 01:43:50 +00:00
Tom Lane c970292a94 Remove very ancient tuple-counting infrastructure (IncrRetrieved() and
friends).  This code has all been ifdef'd out for many years, and doesn't
seem to have any prospect of becoming any more useful in the future.
EXPLAIN ANALYZE is what people use in practice, and I think if we did want
process-wide counters we'd be more likely to put in dtrace events for that
than try to resurrect this code.  Get rid of it so as to have one less detail
to worry about while refactoring execMain.c.
2009-10-08 22:34:57 +00:00
Tom Lane 249724cb01 Create an ALTER DEFAULT PRIVILEGES command, which allows users to adjust
the privileges that will be applied to subsequently-created objects.

Such adjustments are always per owning role, and can be restricted to objects
created in particular schemas too.  A notable benefit is that users can
override the traditional default privilege settings, eg, the PUBLIC EXECUTE
privilege traditionally granted by default for functions.

Petr Jelinek
2009-10-05 19:24:49 +00:00
Alvaro Herrera caa4cfa369 Ensure that a cursor has an immutable snapshot throughout its lifespan.
The old coding was using a regular snapshot, referenced elsewhere, that was
subject to having its command counter updated.  Fix by creating a private copy
of the snapshot exclusively for the cursor.

Backpatch to 8.4, which is when the bug was introduced during the snapshot
management rewrite.
2009-10-02 17:57:30 +00:00
Tom Lane 421d7d8edb Remove no-longer-needed ExecCountSlots infrastructure. 2009-09-27 21:10:53 +00:00
Tom Lane f92e8a4b5e Replace the array-style TupleTable data structure with a simple List of
TupleTableSlot nodes.  This eliminates the need to count in advance
how many Slots will be needed, which seems more than worth the small
increase in the amount of palloc traffic during executor startup.

The ExecCountSlots infrastructure is now all dead code, but I'll remove it
in a separate commit for clarity.

Per a comment from Robert Haas.
2009-09-27 20:09:58 +00:00
Tom Lane 4985635230 Extend the BKI infrastructure to allow system catalogs to be given
hand-assigned rowtype OIDs, even when they are not "bootstrapped" catalogs
that have handmade type rows in pg_type.h.  Give pg_database such an OID.
Restore the availability of C macros for the rowtype OIDs of the bootstrapped
catalogs.  (These macros are now in the individual catalogs' .h files,
though, not in pg_type.h.)

This commit doesn't do anything especially useful by itself, but it's
necessary infrastructure for reverting some ill-considered changes in
relcache.c.
2009-09-26 22:42:03 +00:00
Tom Lane 9bb342811b Rewrite the planner's handling of materialized plan types so that there is
an explicit model of rescan costs being different from first-time costs.
The costing of Material nodes in particular now has some visible relationship
to the actual runtime behavior, where before it was essentially fantasy.
This also fixes up a couple of places where different materialized plan types
were treated differently for no very good reason (probably just oversights).

A couple of the regression tests are affected, because the planner now chooses
to put the other relation on the inside of a nestloop-with-materialize.
So far as I can see both changes are sane, and the planner is now more
consistently following the expectation that it should prefer to materialize
the smaller of two relations.

Per a recent discussion with Robert Haas.
2009-09-12 22:12:09 +00:00
Tom Lane c38b75947e Tweak ExecIndexEvalRuntimeKeys to forcibly detoast any toasted comparison
values before they get passed to the index access method.  This avoids
repeated detoastings that will otherwise ensue as the comparison value
is examined by various index support functions.  We have seen a couple of
reports of cases where repeated detoastings result in an order-of-magnitude
slowdown, so it seems worth adding a bit of extra logic to prevent this.

I had previously proposed trying to avoid duplicate detoastings in general,
but this fix takes care of what seems the most important case in practice
with very little effort or risk.

Back-patch to 8.4 so that the PostGIS folk won't have to wait a year to
have this fix in a production release.  (The issue exists further back,
of course, but the code's diverged enough to make backpatching further a
higher-risk action.  Also it appears that the possible gains may be limited
in prior releases because of different handling of lossy operators.)
2009-08-23 18:26:08 +00:00
Tom Lane dcb2bda9b7 Improve plpgsql's ability to cope with rowtypes containing dropped columns,
by supporting conversions in places that used to demand exact rowtype match.

Since this issue is certain to come up elsewhere (in fact, already has,
in ExecEvalConvertRowtype), factor out the support code into new core
functions for tuple conversion.  I chose to put these in a new source
file since heaptuple.c is already overly long.

Heavily revised version of a patch by Pavel Stehule.
2009-08-06 20:44:32 +00:00
Tom Lane 25d9bf2e3e Support deferrable uniqueness constraints.
The current implementation fires an AFTER ROW trigger for each tuple that
looks like it might be non-unique according to the index contents at the
time of insertion.  This works well as long as there aren't many conflicts,
but won't scale to massive unique-key reassignments.  Improving that case
is a TODO item.

Dean Rasheed
2009-07-29 20:56:21 +00:00
Tom Lane adfa04293b Save a few cycles in EXPLAIN and related commands by not bothering to form
a physical tuple in do_tup_output().  A virtual tuple is easier to set up
and also easier for most tuple receivers to process.  Per my comment on
Robert Haas' recent patch in this code.
2009-07-23 21:27:10 +00:00
Tom Lane 6a0865e4bb In a non-hashed Agg node, reset the "aggcontext" at group boundaries, instead
of individually pfree'ing pass-by-reference transition values.  This should
be at least as fast as the prior coding, and it has the major advantage of
clearing out any working data an aggregate function may have stored in or
underneath the aggcontext.  This avoids memory leakage when an aggregate
such as array_agg() is used in GROUP BY mode.  Per report from Chris Spotts.

Back-patch to 8.4.  In principle the problem could arise in prior versions,
but since they didn't have array_agg the issue seems not critical.
2009-07-23 20:45:27 +00:00
Tom Lane 846c364dd4 Change do_tup_output() to take Datum/isnull arrays instead of a char * array,
so it doesn't go through BuildTupleFromCStrings.  This is more or less a
wash for current uses, but will avoid inefficiency for planned changes to
EXPLAIN.

Robert Haas
2009-07-22 17:00:23 +00:00
Tom Lane 011eae60ef Fix error cleanup failure caused by 8.4 changes in plpgsql to try to avoid
memory leakage in error recovery.  We were calling FreeExprContext, and
therefore invoking ExprContextCallback callbacks, in both normal and error
exits from subtransactions.  However this isn't very safe, as shown in
recent trouble report from Frank van Vugt, in which releasing a tupledesc
refcount failed.  It's also unnecessary, since the resources that callbacks
might wish to release should be cleaned up by other error recovery mechanisms
(ie the resource owners).  We only really want FreeExprContext to release
memory attached to the exprcontext in the error-exit case.  So, add a bool
parameter to FreeExprContext to tell it not to call the callbacks.

A more general solution would be to pass the isCommit bool parameter on to
the callbacks, so they could do only safe things during error exit.  But
that would make the patch significantly more invasive and possibly break
third-party code that registers ExprContextCallback callbacks.  We might want
to do that later in HEAD, but for now I'll just do what seems reasonable to
back-patch.
2009-07-18 19:15:42 +00:00
Tom Lane 82480e28f5 Fix things so that array_agg_finalfn does not modify or free its input
ArrayBuildState, per trouble report from Merlin Moncure.  By adopting
this fix, we are essentially deciding that aggregate final-functions
should not modify their inputs ever.  Adjust documentation and comments
to match that conclusion.
2009-06-20 18:45:28 +00:00
Tom Lane e8d78d35f4 ExecAgg() failed to finish running out set-returning functions in the last
aggregated tuple of a run.  Per report from Laurenz Albe.  This is a new
bug in 8.4, but only because prior versions rejected SRFs in an Agg plan
node altogether.
2009-06-17 16:05:34 +00:00
Tom Lane 44aa60fa7c Revisit AlterTableCreateToastTable's API once again, hoping to make it what
pg_migrator actually needs and not just a partial solution.  We have to be
able to specify the OID that the new toast table should be created with.
2009-06-11 20:46:11 +00:00
Tom Lane 0c19f05803 Fix things so that you can still do "select foo()" where foo is a SQL
function returning setof record.  This used to work, more or less
accidentally, but I had broken it while extending the code to allow
materialize-mode functions to be called in select lists.  Add a regression
test case so it doesn't get broken again.  Per gripe from Greg Davidson.
2009-06-11 17:25:39 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Peter Eisentraut 9b7304bc25 Fix xmlattribute escaping XML special characters twice (bug #4822).
Author: Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>
2009-06-09 22:00:57 +00:00
Tom Lane 76d4abf2d9 Improve the recently-added support for properly pluralized error messages
by extending the ereport() API to cater for pluralization directly.  This
is better than the original method of calling ngettext outside the elog.c
code because (1) it avoids double translation, which wastes cycles and in
the worst case could give a wrong result; and (2) it avoids having to use
a different coding method in PL code than in the core backend.  The
client-side uses of ngettext are not touched since neither of these concerns
is very pressing in the client environment.  Per my proposal of yesterday.
2009-06-04 18:33:08 +00:00
Tom Lane 1e06ed1abe Add an option to AlterTableCreateToastTable() to allow its caller to force
a toast table to be built, even if the sum-of-column-widths calculation
indicates one isn't needed.  This is needed by pg_migrator because if the
old table has a toast table, we have to migrate over the toast table since
it might contain some live data, even though subsequent column drops could
mean that no recently-added rows could require toasting.
2009-05-07 22:58:28 +00:00
Peter Eisentraut 77d67a4a3b XMLATTRIBUTES() should send the attribute values through
map_sql_value_to_xml_value() instead of directly through the data type output
function.  This is per SQL standard, and consistent with XMLELEMENT().
2009-04-08 21:51:38 +00:00
Tom Lane eb4c723e56 Make ExecInitExpr build the list of SubPlans found in a plan tree in order
of discovery, rather than reverse order.  This doesn't matter functionally
(I suppose the previous coding dates from the time when lcons was markedly
cheaper than lappend).  However now that EXPLAIN is labeling subplans with
IDs that are based on order of creation, this may help produce a slightly
less surprising printout.
2009-04-05 20:32:06 +00:00
Tom Lane 85369f888e Refactor ExecProject and associated routines so that fast-path code is used
for simple Var targetlist entries all the time, even when there are other
entries that are not simple Vars.  Also, ensure that we prefetch attributes
(with slot_getsomeattrs) for all Vars in the targetlist, even those buried
within expressions.  In combination these changes seem to significantly
reduce the runtime for cases where tlists are mostly but not exclusively
Vars.  Per my proposal of yesterday.
2009-04-02 22:39:30 +00:00
Bruce Momjian 0e550ff617 Revert DTrace patch from Robert Lor 2009-04-02 20:59:10 +00:00
Bruce Momjian 227f817c1f Add support for additional DTrace probes.
Robert Lor
2009-04-02 19:14:34 +00:00
Tom Lane 793d5662e8 Fix an oversight in the support for storing/retrieving "minimal tuples" in
TupleTableSlots.  We have functions for retrieving a minimal tuple from a slot
after storing a regular tuple in it, or vice versa; but these were implemented
by converting the internal storage from one format to the other.  The problem
with that is it invalidates any pass-by-reference Datums that were already
fetched from the slot, since they'll be pointing into the just-freed version
of the tuple.  The known problem cases involve fetching both a whole-row
variable and a pass-by-reference value from a slot that is fed from a
tuplestore or tuplesort object.  The added regression tests illustrate some
simple cases, but there may be other failure scenarios traceable to the same
bug.  Note that the added tests probably only fail on unpatched code if it's
built with --enable-cassert; otherwise the bug leads to fetching from freed
memory, which will not have been overwritten without additional conditions.

Fix by allowing a slot to contain both formats simultaneously; which turns out
not to complicate the logic much at all, if anything it seems less contorted
than before.

Back-patch to 8.2, where minimal tuples were introduced.
2009-03-30 04:08:43 +00:00
Tom Lane 25bf7f8b9b Fix possible failures when a tuplestore switches from in-memory to on-disk
mode while callers hold pointers to in-memory tuples.  I reported this for
the case of nodeWindowAgg's primary scan tuple, but inspection of the code
shows that all of the calls in nodeWindowAgg and nodeCtescan are at risk.
For the moment, fix it with a rather brute-force approach of copying
whenever one of the at-risk callers requests a tuple.  Later we might
think of some sort of reference-count approach to reduce tuple copying.
2009-03-27 18:30:21 +00:00
Peter Eisentraut 8032d76b5b Gettext plural support
In the backend, I changed only a handful of exemplary or important-looking
instances to make use of the plural support; there is probably more work
there.  For the rest of the source, this should cover all relevant cases.
2009-03-26 22:26:08 +00:00
Tom Lane 596efd27ed Optimize multi-batch hash joins when the outer relation has a nonuniform
distribution, by creating a special fast path for the (first few) most common
values of the outer relation.  Tuples having hashvalues matching the MCVs
are effectively forced to be in the first batch, so that we never write
them out to the batch temp files.

Bryce Cutt and Ramon Lawrence, with some editorialization by me.
2009-03-21 00:04:40 +00:00
Peter Eisentraut 12f87b2c82 Add new SQL:2008 error codes for invalid LIMIT and OFFSET values. Remove
unused nonstandard error code that was perhaps intended for this but never
used.
2009-03-04 10:55:00 +00:00
Tom Lane 3d02cae310 Ensure that INSERT ... SELECT into a table with OIDs never copies row OIDs
from the source table.  This could never happen anyway before 8.4 because
the executor invariably applied a "junk filter" to rows due to be inserted;
but now that we skip doing that when it's not necessary, the case can occur.
Problem noted 2008-11-27 by KaiGai Kohei, though I misunderstood what he
was on about at the time (the opacity of the patch he proposed didn't help).
2009-02-08 18:02:27 +00:00
Alvaro Herrera 3a5b773715 Allow reloption names to have qualifiers, initially supporting a TOAST
qualifier, and add support for this in pg_dump.

This allows TOAST tables to have user-defined fillfactor, and will also
enable us to move the autovacuum parameters to reloptions without taking
away the possibility of setting values for TOAST tables.
2009-02-02 19:31:40 +00:00
Tom Lane 3cb5d6580a Support column-level privileges, as required by SQL standard.
Stephen Frost, with help from KaiGai Kohei and others
2009-01-22 20:16:10 +00:00
Heikki Linnakangas 94136d5a18 Add new SPI_OK_REWRITTEN return code to SPI_execute and friends, for the
case that the command is rewritten into another type of command. The old
behavior to return the command tag of the last executed command was
pretty surprising. In PL/pgSQL, for example, it meant that if a command
was rewritten to a utility statement, FOUND wasn't set at all.
2009-01-21 11:02:40 +00:00
Tom Lane 8a4505013d Tweak order of operations in BitmapHeapNext() to avoid the case of prefetching
the same page we are nanoseconds away from reading for real.  There should be
something left to do on the current page before we consider issuing a prefetch.
2009-01-12 16:00:41 +00:00
Tom Lane b7b8f0b609 Implement prefetching via posix_fadvise() for bitmap index scans. A new
GUC variable effective_io_concurrency controls how many concurrent block
prefetch requests will be issued.

(The best way to handle this for plain index scans is still under debate,
so that part is not applied yet --- tgl)

Greg Stark
2009-01-12 05:10:45 +00:00
Tom Lane 43a57cf365 Revise the TIDBitmap API to support multiple concurrent iterations over a
bitmap.  This is extracted from Greg Stark's posix_fadvise patch; it seems
worth committing separately, since it's potentially useful independently of
posix_fadvise.
2009-01-10 21:08:36 +00:00
Tom Lane d04db37072 Arrange for function default arguments to be processed properly in expressions
that are set up for execution with ExecPrepareExpr rather than going through
the full planner process.  By introducing an explicit notion of "expression
planning", this patch also lays a bit of groundwork for maybe someday
allowing sub-selects in standalone expressions.
2009-01-09 15:46:11 +00:00
Tom Lane deac9488d3 Insert conditional SPI_push/SPI_pop calls into InputFunctionCall,
OutputFunctionCall, and friends.  This allows SPI-using functions to invoke
datatype I/O without concern for the possibility that a SPI-using function
will be called (which could be either the I/O function itself, or a function
used in a domain check constraint).  It's a tad ugly, but not nearly as ugly
as what'd be needed to make this work via retail insertion of push/pop
operations in all the PLs.

This reverts my patch of 2007-01-30 that inserted some retail SPI_push/pop
calls into plpgsql; that approach only fixed plpgsql, and not any other PLs.
But the other PLs have the issue too, as illustrated by a recent gripe from
Christian Schröder.

Back-patch to 8.2, which is as far back as this solution will work.  It's
also as far back as we need to worry about the domain-constraint case, since
earlier versions did not attempt to check domain constraints within datatype
input.  I'm not aware of any old I/O functions that use SPI themselves, so
this should be sufficient for a back-patch.
2009-01-07 20:38:56 +00:00
Tom Lane 1cfd9e8834 Fix executor/spi.h to follow our usual conventions for include files, ie,
not include postgres.h nor anything else it doesn't directly need.  Add
#includes to calling files as needed to compensate.  Per my proposal of
yesterday.

This should be noted as a source code change in the 8.4 release notes,
since it's likely to require changes in add-on modules.
2009-01-07 13:44:37 +00:00
Tom Lane bbeb0bbf6b Include a pointer to the query's source text in QueryDesc structs. This is
practically free given prior 8.4 changes in plancache and portal management,
and it makes it a lot easier for ExecutorStart/Run/End hooks to get at the
query text.  Extracted from Itagaki Takahiro's pg_stat_statements patch,
with minor editorialization.
2009-01-02 20:42:00 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00