Commit Graph

551 Commits

Author SHA1 Message Date
Robert Haas bfc78d7196 Rewrite interaction of parallel mode with parallel executor support.
In the previous coding, before returning from ExecutorRun, we'd shut
down all parallel workers.  This was dead wrong if ExecutorRun was
called with a non-zero tuple count; it had the effect of truncating
the query output.  To fix, give ExecutePlan control over whether to
enter parallel mode, and have it refuse to do so if the tuple count
is non-zero.  Rewrite the Gather logic so that it can cope with being
called outside parallel mode.

Commit 7aea8e4f2d is largely to blame
for this problem, though this patch modifies some subsequently-committed
code which relied on the guarantees it purported to make.
2015-10-16 11:56:02 -04:00
Tom Lane a31e64d065 Fix some issues in new hashtable size calculations in nodeHash.c.
Limit the size of the hashtable pointer array to not more than
MaxAllocSize, per reports from Kouhei Kaigai and others of "invalid memory
alloc request size" failures.  There was discussion of allowing the array
to get larger than that by using the "huge" palloc API, but so far no proof
that that is actually a good idea, and at this point in the 9.5 cycle major
changes from old behavior don't seem like the way to go.

Fix a rather serious secondary bug in the new code, which was that it
didn't ensure nbuckets remained a power of 2 when recomputing it for the
multiple-batch case.

Clean up sloppy division of labor between ExecHashIncreaseNumBuckets and
its sole call site.
2015-10-04 14:06:50 -04:00
Robert Haas 3bd909b220 Add a Gather executor node.
A Gather executor node runs any number of copies of a plan in an equal
number of workers and merges all of the results into a single tuple
stream.  It can also run the plan itself, if the workers are
unavailable or haven't started up yet.  It is intended to work with
the Partial Seq Scan node which will be added in future commits.

It could also be used to implement parallel query of a different sort
by itself, without help from Partial Seq Scan, if the single_copy mode
is used.  In that mode, a worker executes the plan, and the parallel
leader does not, merely collecting the worker's results.  So, a Gather
node could be inserted into a plan to split the execution of that plan
across two processes.  Nested Gather nodes aren't currently supported,
but we might want to add support for that in the future.

There's nothing in the planner to actually generate Gather nodes yet,
so it's not quite time to break out the champagne.  But we're getting
close.

Amit Kapila.  Some designs suggestions were provided by me, and I also
reviewed the patch.  Single-copy mode, documentation, and other minor
changes also by me.
2015-09-30 19:23:36 -04:00
Robert Haas d1b7c1ffe7 Parallel executor support.
This code provides infrastructure for a parallel leader to start up
parallel workers to execute subtrees of the plan tree being executed
in the master.  User-supplied parameters from ParamListInfo are passed
down, but PARAM_EXEC parameters are not.  Various other constructs,
such as initplans, subplans, and CTEs, are also not currently shared.
Nevertheless, there's enough here to support a basic implementation of
parallel query, and we can lift some of the current restrictions as
needed.

Amit Kapila and Robert Haas
2015-09-28 21:55:57 -04:00
Robert Haas 4a4e6893aa Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends.  However, shm_mq itself only deals in raw
bytes.  Since commit 2bd9e412f9, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them.  This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.

The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion.  Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.

This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.

Extracted from Amit Kapila's parallel sequential scan patch.  This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:56:58 -04:00
Alvaro Herrera 8c3d63c521 Remove ExecGetScanType function
This became unused in a191a169d6.
2015-08-21 14:11:58 -03:00
Tom Lane dd7a8f66ed Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM.  (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.)  Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type.  This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.

Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.

Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.

Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
Bruce Momjian 807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Simon Riggs f6d208d6e5 TABLESAMPLE, SQL Standard and extensible
Add a TABLESAMPLE clause to SELECT statements that allows
user to specify random BERNOULLI sampling or block level
SYSTEM sampling. Implementation allows for extensible
sampling functions to be written, using a standard API.
Basic version follows SQLStandard exactly. Usable
concrete use cases for the sampling API follow in later
commits.

Petr Jelinek

Reviewed by Michael Paquier and Simon Riggs
2015-05-15 14:37:10 -04:00
Tom Lane 1dc5ebc907 Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation.  Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype.  There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.

The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.

In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables.  We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work.  (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)

Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad.  In any case most applications should see a net win.

Tom Lane, reviewed by Andres Freund
2015-05-14 12:08:49 -04:00
Tom Lane afb9249d06 Add support for doing late row locking in FDWs.
Previously, FDWs could only do "early row locking", that is lock a row as
soon as it's fetched, even though local restriction/join conditions might
discard the row later.  This patch adds callbacks that allow FDWs to do
late locking in the same way that it's done for regular tables.

To make use of this feature, an FDW must support the "ctid" column as a
unique row identifier.  Currently, since ctid has to be of type TID,
the feature is of limited use, though in principle it could be used by
postgres_fdw.  We may eventually allow FDWs to specify another data type
for ctid, which would make it possible for more FDWs to use this feature.

This commit does not modify postgres_fdw to use late locking.  We've
tested some prototype code for that, but it's not in committable shape,
and besides it's quite unclear whether it actually makes sense to do late
locking against a remote server.  The extra round trips required are likely
to outweigh any benefit from improved concurrency.

Etsuro Fujita, reviewed by Ashutosh Bapat, and hacked up a lot by me
2015-05-12 14:10:17 -04:00
Tom Lane 1a8a4e5cde Code review for foreign/custom join pushdown patch.
Commit e7cb7ee145 included some design
decisions that seem pretty questionable to me, and there was quite a lot
of stuff not to like about the documentation and comments.  Clean up
as follows:

* Consider foreign joins only between foreign tables on the same server,
rather than between any two foreign tables with the same underlying FDW
handler function.  In most if not all cases, the FDW would simply have had
to apply the same-server restriction itself (far more expensively, both for
lack of caching and because it would be repeated for each combination of
input sub-joins), or else risk nasty bugs.  Anyone who's really intent on
doing something outside this restriction can always use the
set_join_pathlist_hook.

* Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist
to better reflect what they're for, and allow these custom scan tlists
to be used even for base relations.

* Change make_foreignscan() API to include passing the fdw_scan_tlist
value, since the FDW is required to set that.  Backwards compatibility
doesn't seem like an adequate reason to expect FDWs to set it in some
ad-hoc extra step, and anyway existing FDWs can just pass NIL.

* Change the API of path-generating subroutines of add_paths_to_joinrel,
and in particular that of GetForeignJoinPaths and set_join_pathlist_hook,
so that various less-used parameters are passed in a struct rather than
as separate parameter-list entries.  The objective here is to reduce the
probability that future additions to those parameter lists will result in
source-level API breaks for users of these hooks.  It's possible that this
is even a small win for the core code, since most CPU architectures can't
pass more than half a dozen parameters efficiently anyway.  I kept root,
joinrel, outerrel, innerrel, and jointype as separate parameters to reduce
code churn in joinpath.c --- in particular, putting jointype into the
struct would have been problematic because of the subroutines' habit of
changing their local copies of that variable.

* Avoid ad-hocery in ExecAssignScanProjectionInfo.  It was probably all
right for it to know about IndexOnlyScan, but if the list is to grow
we should refactor the knowledge out to the callers.

* Restore nodeForeignscan.c's previous use of the relcache to avoid
extra GetFdwRoutine lookups for base-relation scans.

* Lots of cleanup of documentation and missed comments.  Re-order some
code additions into more logical places.
2015-05-10 14:36:36 -04:00
Andres Freund 168d5805e4 Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint.  DO NOTHING avoids the
constraint violation, without touching the pre-existing row.  DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed.  The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.

This feature is often referred to as upsert.

This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert.  If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made.  If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.

To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.

Bumps catversion as stored rules change.

Author: Peter Geoghegan, with significant contributions from Heikki
    Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
    Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:43:10 +02:00
Stephen Frost e89bd02f58 Perform RLS WITH CHECK before constraints, etc
The RLS capability is built on top of the WITH CHECK OPTION
system which was added for auto-updatable views, however, unlike
WCOs on views (which are mandated by the SQL spec to not fire until
after all other constraints and checks are done), it makes much more
sense for RLS checks to happen earlier than constraint and uniqueness
checks.

This patch reworks the structure which holds the WCOs a bit to be
explicitly either VIEW or RLS checks and the RLS-related checks are
done prior to the constraint and uniqueness checks.  This also allows
better error reporting as we are now reporting when a violation is due
to a WITH CHECK OPTION and when it's due to an RLS policy violation,
which was independently noted by Craig Ringer as being confusing.

The documentation is also updated to include a paragraph about when RLS
WITH CHECK handling is performed, as there have been a number of
questions regarding that and the documentation was previously silent on
the matter.

Author: Dean Rasheed, with some kabitzing and comment changes by me.
2015-04-24 20:34:26 -04:00
Heikki Linnakangas 62420ae7d6 Move functions related to index maintenance to separate source file.
There is enough code here to deserve a file of their own, not be buried
in the middle of execUtils.c.
2015-04-24 09:33:23 +03:00
Andres Freund 62e2a8dc2c Define integer limits independently from the system definitions.
In 83ff1618 we defined integer limits iff they're not provided by the
system. That turns out not to be the greatest idea because there's
different ways some datatypes can be represented. E.g. on OSX PG's 64bit
datatype will be a 'long int', but OSX unconditionally uses 'long
long'. That disparity then can lead to warnings, e.g. around printf
formats.

One way to fix that would be to back int64 using stdint.h's
int64_t. While a good idea it's not that easy to implement. We would
e.g. need to include stdint.h in our external headers, which we don't
today. Also computing the correct int64 printf formats in that case is
nontrivial.

Instead simply prefix the integer limits with PG_ and define them
unconditionally. I've adjusted all the references to them in code, but
not the ones in comments; the latter seems unnecessary to me.

Discussion: 20150331141423.GK4878@alap3.anarazel.de
2015-04-02 17:43:35 +02:00
Andres Freund 83ff1618bc Centralize definition of integer limits.
Several submitted and even committed patches have run into the problem
that C89, our baseline, does not provide minimum/maximum values for
various integer datatypes. C99's stdint.h does, but we can't rely on
it.

Several parts of the code defined limits locally, so instead centralize
the definitions to c.h.

This patch also changes the more obvious usages of literal limit values;
there's more places that could be changed, but it's less clear whether
it's beneficial to change those.

Author: Andrew Gierth
Discussion: 87619tc5wc.fsf@news-spur.riddles.org.uk
2015-03-25 22:39:42 +01:00
Tom Lane 9fac5fd741 Move LockClauseStrength, LockWaitPolicy into new file nodes/lockoptions.h.
Commit df630b0dd5 moved enum LockWaitPolicy
into its very own header file utils/lockwaitpolicy.h, which does not seem
like a great idea from here.  First, it's still a node-related declaration,
and second, a file named like that can never sensibly be used for anything
else.  I do not think we want to encourage a one-typedef-per-header-file
approach.  The upcoming foreign table inheritance patch was doubling down
on this bad idea by moving enum LockClauseStrength into its *own*
can-never-be-used-for-anything-else file.  Instead, let's put them both in
a file named nodes/lockoptions.h.  (They do seem to need a separate header
file because we need them in both parsenodes.h and plannodes.h, and we
don't want either of those including the other.  Past practice might
suggest adding them to nodes/nodes.h, but they don't seem sufficiently
globally useful to justify that.)

Committed separately since there's no functional change here, just some
header-file refactoring.
2015-03-15 15:19:04 -04:00
Tom Lane 09d8d110a6 Use FLEXIBLE_ARRAY_MEMBER in a bunch more places.
Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]".
Aside from being more self-documenting, this should help prevent bogus
warnings from static code analyzers and perhaps compiler misoptimizations.

This patch is just a down payment on eliminating the whole problem, but
it gets rid of a lot of easy-to-fix cases.

Note that the main problem with doing this is that one must no longer rely
on computing sizeof(the containing struct), since the result would be
compiler-dependent.  Instead use offsetof(struct, lastfield).  Autoconf
also warns against spelling that offsetof(struct, lastfield[0]).

Michael Paquier, review and additional fixes by me.
2015-02-20 00:11:42 -05:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Tom Lane 447770404c Rearrange CustomScan API.
Make it work more like FDW plans do: instead of assuming that there are
expressions in a CustomScan plan node that the core code doesn't know
about, insist that all subexpressions that need planner attention be in
a "custom_exprs" list in the Plan representation.  (Of course, the
custom plugin can break the list apart again at executor initialization.)
This lets us revert the parts of the patch that exposed setrefs.c and
subselect.c processing to the outside world.

Also revert the GetSpecialCustomVar stuff in ruleutils.c; that concept
may work in future, but it's far from fully baked right now.
2014-11-21 18:21:46 -05:00
Tom Lane adbfab119b Remove dead code supporting mark/restore in SeqScan, TidScan, ValuesScan.
There seems no prospect that any of this will ever be useful, and indeed
it's questionable whether some of it would work if it ever got called;
it's certainly not been exercised in a very long time, if ever. So let's
get rid of it, and make the comments about mark/restore in execAmi.c less
wishy-washy.

The mark/restore support for Result nodes is also currently dead code,
but that's due to planner limitations not because it's impossible that
it could be useful.  So I left it in.
2014-11-20 20:20:54 -05:00
Tom Lane a34fa8ee7c Initial code review for CustomScan patch.
Get rid of the pernicious entanglement between planner and executor headers
introduced by commit 0b03e5951b.

Also, rearrange the CustomFoo struct/typedef definitions so that all the
typedef names are seen as used by the compiler.  Without this pgindent
will mess things up a bit, which is not so important perhaps, but it also
removes a bizarre discrepancy between the declaration arrangement used for
CustomExecMethods and that used for CustomScanMethods and
CustomPathMethods.

Clean up the commentary around ExecSupportsMarkRestore to reflect the
rather large change in its API.

Const-ify register_custom_path_provider's argument.  This necessitates
casting away const in the function, but that seems better than forcing
callers of the function to do so (or else not const-ify their method
pointer structs, which was sort of the whole point).

De-export fix_expr_common.  I don't like the exporting of fix_scan_expr
or replace_nestloop_params either, but this one surely has got little
excuse.
2014-11-20 18:36:07 -05:00
Tom Lane bf7ca15875 Ensure that RowExprs and whole-row Vars produce the expected column names.
At one time it wasn't terribly important what column names were associated
with the fields of a composite Datum, but since the introduction of
operations like row_to_json(), it's important that looking up the rowtype
ID embedded in the Datum returns the column names that users would expect.
That did not work terribly well before this patch: you could get the column
names of the underlying table, or column aliases from any level of the
query, depending on minor details of the plan tree.  You could even get
totally empty field names, which is disastrous for cases like row_to_json().

To fix this for whole-row Vars, look to the RTE referenced by the Var, and
make sure its column aliases are applied to the rowtype associated with
the result Datums.  This is a tad scary because we might have to return
a transient RECORD type even though the Var is declared as having some
named rowtype.  In principle it should be all right because the record
type will still be physically compatible with the named rowtype; but
I had to weaken one Assert in ExecEvalConvertRowtype, and there might be
third-party code containing similar assumptions.

Similarly, RowExprs have to be willing to override the column names coming
from a named composite result type and produce a RECORD when the column
aliases visible at the site of the RowExpr differ from the underlying
table's column names.

In passing, revert the decision made in commit 398f70ec07 to add
an alias-list argument to ExecTypeFromExprList: better to provide that
functionality in a separate function.  This also reverts most of the code
changes in d685814835, which we don't need because we're no longer
depending on the tupdesc found in the child plan node's result slot to be
blessed.

Back-patch to 9.4, but not earlier, since this solution changes the results
in some cases that users might not have realized were buggy.  We'll apply a
more restricted form of this patch in older branches.
2014-11-10 15:21:09 -05:00
Robert Haas 0b03e5951b Introduce custom path and scan providers.
This allows extension modules to define their own methods for
scanning a relation, and get the core code to use them.  It's
unclear as yet how much use this capability will find, but we
won't find out if we never commit it.

KaiGai Kohei, reviewed at various times and in various levels
of detail by Shigeru Hanada, Tom Lane, Andres Freund, Álvaro
Herrera, and myself.
2014-11-07 17:34:36 -05:00
Kevin Grittner 30d7ae3c76 Increase number of hash join buckets for underestimate.
If we expect batching at the very beginning, we size nbuckets for
"full work_mem" (see how many tuples we can get into work_mem,
while not breaking NTUP_PER_BUCKET threshold).

If we expect to be fine without batching, we start with the 'right'
nbuckets and track the optimal nbuckets as we go (without actually
resizing the hash table). Once we hit work_mem (considering the
optimal nbuckets value), we keep the value.

At the end of the first batch, we check whether (nbuckets !=
nbuckets_optimal) and resize the hash table if needed. Also, we
keep this value for all batches (it's OK because it assumes full
work_mem, and it makes the batchno evaluation trivial). So the
resize happens only once.

There could be cases where it would improve performance to allow
the NTUP_PER_BUCKET threshold to be exceeded to keep everything in
one batch rather than spilling to a second batch, but attempts to
generate such a case have so far been unsuccessful; that issue may
be addressed with a follow-on patch after further investigation.

Tomas Vondra with minor format and comment cleanup by me
Reviewed by Robert Haas, Heikki Linnakangas, and Kevin Grittner
2014-10-13 10:16:36 -05:00
Alvaro Herrera df630b0dd5 Implement SKIP LOCKED for row-level locks
This clause changes the behavior of SELECT locking clauses in the
presence of locked rows: instead of causing a process to block waiting
for the locks held by other processes (or raise an error, with NOWAIT),
SKIP LOCKED makes the new reader skip over such rows.  While this is not
appropriate behavior for general purposes, there are some cases in which
it is useful, such as queue-like tables.

Catalog version bumped because this patch changes the representation of
stored rules.

Reviewed by Craig Ringer (based on a previous attempt at an
implementation by Simon Riggs, who also provided input on the syntax
used in the current patch), David Rowley, and Álvaro Herrera.

Author: Thomas Munro
2014-10-07 17:23:34 -03:00
Heikki Linnakangas 45f6240a8f Pack tuples in a hash join batch densely, to save memory.
Instead of palloc'ing each HashJoinTuple individually, allocate 32kB chunks
and pack the tuples densely in the chunks. This avoids the AllocChunk
header overhead, and the space wasted by standard allocator's habit of
rounding sizes up to the nearest power of two.

This doesn't contain any planner changes, because the planner's estimate of
memory usage ignores the palloc overhead. Now that the overhead is smaller,
the planner's estimates are in fact more accurate.

Tomas Vondra, reviewed by Robert Haas.
2014-09-10 21:24:52 +03:00
Alvaro Herrera 1c9701cfe5 Fix FOR UPDATE NOWAIT on updated tuple chains
If SELECT FOR UPDATE NOWAIT tries to lock a tuple that is concurrently
being updated, it might fail to honor its NOWAIT specification and block
instead of raising an error.

Fix by adding a no-wait flag to EvalPlanQualFetch which it can pass down
to heap_lock_tuple; also use it in EvalPlanQualFetch itself to avoid
blocking while waiting for a concurrent transaction.

Authors: Craig Ringer and Thomas Munro, tweaked by Álvaro
http://www.postgresql.org/message-id/51FB6703.9090801@2ndquadrant.com

Per Thomas Munro in the course of his SKIP LOCKED feature submission,
who also provided one of the isolation test specs.

Backpatch to 9.4, because that's as far back as it applies without
conflicts (although the bug goes all the way back).  To that branch also
backpatch Thomas Munro's new NOWAIT test cases, committed in master by
Heikki as commit 9ee16b49f0 .
2014-08-27 19:15:18 -04:00
Tom Lane 45b0f35723 Avoid leaking memory while evaluating arguments for a table function.
ExecMakeTableFunctionResult evaluated the arguments for a function-in-FROM
in the query-lifespan memory context.  This is insignificant in simple
cases where the function relation is scanned only once; but if the function
is in a sub-SELECT or is on the inside of a nested loop, any memory
consumed during argument evaluation can add up quickly.  (The potential for
trouble here had been foreseen long ago, per existing comments; but we'd
not previously seen a complaint from the field about it.)  To fix, create
an additional temporary context just for this purpose.

Per an example from MauMau.  Back-patch to all active branches.
2014-06-19 22:14:26 -04:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Tom Lane 3d13623d75 Prevent leakage of SPI tuple tables during subtransaction abort.
plpgsql often just remembers SPI-result tuple tables in local variables,
and has no mechanism for freeing them if an ereport(ERROR) causes an escape
out of the execution function whose local variable it is.  In the original
coding, that wasn't a problem because the tuple table would be cleaned up
when the function's SPI context went away during transaction abort.
However, once plpgsql grew the ability to trap exceptions, repeated
trapping of errors within a function could result in significant
intra-function-call memory leakage, as illustrated in bug #8279 from
Chad Wagner.

We could fix this locally in plpgsql with a bunch of PG_TRY/PG_CATCH
coding, but that would be tedious, probably slow, and prone to bugs of
omission; moreover it would do nothing for similar risks elsewhere.
What seems like a better plan is to make SPI itself responsible for
freeing tuple tables at subtransaction abort.  This patch attacks the
problem that way, keeping a list of live tuple tables within each SPI
function context.  Currently, such freeing is automatic for tuple tables
made within the failed subtransaction.  We might later add a SPI call to
mark a tuple table as not to be freed this way, allowing callers to opt
out; but until someone exhibits a clear use-case for such behavior, it
doesn't seem worth bothering.

A very useful side-effect of this change is that SPI_freetuptable() can
now defend itself against bad calls, such as duplicate free requests;
this should make things more robust in many places.  (In particular,
this reduces the risks involved if a third-party extension contains
now-redundant SPI_freetuptable() calls in error cleanup code.)

Even though the leakage problem is of long standing, it seems imprudent
to back-patch this into stable branches, since it does represent an API
semantics change for SPI users.  We'll patch this in 9.3, but live with
the leakage in older branches.
2013-07-25 16:46:14 -04:00
Stephen Frost 4cbe3ac3e8 WITH CHECK OPTION support for auto-updatable VIEWs
For simple views which are automatically updatable, this patch allows
the user to specify what level of checking should be done on records
being inserted or updated.  For 'LOCAL CHECK', new tuples are validated
against the conditionals of the view they are being inserted into, while
for 'CASCADED CHECK' the new tuples are validated against the
conditionals for all views involved (from the top down).

This option is part of the SQL specification.

Dean Rasheed, reviewed by Pavel Stehule
2013-07-18 17:10:16 -04:00
Tom Lane 5194024d72 Incidental cleanup of matviews code.
Move checking for unscannable matviews into ExecOpenScanRelation, which is
a better place for it first because the open relation is already available
(saving a relcache lookup cycle), and second because this eliminates the
problem of telling the difference between rangetable entries that will or
will not be scanned by the query.  In particular we can get rid of the
not-terribly-well-thought-out-or-implemented isResultRel field that the
initial matviews patch added to RangeTblEntry.

Also get rid of entirely unnecessary scannability check in the rewriter,
and a bogus decision about whether RefreshMatViewStmt requires a parse-time
snapshot.

catversion bump due to removal of a RangeTblEntry field, which changes
stored rules.
2013-04-27 17:48:57 -04:00
Kevin Grittner 3bf3ab8c56 Add a materialized view relations.
A materialized view has a rule just like a view and a heap and
other physical properties like a table.  The rule is only used to
populate the table, references in queries refer to the
materialized data.

This is a minimal implementation, but should still be useful in
many cases.  Currently data is only populated "on demand" by the
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
It is expected that future releases will add incremental updates
with various timings, and that a more refined concept of defining
what is "fresh" data will be developed.  At some point it may even
be possible to have queries use a materialized in place of
references to underlying tables, but that requires the other
above-mentioned features to be working first.

Much of the documentation work by Robert Haas.
Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
Security review by KaiGai Kohei, with a decision on how best to
implement sepgsql still pending.
2013-03-03 18:23:31 -06:00
Tom Lane 0900ac2d0d Fix plpgsql's reporting of plan-time errors in possibly-simple expressions.
exec_simple_check_plan and exec_eval_simple_expr attempted to call
GetCachedPlan directly.  This meant that if an error was thrown during
planning, the resulting context traceback would not include the line
normally contributed by _SPI_error_callback.  This is already inconsistent,
but just to be really odd, a re-execution of the very same expression
*would* show the additional context line, because we'd already have cached
the plan and marked the expression as non-simple.

The problem is easy to demonstrate in 9.2 and HEAD because planning of a
cached plan doesn't occur at all until GetCachedPlan is done.  In earlier
versions, it could only be an issue if initial planning had succeeded, then
a replan was forced (already somewhat improbable for a simple expression),
and the replan attempt failed.  Since the issue is mainly cosmetic in older
branches anyway, it doesn't seem worth the risk of trying to fix it there.
It is worth fixing in 9.2 since the instability of the context printout can
affect the results of GET STACKED DIAGNOSTICS, as per a recent discussion
on pgsql-novice.

To fix, introduce a SPI function that wraps GetCachedPlan while installing
the correct callback function.  Use this instead of calling GetCachedPlan
directly from plpgsql.

Also introduce a wrapper function for extracting a SPI plan's
CachedPlanSource list.  This lets us stop including spi_priv.h in
pl_exec.c, which was never a very good idea from a modularity standpoint.

In passing, fix a similar inconsistency that could occur in SPI_cursor_open,
which was also calling GetCachedPlan without setting up a context callback.
2013-01-30 20:02:23 -05:00
Alvaro Herrera 0ac5ad5134 Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE".  These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE".  UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.

Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.

The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid.  Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates.  This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed.  pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.

Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header.  This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.

Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)

With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.

As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.

Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane.  There's probably room for several more tests.

There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it.  Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.

This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
	AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
	1290721684-sup-3951@alvh.no-ip.org
	1294953201-sup-2099@alvh.no-ip.org
	1320343602-sup-2290@alvh.no-ip.org
	1339690386-sup-8927@alvh.no-ip.org
	4FE5FF020200002500048A3D@gw.wicourts.gov
	4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 12:04:59 -03:00
Tom Lane 94afbd5831 Invent a "one-shot" variant of CachedPlans for better performance.
SPI_execute() and related functions create a CachedPlan, execute it once,
and immediately discard it, so that the functionality offered by
plancache.c is of no value in this code path.  And performance measurements
show that the extra data copying and invalidation checking done by
plancache.c slows down simple queries by 10% or more compared to 9.1.
However, enough of the SPI code is shared with functions that do need plan
caching that it seems impractical to bypass plancache.c altogether.
Instead, let's invent a variant version of cached plans that preserves
99% of the API but doesn't offer any of the actual functionality, nor the
overhead.  This puts SPI_execute() performance back on par, or maybe even
slightly better, than it was before.  This change should resolve recent
complaints of performance degradation from Dong Ye, Pavel Stehule, and
others.

By avoiding data copying, this change also reduces the amount of memory
needed to execute many-statement SPI_execute() strings, as for instance in
a recent complaint from Tomas Vondra.

An additional benefit of this change is that multi-statement SPI_execute()
query strings are now processed fully serially, that is we complete
execution of earlier statements before running parse analysis and planning
on following ones.  This eliminates a long-standing POLA violation, in that
DDL that affects the behavior of a later statement will now behave as
expected.

Back-patch to 9.2, since this was a performance regression compared to 9.1.
(In 9.2, place the added struct fields so as to avoid changing the offsets
of existing fields.)

Heikki Linnakangas and Tom Lane
2013-01-04 17:42:19 -05:00
Bruce Momjian bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Alvaro Herrera c219d9b0a5 Split tuple struct defs from htup.h to htup_details.h
This reduces unnecessary exposure of other headers through htup.h, which
is very widely included by many files.

I have chosen to move the function prototypes to the new file as well,
because that means htup.h no longer needs to include tupdesc.h.  In
itself this doesn't have much effect in indirect inclusion of tupdesc.h
throughout the tree, because it's also required by execnodes.h; but it's
something to explore in the future, and it seemed best to do the htup.h
change now while I'm busy with it.
2012-08-30 16:52:35 -04:00
Bruce Momjian 927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Tom Lane 1dd89eadcd Rename I/O timing statistics columns to blk_read_time and blk_write_time.
This seems more consistent with the pre-existing choices for names of
other statistics columns.  Rename assorted internal identifiers to match.
2012-04-29 18:13:33 -04:00
Robert Haas 40b9b95769 New GUC, track_iotiming, to track I/O timings.
Currently, the only way to see the numbers this gathers is via
EXPLAIN (ANALYZE, BUFFERS), but the plan is to add visibility through
the stats collector and pg_stat_statements in subsequent patches.

Ants Aasma, reviewed by Greg Smith, with some further changes by me.
2012-03-27 14:55:02 -04:00
Tom Lane 9dbf2b7d75 Restructure SELECT INTO's parsetree representation into CreateTableAsStmt.
Making this operation look like a utility statement seems generally a good
idea, and particularly so in light of the desire to provide command
triggers for utility statements.  The original choice of representing it as
SELECT with an IntoClause appendage had metastasized into rather a lot of
places, unfortunately, so that this patch is a great deal more complicated
than one might at first expect.

In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS
subcommands required restructuring some EXPLAIN-related APIs.  Add-on code
that calls ExplainOnePlan or ExplainOneUtility, or uses
ExplainOneQuery_hook, will need adjustment.

Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO,
which formerly were accepted though undocumented, are no longer accepted.
The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE.
The CREATE RULE case doesn't seem to have much real-world use (since the
rule would work only once before failing with "table already exists"),
so we'll not bother with that one.

Both SELECT INTO and CREATE TABLE AS still return a command tag of
"SELECT nnnn".  There was some discussion of returning "CREATE TABLE nnnn",
but for the moment backwards compatibility wins the day.

Andres Freund and Tom Lane
2012-03-19 21:38:12 -04:00
Robert Haas 2254367435 Make EXPLAIN (BUFFERS) track blocks dirtied, as well as those written.
Also expose the new counters through pg_stat_statements.

Patch by me.  Review by Fujii Masao and Greg Smith.
2012-02-22 20:33:05 -05:00
Tom Lane 398f70ec07 Preserve column names in the execution-time tupledesc for a RowExpr.
The hstore and json datatypes both have record-conversion functions that
pay attention to column names in the composite values they're handed.
We used to not worry about inserting correct field names into tuple
descriptors generated at runtime, but given these examples it seems
useful to do so.  Observe the nicer-looking results in the regression
tests whose results changed.

catversion bump because there is a subtle change in requirements for stored
rule parsetrees: RowExprs from ROW() constructs now have to include field
names.

Andrew Dunstan and Tom Lane
2012-02-14 17:34:56 -05:00
Robert Haas af7914c662 Add TIMING option to EXPLAIN, to allow eliminating of timing overhead.
Sometimes it may be useful to get actual row counts out of EXPLAIN
(ANALYZE) without paying the cost of timing every node entry/exit.
With this patch, you can say EXPLAIN (ANALYZE, TIMING OFF) to get that.

Tomas Vondra, reviewed by Eric Theise, with minor doc changes by me.
2012-02-07 11:23:04 -05:00
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Tom Lane a0185461dd Rearrange the implementation of index-only scans.
This commit changes index-only scans so that data is read directly from the
index tuple without first generating a faux heap tuple.  The only immediate
benefit is that indexes on system columns (such as OID) can be used in
index-only scans, but this is necessary infrastructure if we are ever to
support index-only scans on expression indexes.  The executor is now ready
for that, though the planner still needs substantial work to recognize
the possibility.

To do this, Vars in index-only plan nodes have to refer to index columns
not heap columns.  I introduced a new special varno, INDEX_VAR, to mark
such Vars to avoid confusion.  (In passing, this commit renames the two
existing special varnos to OUTER_VAR and INNER_VAR.)  This allows
ruleutils.c to handle them with logic similar to what we use for subplan
reference Vars.

Since index-only scans are now fundamentally different from regular
indexscans so far as their expression subtrees are concerned, I also chose
to change them to have their own plan node type (and hence, their own
executor source file).
2011-10-11 14:21:30 -04:00
Tom Lane f197272365 Make EXPLAIN ANALYZE report the numbers of rows rejected by filter steps.
This provides information about the numbers of tuples that were visited
but not returned by table scans, as well as the numbers of join tuples
that were considered and discarded within a join plan node.

There is still some discussion going on about the best way to report counts
for outer-join situations, but I think most of what's in the patch would
not change if we revise that, so I'm going to go ahead and commit it as-is.

Documentation changes to follow (they weren't in the submitted patch
either).

Marko Tiikkaja, reviewed by Marc Cousin, somewhat revised by Tom
2011-09-22 11:30:11 -04:00
Tom Lane e6faf910d7 Redesign the plancache mechanism for more flexibility and efficiency.
Rewrite plancache.c so that a "cached plan" (which is rather a misnomer
at this point) can support generation of custom, parameter-value-dependent
plans, and can make an intelligent choice between using custom plans and
the traditional generic-plan approach.  The specific choice algorithm
implemented here can probably be improved in future, but this commit is
all about getting the mechanism in place, not the policy.

In addition, restructure the API to greatly reduce the amount of extraneous
data copying needed.  The main compromise needed to make that possible was
to split the initial creation of a CachedPlanSource into two steps.  It's
worth noting in particular that SPI_saveplan is now deprecated in favor of
SPI_keepplan, which accomplishes the same end result with zero data
copying, and no need to then spend even more cycles throwing away the
original SPIPlan.  The risk of long-term memory leaks while manipulating
SPIPlans has also been greatly reduced.  Most of this improvement is based
on use of the recently-added MemoryContextSetParent primitive.
2011-09-16 00:43:52 -04:00
Tom Lane 1609797c25 Clean up the #include mess a little.
walsender.h should depend on xlog.h, not vice versa.  (Actually, the
inclusion was circular until a couple hours ago, which was even sillier;
but Bruce broke it in the expedient rather than logically correct
direction.)  Because of that poor decision, plus blind application of
pgrminclude, we had a situation where half the system was depending on
xlog.h to include such unrelated stuff as array.h and guc.h.  Clean up
the header inclusion, and manually revert a lot of what pgrminclude had
done so things build again.

This episode reinforces my feeling that pgrminclude should not be run
without adult supervision.  Inclusion changes in header files in particular
need to be reviewed with great care.  More generally, it'd be good if we
had a clearer notion of module layering to dictate which headers can sanely
include which others ... but that's a big task for another day.
2011-09-04 01:13:16 -04:00
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Tom Lane 299d171652 Install defenses against overflow in BuildTupleHashTable().
The planner can sometimes compute very large values for numGroups, and in
cases where we have no alternative to building a hashtable, such a value
will get fed directly to BuildTupleHashTable as its nbuckets parameter.
There were two ways in which that could go bad.  First, BuildTupleHashTable
declared the parameter as "int" but most callers were passing "long"s,
so on 64-bit machines undetected overflow could occur leading to a bogus
negative value.  The obvious fix for that is to change the parameter to
"long", which is what I've done in HEAD.  In the back branches that seems a
bit risky, though, since third-party code might be calling this function.
So for them, just put in a kluge to treat negative inputs as INT_MAX.
Second, hash_create can go nuts with extremely large requested table sizes
(notably, my_log2 becomes an infinite loop for inputs larger than
LONG_MAX/2).  What seems most appropriate to avoid that is to bound the
initial table size request to work_mem.

This fixes bug #6035 reported by Daniel Schreiber.  Although the reported
case only occurs back to 8.4 since it involves WITH RECURSIVE, I think
it's a good idea to install the defenses in all supported branches.
2011-05-23 12:52:46 -04:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane 27dc7e240b Fix handling of collation in SQL-language functions.
Ensure that parameter symbols receive collation from the function's
resolved input collation, and fix inlining to behave properly.

BTW, this commit lays about 90% of the infrastructure needed to support
use of argument names in SQL functions.  Parsing of parameters is now
done via the parser-hook infrastructure ... we'd just need to supply
a column-ref hook ...
2011-03-24 20:30:23 -04:00
Tom Lane a874fe7b4c Refactor the executor's API to support data-modifying CTEs better.
The originally committed patch for modifying CTEs didn't interact well
with EXPLAIN, as noted by myself, and also had corner-case problems with
triggers, as noted by Dean Rasheed.  Those problems show it is really not
practical for ExecutorEnd to call any user-defined code; so split the
cleanup duties out into a new function ExecutorFinish, which must be called
between the last ExecutorRun call and ExecutorEnd.  Some Asserts have been
added to these functions to help verify correct usage.

It is no longer necessary for callers of the executor to call
AfterTriggerBeginQuery/AfterTriggerEndQuery for themselves, as this is now
done by ExecutorStart/ExecutorFinish respectively.  If you really need to
suppress that and do it for yourself, pass EXEC_FLAG_SKIP_TRIGGERS to
ExecutorStart.

Also, refactor portal commit processing to allow for the possibility that
PortalDrop will invoke user-defined code.  I think this is not actually
necessary just yet, since the portal-execution-strategy logic forces any
non-pure-SELECT query to be run to completion before we will consider
committing.  But it seems like good future-proofing.
2011-02-27 13:44:12 -05:00
Tom Lane 389af95155 Support data-modifying commands (INSERT/UPDATE/DELETE) in WITH.
This patch implements data-modifying WITH queries according to the
semantics that the updates all happen with the same command counter value,
and in an unspecified order.  Therefore one WITH clause can't see the
effects of another, nor can the outer query see the effects other than
through the RETURNING values.  And attempts to do conflicting updates will
have unpredictable results.  We'll need to document all that.

This commit just fixes the code; documentation updates are waiting on
author.

Marko Tiikkaja and Hitoshi Harada
2011-02-25 18:58:02 -05:00
Tom Lane 2e852e541c Remove ExecRemoveJunk(), which is no longer used anywhere.
This was a leftover from the pre-8.1 design of junkfilters.  It doesn't
seem to have any reason to live, since it's merely a combination of two
easy function calls, and not a well-designed combination at that (it
encourages callers to leak the result tuple).
2011-02-21 21:41:08 -05:00
Tom Lane bb74240794 Implement an API to let foreign-data wrappers actually be functional.
This commit provides the core code and documentation needed.  A contrib
module test case will follow shortly.

Shigeru Hanada, Jan Urbanski, Heikki Linnakangas
2011-02-20 00:18:14 -05:00
Tom Lane d487afbb81 Fix PlanRowMark/ExecRowMark structures to handle inheritance correctly.
In an inherited UPDATE/DELETE, each target table has its own subplan,
because it might have a column set different from other targets.  This
means that the resjunk columns we add to support EvalPlanQual might be
at different physical column numbers in each subplan.  The EvalPlanQual
rewrite I did for 9.0 failed to account for this, resulting in possible
misbehavior or even crashes during concurrent updates to the same row,
as seen in a recent report from Gordon Shannon.  Revise the data structure
so that we track resjunk column numbers separately for each subplan.

I also chose to move responsibility for identifying the physical column
numbers back to executor startup, instead of assuming that numbers derived
during preprocess_targetlist would stay valid throughout subsequent
massaging of the plan.  That's a bit slower, so we might want to consider
undoing it someday; but it would complicate the patch considerably and
didn't seem justifiable in a bug fix that has to be back-patched to 9.0.
2011-01-12 20:47:02 -05:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Tom Lane 7b46401557 Move symbols for ExecMergeJoin's state machine into nodeMergejoin.c.
There's no reason for these values to be known anywhere else.  After
doing this, executor/execdefs.h is vestigial and can be removed.
2010-12-30 22:12:40 -05:00
Tom Lane f4e4b32743 Support RIGHT and FULL OUTER JOIN in hash joins.
This is advantageous first because it allows us to hash the smaller table
regardless of the outer-join type, and second because hash join can be more
flexible than merge join in dealing with arbitrary join quals in a FULL
join.  For merge join all the join quals have to be mergejoinable, but hash
join will work so long as there's at least one hashjoinable qual --- the
others can be any condition.  (This is true essentially because we don't
keep per-inner-tuple match flags in merge join, while hash join can do so.)

To do this, we need a has-it-been-matched flag for each tuple in the
hashtable, not just one for the current outer tuple.  The key idea that
makes this practical is that we can store the match flag in the tuple's
infomask, since there are lots of bits there that are of no interest for a
MinimalTuple.  So we aren't increasing the size of the hashtable at all for
the feature.

To write this without turning the hash code into even more of a pile of
spaghetti than it already was, I rewrote ExecHashJoin in a state-machine
style, similar to ExecMergeJoin.  Other than that decision, it was pretty
straightforward.
2010-12-30 20:26:08 -05:00
Tom Lane 37b61a69f3 Fix failure of executor/hashjoin.h to compile standalone.
Noted while experimenting with cpluspluscheck.
2010-12-27 12:20:09 -05:00
Tom Lane d583f10b7e Create core infrastructure for KNNGIST.
This is a heavily revised version of builtin_knngist_core-0.9.  The
ordering operators are no longer mixed in with actual quals, which would
have confused not only humans but significant parts of the planner.
Instead, ordering operators are carried separately throughout planning and
execution.

Since the API for ambeginscan and amrescan functions had to be changed
anyway, this commit takes the opportunity to rationalize that a bit.
RelationGetIndexScan no longer forces a premature index_rescan call;
instead, callers of index_beginscan must call index_rescan too.  Aside from
making the AM-side initialization logic a bit less peculiar, this has the
advantage that we do not make a useless extra am_rescan call when there are
runtime key values.  AMs formerly could not assume that the key values
passed to amrescan were actually valid; now they can.

Teodor Sigaev and Tom Lane
2010-12-02 20:51:37 -05:00
Tom Lane 11cad29c91 Support MergeAppend plans, to allow sorted output from append relations.
This patch eliminates the former need to sort the output of an Append scan
when an ordered scan of an inheritance tree is wanted.  This should be
particularly useful for fast-start cases such as queries with LIMIT.

Original patch by Greg Stark, with further hacking by Hans-Jurgen Schonig,
Robert Haas, and Tom Lane.
2010-10-14 16:57:57 -04:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Robert Haas b8c6c71d1c Centralize DML permissions-checking logic.
Remove bespoke code in DoCopy and RI_Initial_Check, which now instead
fabricate call ExecCheckRTPerms with a manufactured RangeTblEntry.
This is intended to make it feasible for an enhanced security provider
to actually make use of ExecutorCheckPerms_hook, but also has the
advantage that RI_Initial_Check can allow use of the fast-path when
column-level but not table-level permissions are present.

KaiGai Kohei.  Reviewed (in an earlier version) by Stephen Frost, and by me.
Some further changes to the comments by me.
2010-07-22 00:47:59 +00:00
Tom Lane 53e757689c Make NestLoop plan nodes pass outer-relation variables into their inner
relation using the general PARAM_EXEC executor parameter mechanism, rather
than the ad-hoc kluge of passing the outer tuple down through ExecReScan.
The previous method was hard to understand and could never be extended to
handle parameters coming from multiple join levels.  This patch doesn't
change the set of possible plans nor have any significant performance effect,
but it's necessary infrastructure for future generalization of the concept
of an inner indexscan plan.

ExecReScan's second parameter is now unused, so it's removed.
2010-07-12 17:01:06 +00:00
Robert Haas f4122a8d50 Add a hook in ExecCheckRTPerms().
This hook allows a loadable module to gain control when table permissions
are checked.  It is expected to be used by an eventual SE-PostgreSQL
implementation, but there are other possible applications as well.  A
sample contrib module can be found in the archives at:

http://archives.postgresql.org/pgsql-hackers/2010-05/msg01095.php

Robert Haas and Stephen Frost
2010-07-09 14:06:01 +00:00
Bruce Momjian 65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Tom Lane 0a469c8769 Remove old-style VACUUM FULL (which was known for a little while as
VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
Per discussion, the use case for this method of vacuuming is no longer large
enough to justify maintaining it; not to mention that we don't wish to invest
the work that would be needed to make it play nicely with Hot Standby.

Aside from the code directly related to old-style VACUUM FULL, this commit
removes support for certain WAL record types that could only be generated
within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
nontransactional generation of cache invalidation sinval messages (the last
being the sticking point for Hot Standby).

We still have to retain all code that copes with finding HEAP_MOVED_OFF and
HEAP_MOVED_IN flag bits on existing tuples.  This can't be removed as long
as we want to support in-place update from pre-9.0 databases.
2010-02-08 04:33:55 +00:00
Robert Haas 42a8ab0a14 Augment EXPLAIN output with more details on Hash nodes.
We show the number of buckets, the number of batches (and also the original
number if it has changed), and the peak space used by the hash table.  Minor
executor changes to track peak space used.
2010-02-01 15:43:36 +00:00
Itagaki Takahiro 83a5a338bf pgBufferUsage needs PGDLLIMPORT for pg_stat_statements on Windows. 2010-01-08 00:48:56 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Robert Haas cddca5ec13 Add an EXPLAIN (BUFFERS) option to show buffer-usage statistics.
This patch also removes buffer-usage statistics from the track_counts
output, since this (or the global server statistics) is deemed to be a better
interface to this information.

Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
2009-12-15 04:57:48 +00:00
Tom Lane a620d5005d Fix a bug introduced when set-returning SQL functions were made inline-able:
we have to cope with the possibility that the declared result rowtype contains
dropped columns.  This fails in 8.4, as per bug #5240.

While at it, be more paranoid about inserting binary coercions when inlining.
The pre-8.4 code did not really need to worry about that because it could not
inline at all in any case where an added coercion could change the behavior
of the function's statement.  However, when inlining a SRF we allow sorting,
grouping, and set-ops such as UNION.  In these cases, modifying one of the
targetlist entries that the sort/group/setop depends on could conceivably
change the behavior of the function's statement --- so don't inline when
such a case applies.
2009-12-14 02:15:54 +00:00
Tom Lane 0cb65564e5 Add exclusion constraints, which generalize the concept of uniqueness to
support any indexable commutative operator, not just equality.  Two rows
violate the exclusion constraint if "row1.col OP row2.col" is TRUE for
each of the columns in the constraint.

Jeff Davis, reviewed by Robert Haas
2009-12-07 05:22:23 +00:00
Tom Lane 9bedd128d6 Add support for invoking parser callback hooks via SPI and in cached plans.
As proof of concept, modify plpgsql to use the hooks.  plpgsql is still
inserting $n symbols textually, but the "back end" of the parsing process now
goes through the ParamRef hook instead of using a fixed parameter-type array,
and then execution only fetches actually-referenced parameters, using a hook
added to ParamListInfo.

Although there's a lot left to be done in plpgsql, this already cures the
"if (TG_OP = 'INSERT' and NEW.foo ...)"  problem, as illustrated by the
changed regression test.
2009-11-04 22:26:08 +00:00
Tom Lane 9f2ee8f287 Re-implement EvalPlanQual processing to improve its performance and eliminate
a lot of strange behaviors that occurred in join cases.  We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries.  If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row.  The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.

Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested.  To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param.  Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.

This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE.  This is needed to avoid the
duplicate-output-tuple problem.  It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
2009-10-26 02:26:45 +00:00
Tom Lane 0adaf4cb31 Move the handling of SELECT FOR UPDATE locking and rechecking out of
execMain.c and into a new plan node type LockRows.  Like the recent change
to put table updating into a ModifyTable plan node, this increases planning
flexibility by allowing the operations to occur below the top level of the
plan tree.  It's necessary in any case to restore the previous behavior of
having FOR UPDATE locking occur before ModifyTable does.

This partially refactors EvalPlanQual to allow multiple rows-under-test
to be inserted into the EPQ machinery before starting an EPQ test query.
That isn't sufficient to fix EPQ's general bogosity in the face of plans
that return multiple rows per test row, though.  Since this patch is
mostly about getting some plan node infrastructure in place and not about
fixing ten-year-old bugs, I will leave EPQ improvements for another day.

Another behavioral change that we could now think about is doing FOR UPDATE
before LIMIT, but that too seems like it should be treated as a followon
patch.
2009-10-12 18:10:51 +00:00
Tom Lane 8a5849b7ff Split the processing of INSERT/UPDATE/DELETE operations out of execMain.c.
They are now handled by a new plan node type called ModifyTable, which is
placed at the top of the plan tree.  In itself this change doesn't do much,
except perhaps make the handling of RETURNING lists and inherited UPDATEs a
tad less klugy.  But it is necessary preparation for the intended extension of
allowing RETURNING queries inside WITH.

Marko Tiikkaja
2009-10-10 01:43:50 +00:00
Tom Lane c970292a94 Remove very ancient tuple-counting infrastructure (IncrRetrieved() and
friends).  This code has all been ifdef'd out for many years, and doesn't
seem to have any prospect of becoming any more useful in the future.
EXPLAIN ANALYZE is what people use in practice, and I think if we did want
process-wide counters we'd be more likely to put in dtrace events for that
than try to resurrect this code.  Get rid of it so as to have one less detail
to worry about while refactoring execMain.c.
2009-10-08 22:34:57 +00:00
Tom Lane 421d7d8edb Remove no-longer-needed ExecCountSlots infrastructure. 2009-09-27 21:10:53 +00:00
Tom Lane f92e8a4b5e Replace the array-style TupleTable data structure with a simple List of
TupleTableSlot nodes.  This eliminates the need to count in advance
how many Slots will be needed, which seems more than worth the small
increase in the amount of palloc traffic during executor startup.

The ExecCountSlots infrastructure is now all dead code, but I'll remove it
in a separate commit for clarity.

Per a comment from Robert Haas.
2009-09-27 20:09:58 +00:00
Tom Lane 9bb342811b Rewrite the planner's handling of materialized plan types so that there is
an explicit model of rescan costs being different from first-time costs.
The costing of Material nodes in particular now has some visible relationship
to the actual runtime behavior, where before it was essentially fantasy.
This also fixes up a couple of places where different materialized plan types
were treated differently for no very good reason (probably just oversights).

A couple of the regression tests are affected, because the planner now chooses
to put the other relation on the inside of a nestloop-with-materialize.
So far as I can see both changes are sane, and the planner is now more
consistently following the expectation that it should prefer to materialize
the smaller of two relations.

Per a recent discussion with Robert Haas.
2009-09-12 22:12:09 +00:00
Tom Lane 25d9bf2e3e Support deferrable uniqueness constraints.
The current implementation fires an AFTER ROW trigger for each tuple that
looks like it might be non-unique according to the index contents at the
time of insertion.  This works well as long as there aren't many conflicts,
but won't scale to massive unique-key reassignments.  Improving that case
is a TODO item.

Dean Rasheed
2009-07-29 20:56:21 +00:00
Tom Lane 846c364dd4 Change do_tup_output() to take Datum/isnull arrays instead of a char * array,
so it doesn't go through BuildTupleFromCStrings.  This is more or less a
wash for current uses, but will avoid inefficiency for planned changes to
EXPLAIN.

Robert Haas
2009-07-22 17:00:23 +00:00
Tom Lane 011eae60ef Fix error cleanup failure caused by 8.4 changes in plpgsql to try to avoid
memory leakage in error recovery.  We were calling FreeExprContext, and
therefore invoking ExprContextCallback callbacks, in both normal and error
exits from subtransactions.  However this isn't very safe, as shown in
recent trouble report from Frank van Vugt, in which releasing a tupledesc
refcount failed.  It's also unnecessary, since the resources that callbacks
might wish to release should be cleaned up by other error recovery mechanisms
(ie the resource owners).  We only really want FreeExprContext to release
memory attached to the exprcontext in the error-exit case.  So, add a bool
parameter to FreeExprContext to tell it not to call the callbacks.

A more general solution would be to pass the isCommit bool parameter on to
the callbacks, so they could do only safe things during error exit.  But
that would make the patch significantly more invasive and possibly break
third-party code that registers ExprContextCallback callbacks.  We might want
to do that later in HEAD, but for now I'll just do what seems reasonable to
back-patch.
2009-07-18 19:15:42 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Tom Lane 793d5662e8 Fix an oversight in the support for storing/retrieving "minimal tuples" in
TupleTableSlots.  We have functions for retrieving a minimal tuple from a slot
after storing a regular tuple in it, or vice versa; but these were implemented
by converting the internal storage from one format to the other.  The problem
with that is it invalidates any pass-by-reference Datums that were already
fetched from the slot, since they'll be pointing into the just-freed version
of the tuple.  The known problem cases involve fetching both a whole-row
variable and a pass-by-reference value from a slot that is fed from a
tuplestore or tuplesort object.  The added regression tests illustrate some
simple cases, but there may be other failure scenarios traceable to the same
bug.  Note that the added tests probably only fail on unpatched code if it's
built with --enable-cassert; otherwise the bug leads to fetching from freed
memory, which will not have been overwritten without additional conditions.

Fix by allowing a slot to contain both formats simultaneously; which turns out
not to complicate the logic much at all, if anything it seems less contorted
than before.

Back-patch to 8.2, where minimal tuples were introduced.
2009-03-30 04:08:43 +00:00
Tom Lane 596efd27ed Optimize multi-batch hash joins when the outer relation has a nonuniform
distribution, by creating a special fast path for the (first few) most common
values of the outer relation.  Tuples having hashvalues matching the MCVs
are effectively forced to be in the first batch, so that we never write
them out to the batch temp files.

Bryce Cutt and Ramon Lawrence, with some editorialization by me.
2009-03-21 00:04:40 +00:00
Heikki Linnakangas 94136d5a18 Add new SPI_OK_REWRITTEN return code to SPI_execute and friends, for the
case that the command is rewritten into another type of command. The old
behavior to return the command tag of the last executed command was
pretty surprising. In PL/pgSQL, for example, it meant that if a command
was rewritten to a utility statement, FOUND wasn't set at all.
2009-01-21 11:02:40 +00:00
Tom Lane deac9488d3 Insert conditional SPI_push/SPI_pop calls into InputFunctionCall,
OutputFunctionCall, and friends.  This allows SPI-using functions to invoke
datatype I/O without concern for the possibility that a SPI-using function
will be called (which could be either the I/O function itself, or a function
used in a domain check constraint).  It's a tad ugly, but not nearly as ugly
as what'd be needed to make this work via retail insertion of push/pop
operations in all the PLs.

This reverts my patch of 2007-01-30 that inserted some retail SPI_push/pop
calls into plpgsql; that approach only fixed plpgsql, and not any other PLs.
But the other PLs have the issue too, as illustrated by a recent gripe from
Christian Schröder.

Back-patch to 8.2, which is as far back as this solution will work.  It's
also as far back as we need to worry about the domain-constraint case, since
earlier versions did not attempt to check domain constraints within datatype
input.  I'm not aware of any old I/O functions that use SPI themselves, so
this should be sufficient for a back-patch.
2009-01-07 20:38:56 +00:00
Tom Lane 1cfd9e8834 Fix executor/spi.h to follow our usual conventions for include files, ie,
not include postgres.h nor anything else it doesn't directly need.  Add
#includes to calling files as needed to compensate.  Per my proposal of
yesterday.

This should be noted as a source code change in the 8.4 release notes,
since it's likely to require changes in add-on modules.
2009-01-07 13:44:37 +00:00
Tom Lane bbeb0bbf6b Include a pointer to the query's source text in QueryDesc structs. This is
practically free given prior 8.4 changes in plancache and portal management,
and it makes it a lot easier for ExecutorStart/Run/End hooks to get at the
query text.  Extracted from Itagaki Takahiro's pg_stat_statements patch,
with minor editorialization.
2009-01-02 20:42:00 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Tom Lane 95b07bc7f5 Support window functions a la SQL:2008.
Hitoshi Harada, with some kibitzing from Heikki and Tom.
2008-12-28 18:54:01 +00:00
Tom Lane ec543db77b Ensure that the contents of a holdable cursor don't depend on out-of-line
toasted values, since those could get dropped once the cursor's transaction
is over.  Per bug #4553 from Andrew Gierth.

Back-patch as far as 8.1.  The bug actually exists back to 7.4 when holdable
cursors were introduced, but this patch won't work before 8.1 without
significant adjustments.  Given the lack of field complaints, it doesn't seem
worth the work (and risk of introducing new bugs) to try to make a patch for
the older branches.
2008-12-01 17:06:21 +00:00
Tom Lane c1f3073333 Clean up the API for DestReceiver objects by eliminating the assumption
that a Portal is a useful and sufficient additional argument for
CreateDestReceiver --- it just isn't, in most cases.  Instead formalize
the approach of passing any needed parameters to the receiver separately.

One unexpected benefit of this change is that we can declare typedef Portal
in a less surprising location.

This patch is just code rearrangement and doesn't change any functionality.
I'll tackle the HOLD-cursor-vs-toast problem in a follow-on patch.
2008-11-30 20:51:25 +00:00
Tom Lane cd35e9d746 Some infrastructure changes for the upcoming auto-explain contrib module:
* Refactor explain.c slightly to export a convenient-to-use subroutine
for printing EXPLAIN results.

* Provide hooks for plugins to get control at ExecutorStart and ExecutorEnd
as well as ExecutorRun.

* Add some minimal support for tracking the total runtime of ExecutorRun.
This code won't actually do anything unless a plugin prods it to.

* Change the API of the DefineCustomXXXVariable functions to allow nonzero
"flags" to be specified for a custom GUC variable.  While at it, also make
the "bootstrap" default value for custom GUCs be explicitly specified as a
parameter to these functions.  This is to eliminate confusion over where the
default comes from, as has been expressed in the past by some users of the
custom-variable facility.

* Refactor GUC code a bit to ensure that a custom variable gets initialized to
something valid (like its default value) even if the placeholder value was
invalid.
2008-11-19 01:10:24 +00:00
Tom Lane df5a99612d Simplify ExecutorRun's API and save some trivial number of cycles by having
it just return void instead of sometimes returning a TupleTableSlot.  SQL
functions don't need that anymore, and noplace else does either.  Eliminating
the return value also means one less hassle for the ExecutorRun hook functions
that will be supported beginning in 8.4.
2008-10-31 21:07:55 +00:00
Tom Lane 9b46abb7c4 Allow SQL-language functions to return the output of an INSERT/UPDATE/DELETE
RETURNING clause, not just a SELECT as formerly.

A side effect of this patch is that when a set-returning SQL function is used
in a FROM clause, performance is improved because the output is collected into
a tuplestore within the function, rather than using the less efficient
value-per-call mechanism.
2008-10-31 19:37:56 +00:00
Tom Lane 05bba3d176 Be more tense about not creating tuplestores with randomAccess = true unless
backwards scan could actually happen.  In particular, pass a flag to
materialize-mode SRFs that tells them whether they need to require random
access.  In passing, also suppress unneeded backward-scan overhead for a
Portal's holdStore tuplestore.  Per my proposal about reducing I/O costs for
tuplestores.
2008-10-29 00:00:39 +00:00
Tom Lane e3e3d2a789 Extend ExecMakeFunctionResult() to support set-returning functions that return
via a tuplestore instead of value-per-call.  Refactor a few things to reduce
ensuing code duplication with nodeFunctionscan.c.  This represents the
reasonably noncontroversial part of my proposed patch to switch SQL functions
over to returning tuplestores.  For the moment, SQL functions still do things
the old way.  However, this change enables PL SRFs to be called in targetlists
(observe changes in plperl regression results).
2008-10-28 22:02:06 +00:00
Tom Lane 44d5be0e53 Implement SQL-standard WITH clauses, including WITH RECURSIVE.
There are some unimplemented aspects: recursive queries must use UNION ALL
(should allow UNION too), and we don't have SEARCH or CYCLE clauses.
These might or might not get done for 8.4, but even without them it's a
pretty useful feature.

There are also a couple of small loose ends and definitional quibbles,
which I'll send a memo about to pgsql-hackers shortly.  But let's land
the patch now so we can get on with other development.

Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
2008-10-04 21:56:55 +00:00
Tom Lane dad4cb6258 Improve tuplestore.c to support multiple concurrent read positions.
This facility replaces the former mark/restore support but is otherwise
upward-compatible with previous uses.  It's expected to be needed for
single evaluation of CTEs and also for window functions, so I'm committing
it separately instead of waiting for either one of those patches to be
finished.  Per discussion with Greg Stark and Hitoshi Harada.

Note: I removed nodeFunctionscan's mark/restore support, instead of bothering
to update it for this change, because it was dead code anyway.
2008-10-01 19:51:50 +00:00
Tom Lane 35c2a3c3cf Allow ShowBufferUsage() to report the number of reads/writes that have
occurred to temporary files.  This replaces the unused
NDirectFileRead/NDirectFileWrite counters.

Itagaki Takahiro
2008-09-17 13:15:55 +00:00
Tom Lane 1cd935609f Fix caching of foreign-key-checking queries so that when a replan is needed,
we regenerate the SQL query text not merely the plan derived from it.  This
is needed to handle contingencies such as renaming of a table or column
used in an FK.  Pre-8.3, such cases worked despite the lack of replanning
(because the cached plan needn't actually change), so this is a regression.
Per bug #4417 from Benjamin Bihler.
2008-09-15 23:37:40 +00:00
Tom Lane d320101b5b Get rid of the last remaining uses of var_is_rel(), to wit some debugging
checks in ExecIndexBuildScanKeys() that were inadequate anyway: it's better
to verify the correct varno on an expected index key, not just reject OUTER
and INNER.

This makes the entire current contents of nodeFuncs.c dead code.  I'll be
replacing it with some other stuff later, as per recent proposal.
2008-08-25 20:20:30 +00:00
Tom Lane bd3daddaf2 Arrange to convert EXISTS subqueries that are equivalent to hashable IN
subqueries into the same thing you'd have gotten from IN (except always with
unknownEqFalse = true, so as to get the proper semantics for an EXISTS).
I believe this fixes the last case within CVS HEAD in which an EXISTS could
give worse performance than an equivalent IN subquery.

The tricky part of this is that if the upper query probes the EXISTS for only
a few rows, the hashing implementation can actually be worse than the default,
and therefore we need to make a cost-based decision about which way to use.
But at the time when the planner generates plans for subqueries, it doesn't
really know how many times the subquery will be executed.  The least invasive
solution seems to be to generate both plans and postpone the choice until
execution.  Therefore, in a query that has been optimized this way, EXPLAIN
will show two subplans for the EXISTS, of which only one will actually get
executed.

There is a lot more that could be done based on this infrastructure: in
particular it's interesting to consider switching to the hash plan if we start
out using the non-hashed plan but find a lot more upper rows going by than we
expected.  I have therefore left some minor inefficiencies in place, such as
initializing both subplans even though we will currently only use one.
2008-08-22 00:16:04 +00:00
Tom Lane a77eaa6a95 As noted by Andrew Gierth, there's really no need any more to force a junk
filter to be used when INSERT or SELECT INTO has a plan that returns raw
disk tuples.  The virtual-tuple-slot optimizations that were put in place
awhile ago mean that ExecInsert has to do ExecMaterializeSlot, and that
already copies the tuple if it's raw (and does so more efficiently than
a junk filter, too).  So get rid of that logic.  This in turn means that
we can throw away ExecMayReturnRawTuples, which wasn't used for any other
purpose, and was always a kluge anyway.

In passing, move a couple of SELECT-INTO-specific fields out of EState
and into the private state of the SELECT INTO DestReceiver, as was foreseen
in an old comment there.  Also make intorel_receive use ExecMaterializeSlot
not ExecCopySlotTuple, for consistency with ExecInsert and to possibly save
a tuple copy step in some cases.
2008-07-26 19:15:35 +00:00
Tom Lane 6cc88f0af5 Provide a function hook to let plug-ins get control around ExecutorRun.
ITAGAKI Takahiro
2008-07-18 18:23:47 +00:00
Tom Lane 3bc25384d7 Move the "instr_time" typedef and associated macros into a new header
file portability/instr_time.h, and add a couple more macros to eliminate
some abstraction leakage we formerly had.  Also update psql to use this
header instead of its own copy of nearly the same code.

This commit in itself is just code cleanup and shouldn't change anything.
It lays some groundwork for the upcoming function-stats patch, though.
2008-05-14 19:10:29 +00:00
Tom Lane 226837e57e Since createplan.c no longer cares whether index operators are lossy, it has
no particular need to do get_op_opfamily_properties() while building an
indexscan plan.  Postpone that lookup until executor start.  This simplifies
createplan.c a lot more than it complicates nodeIndexscan.c, and makes things
more uniform since we already had to do it that way for RowCompare
expressions.  Should be a bit faster too, at least for plans that aren't
re-used many times, since we avoid palloc'ing and perhaps copying the
intermediate list data structure.
2008-04-13 20:51:21 +00:00
Tom Lane d5466e38f0 Add SPI-level support for executing SQL commands with one-time-use plans,
that is commands that have out-of-line parameters but the plan is prepared
assuming that the parameter values are constants.  This is needed for the
plpgsql EXECUTE USING patch, but will probably have use elsewhere.

This commit includes the SPI functions and documentation, but no callers
nor regression tests.  The upcoming EXECUTE USING patch will provide
regression-test coverage.  I thought committing this separately made
sense since it's logically a distinct feature.
2008-04-01 03:09:30 +00:00
Tom Lane 7692d8d5b7 Support statement-level ON TRUNCATE triggers. Simon Riggs 2008-03-28 00:21:56 +00:00
Tom Lane 0d49838df6 Arrange to "inline" SQL functions that appear in a query's FROM clause,
are declared to return set, and consist of just a single SELECT.  We
can replace the FROM-item with a sub-SELECT and then optimize much as
if we were dealing with a view.  Patch from Richard Rowell, cleaned up
by me.
2008-03-18 22:04:14 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Tom Lane 895a94de6d Avoid incrementing the CommandCounter when CommandCounterIncrement is called
but no database changes have been made since the last CommandCounterIncrement.
This should result in a significant improvement in the number of "commands"
that can typically be performed within a transaction before hitting the 2^32
CommandId size limit.  In particular this buys back (and more) the possible
adverse consequences of my previous patch to fix plan caching behavior.

The implementation requires tracking whether the current CommandCounter
value has been "used" to mark any tuples.  CommandCounter values stored into
snapshots are presumed not to be used for this purpose.  This requires some
small executor changes, since the executor used to conflate the curcid of
the snapshot it was using with the command ID to mark output tuples with.
Separating these concepts allows some small simplifications in executor APIs.

Something for the TODO list: look into having CommandCounterIncrement not do
AcceptInvalidationMessages.  It seems fairly bogus to be doing it there,
but exactly where to do it instead isn't clear, and I'm disinclined to mess
with asynchronous behavior during late beta.
2007-11-30 21:22:54 +00:00
Bruce Momjian f6e8730d11 Re-run pgindent with updated list of typedefs. (Updated README should
avoid this problem in the future.)
2007-11-15 22:25:18 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane 817946bb04 Arrange to cache a ResultRelInfo in the executor's EState for relations that
are not one of the query's defined result relations, but nonetheless have
triggers fired against them while the query is active.  This was formerly
impossible but can now occur because of my recent patch to fix the firing
order for RI triggers.  Caching a ResultRelInfo avoids duplicating work by
repeatedly opening and closing the same relation, and also allows EXPLAIN
ANALYZE to "see" and report on these extra triggers.  Use the same mechanism
to cache open relations when firing deferred triggers at transaction shutdown;
this replaces the former one-element-cache strategy used in that case, and
should improve performance a bit when there are deferred triggers on a number
of relations.
2007-08-15 21:39:50 +00:00
Tom Lane 9cb8409762 Repair problems occurring when multiple RI updates have to be done to the same
row within one query: we were firing check triggers before all the updates
were done, leading to bogus failures.  Fix by making the triggers queued by
an RI update go at the end of the outer query's trigger event list, thereby
effectively making the processing "breadth-first".  This was indeed how it
worked pre-8.0, so the bug does not occur in the 7.x branches.
Per report from Pavel Stehule.
2007-08-15 19:15:47 +00:00
Magnus Hagander 906b2e1b37 Rename DLLIMPORT macro to PGDLLIMPORT to avoid conflict with
third party includes (like tcl) that define DLLIMPORT.
2007-07-25 12:22:54 +00:00
Tom Lane a9545b3aef Improve UPDATE/DELETE WHERE CURRENT OF so that they can be used from plpgsql
with a plpgsql-defined cursor.  The underlying mechanism for this is that the
main SQL engine will now take "WHERE CURRENT OF $n" where $n is a refcursor
parameter.  Not sure if we should document that fact or consider it an
implementation detail.  Per discussion with Pavel Stehule.
2007-06-11 22:22:42 +00:00
Tom Lane 6808f1b1de Support UPDATE/DELETE WHERE CURRENT OF cursor_name, per SQL standard.
Along the way, allow FOR UPDATE in non-WITH-HOLD cursors; there may once
have been a reason to disallow that, but it seems to work now, and it's
really rather necessary if you want to select a row via a cursor and then
update it in a concurrent-safe fashion.

Original patch by Arul Shaji, rather heavily editorialized by Tom Lane.
2007-06-11 01:16:30 +00:00
Tom Lane 24ee8af573 Rework temp_tablespaces patch so that temp tablespaces are assigned separately
for each temp file, rather than once per sort or hashjoin; this allows
spreading the data of a large sort or join across multiple tablespaces.
(I remain dubious that this will make any difference in practice, but certain
people insisted.)  Arrange to cache the results of parsing the GUC variable
instead of recomputing from scratch on every demand, and push usage of the
cache down to the bottommost fd.c level.
2007-06-07 19:19:57 +00:00
Tom Lane acfce502ba Create a GUC parameter temp_tablespaces that allows selection of the
tablespace(s) in which to store temp tables and temporary files.  This is a
list to allow spreading the load across multiple tablespaces (a random list
element is chosen each time a temp object is to be created).  Temp files are
not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace
directories.

Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
2007-06-03 17:08:34 +00:00
Tom Lane bd2c980b22 Buy back some of the cycles spent in more-expensive hash functions by
selecting power-of-2, rather than prime, numbers of buckets in hash joins.
If the hash functions are doing their jobs properly by making all hash bits
equally random, this is good enough, and it saves expensive integer division
and modulus operations.
2007-06-01 17:38:44 +00:00
Tom Lane f01b196597 Support scrollable cursors (ie, 'direction' clause in FETCH) in plpgsql.
Pavel Stehule, reworked a bit by Tom.
2007-04-16 17:21:24 +00:00
Tom Lane 66888f7424 Expose more cursor-related functionality in SPI: specifically, allow
access to the planner's cursor-related planning options, and provide new
FETCH/MOVE routines that allow access to the full power of those commands.
Small refactoring of planner(), pg_plan_query(), and pg_plan_queries()
APIs to make it convenient to pass the planning options down from SPI.

This is the core-code portion of Pavel Stehule's patch for scrollable
cursor support in plpgsql; I'll review and apply the plpgsql changes
separately.
2007-04-16 01:14:58 +00:00
Tom Lane bf8236526b Remove the prohibition on executing cursor commands through SPI_execute.
Vadim had included this restriction in the original design of the SPI code,
but I'm darned if I can see a reason for it.

I left the macro definition of SPI_ERROR_CURSOR in place, so as not to
needlessly break any SPI callers that are checking for it, but that code
will never actually be returned anymore.
2007-03-25 23:27:59 +00:00
Tom Lane 95f6d2d209 Make use of plancache module for SPI plans. In particular, since plpgsql
uses SPI plans, this finally fixes the ancient gotcha that you can't
drop and recreate a temp table used by a plpgsql function.

Along the way, clean up SPI's API a little bit by declaring SPI plan
pointers as "SPIPlanPtr" instead of "void *".  This is cosmetic but
helps to forestall simple programming mistakes.  (I have changed some
but not all of the callers to match; there are still some "void *"'s
in contrib and the PL's.  This is intentional so that we can see if
anyone's compiler complains about it.)
2007-03-15 23:12:07 +00:00
Tom Lane c7ff7663e4 Get rid of the separate EState for subplans, and just let them share the
parent query's EState.  Now that there's a single flat rangetable for both
the main plan and subplans, there's no need anymore for a separate EState,
and removing it allows cleaning up some crufty code in nodeSubplan.c and
nodeSubqueryscan.c.  Should be a tad faster too, although any difference
will probably be hard to measure.  This is the last bit of subsidiary
mop-up work from changing to a flat rangetable.
2007-02-27 01:11:26 +00:00
Tom Lane eab6b8b27e Turn the rangetable used by the executor into a flat list, and avoid storing
useless substructure for its RangeTblEntry nodes.  (I chose to keep using the
same struct node type and just zero out the link fields for unneeded info,
rather than making a separate ExecRangeTblEntry type --- it seemed too
fragile to have two different rangetable representations.)

Along the way, put subplans into a list in the toplevel PlannedStmt node,
and have SubPlan nodes refer to them by list index instead of direct pointers.
Vadim wanted to do that years ago, but I never understood what he was on about
until now.  It makes things a *whole* lot more robust, because we can stop
worrying about duplicate processing of subplans during expression tree
traversals.  That's been a constant source of bugs, and it's finally gone.

There are some consequent simplifications yet to be made, like not using
a separate EState for subplans in the executor, but I'll tackle that later.
2007-02-22 22:00:26 +00:00
Tom Lane 9cbd0c155d Remove the Query structure from the executor's API. This allows us to stop
storing mostly-redundant Query trees in prepared statements, portals, etc.
To replace Query, a new node type called PlannedStmt is inserted by the
planner at the top of a completed plan tree; this carries just the fields of
Query that are still needed at runtime.  The statement lists kept in portals
etc. now consist of intermixed PlannedStmt and bare utility-statement nodes
--- no Query.  This incidentally allows us to remove some fields from Query
and Plan nodes that shouldn't have been there in the first place.

Still to do: simplify the execution-time range table; at the moment the
range table passed to the executor still contains Query trees for subqueries.

initdb forced due to change of stored rules.
2007-02-20 17:32:18 +00:00
Tom Lane bfe553fb49 Repair oversight in 8.2 change that improved the handling of "pseudoconstant"
WHERE clauses.  createplan.c is now willing to stick a gating Result node
almost anywhere in the plan tree, and in particular one can wind up directly
underneath a MergeJoin node.  This means it had better be willing to handle
Mark/Restore.  Fortunately, that's trivial in such cases, since we can just
pass off the call to the input node (which the planner has previously ensured
can handle Mark/Restore).  Per report from Phil Frost.
2007-02-15 03:07:13 +00:00
Tom Lane ab05eedecc Add support for cross-type hashing in hashed subplans (hashed IN/NOT IN cases
that aren't turned into true joins).  Since this is the last missing bit of
infrastructure, go ahead and fill out the hash integer_ops and float_ops
opfamilies with cross-type operators.  The operator family project is now
DONE ... er, except for documentation ...
2007-02-06 02:59:15 +00:00
Tom Lane 5413eef8dc Repair failure to check that a table is still compatible with a previously
made query plan.  Use of ALTER COLUMN TYPE creates a hazard for cached
query plans: they could contain Vars that claim a column has a different
type than it now has.  Fix this by checking during plan startup that Vars
at relation scan level match the current relation tuple descriptor.  Since
at that point we already have at least AccessShareLock, we can be sure the
column type will not change underneath us later in the query.  However,
since a backend's locks do not conflict against itself, there is still a
hole for an attacker to exploit: he could try to execute ALTER COLUMN TYPE
while a query is in progress in the current backend.  Seal that hole by
rejecting ALTER TABLE whenever the target relation is already open in
the current backend.

This is a significant security hole: not only can one trivially crash the
backend, but with appropriate misuse of pass-by-reference datatypes it is
possible to read out arbitrary locations in the server process's memory,
which could allow retrieving database content the user should not be able
to see.  Our thanks to Jeff Trout for the initial report.

Security: CVE-2007-0556
2007-02-02 00:07:03 +00:00
Tom Lane a635c08fa1 Add support for cross-type hashing in hash index searches and hash joins.
Hashing for aggregation purposes still needs work, so it's not time to
mark any cross-type operators as hashable for general use, but these cases
work if the operators are so marked by hand in the system catalogs.
2007-01-30 01:33:36 +00:00
Tom Lane b39e91501c Improve hash join to discard input tuples immediately if they can't
match because they contain a null join key (and the join operator is
known strict).  Improves performance significantly when the inner
relation contains a lot of nulls, as per bug #2930.
2007-01-28 23:21:26 +00:00
Tom Lane a191a169d6 Change the planner-to-executor API so that the planner tells the executor
which comparison operators to use for plan nodes involving tuple comparison
(Agg, Group, Unique, SetOp).  Formerly the executor looked up the default
equality operator for the datatype, which was really pretty shaky, since it's
possible that the data being fed to the node is sorted according to some
nondefault operator class that could have an incompatible idea of equality.
The planner knows what it has sorted by and therefore can provide the right
equality operator to use.  Also, this change moves a couple of catalog lookups
out of the executor and into the planner, which should help startup time for
pre-planned queries by some small amount.  Modify the planner to remove some
other cavalier assumptions about always being able to use the default
operators.  Also add "nulls first/last" info to the Plan node for a mergejoin
--- neither the executor nor the planner can cope yet, but at least the API is
in place.
2007-01-10 18:06:05 +00:00
Bruce Momjian 29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Tom Lane 0cbc5b1ed4 Fix failure due to accessing an already-freed tuple descriptor in a plan
involving HashAggregate over SubqueryScan (this is the known case, there
may well be more).  The bug is only latent in releases before 8.2 since they
didn't try to access tupletable slots' descriptors during ExecDropTupleTable.
The least bogus fix seems to be to make subqueries share the parent query's
memory context, so that tupdescs they create will have the same lifespan as
those of the parent query.  There are comments in the code envisioning going
even further by not having a separate child EState at all, but that will
require rethinking executor access to range tables, which I don't want to
tackle right now.  Per bug report from Jean-Pierre Pelletier.
2006-12-26 21:37:20 +00:00
Tom Lane 8dcc8e3761 Refactor ExecGetJunkAttribute to avoid searching for junk attributes
by name on each and every row processed.  Profiling suggests this may
buy a percent or two for simple UPDATE scenarios, which isn't huge,
but when it's so easy to get ...
2006-12-04 02:06:55 +00:00
Bruce Momjian f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Bruce Momjian 0e20c48561 Revert FETCH/MOVE int64 patch. Was using incorrect checks for
fetch/move in scan.l.
2006-09-03 03:19:45 +00:00
Bruce Momjian 6c785d599d Change FETCH/MOVE to use int8.
Dhanaraj M
2006-09-02 18:17:18 +00:00
Tom Lane ea2e263539 Add new return codes SPI_OK_INSERT_RETURNING etc to the SPI API.
Fix all the standard PLs to be able to return tuples from FOO_RETURNING
statements as well as utility statements that return tuples.  Also,
fix oversight that SPI_processed wasn't set for a utility statement
returning tuples.  Per recent discussion.
2006-08-27 23:47:58 +00:00
Tom Lane 7a3e30e608 Add INSERT/UPDATE/DELETE RETURNING, with basic docs and regression tests.
plpgsql support to come later.  Along the way, convert execMain's
SELECT INTO support into a DestReceiver, in order to eliminate some ugly
special cases.

Jonah Harris and Tom Lane
2006-08-12 02:52:06 +00:00
Tom Lane c68489863c Fix domain_in() bug exhibited by Darcy Buskermolen. The idea of an EState
that's shorter-lived than the expression state being evaluated in it really
doesn't work :-( --- we end up with fn_extra caches getting deleted while
still in use.  Rather than abandon the notion of caching expression state
across domain_in calls altogether, I chose to make domain_in a bit cozier
with ExprContext.  All we really need for evaluating variable-free
expressions is an ExprContext, not an EState, so I invented the notion of a
"standalone" ExprContext.  domain_in can prevent resource leakages by doing
a ReScanExprContext on this rather than having to free it entirely; so we
can make the ExprContext have the same lifespan (and particularly the same
per_query memory context) as the expression state structs.
2006-08-04 21:33:36 +00:00
Joe Conway 9caafda579 Add support for multi-row VALUES clauses as part of INSERT statements
(e.g. "INSERT ... VALUES (...), (...), ...") and elsewhere as allowed
by the spec. (e.g. similar to a FROM clause subselect). initdb required.
Joe Conway and Tom Lane.
2006-08-02 01:59:48 +00:00
Bruce Momjian b43ebe5f83 More include file adjustments. 2006-07-13 18:01:02 +00:00
Bruce Momjian b844dd3f9e More include file adjustments. 2006-07-13 17:47:02 +00:00
Bruce Momjian a22d76d96a Allow include files to compile own their own.
Strip unused include files out unused include files, and add needed
includes to C files.

The next step is to remove unused include files in C files.
2006-07-13 16:49:20 +00:00
Bruce Momjian ac230e7431 Alphabetically order reference to include files, "S"-"Z". 2006-07-11 18:26:11 +00:00
Bruce Momjian fa601357fb Sort reference of include files, "A" - "F". 2006-07-11 16:35:33 +00:00
Tom Lane 69d0a15e2a Convert hash join code to use MinimalTuple format in tuple hash table
and batch files.  Should reduce memory and I/O demands for such joins.
2006-06-27 21:31:20 +00:00
Tom Lane 3f50ba27cf Create infrastructure for 'MinimalTuple' representation of in-memory
tuples with less header overhead than a regular HeapTuple, per my
recent proposal.  Teach TupleTableSlot code how to deal with these.
As proof of concept, change tuplestore.c to store MinimalTuples instead
of HeapTuples.  Future patches will expand the concept to other places
where it is useful.
2006-06-27 02:51:40 +00:00
Tom Lane 06e10abc0b Fix problems with cached tuple descriptors disappearing while still in use
by creating a reference-count mechanism, similar to what we did a long time
ago for catcache entries.  The back branches have an ugly solution involving
lots of extra copies, but this way is more efficient.  Reference counting is
only applied to tupdescs that are actually in caches --- there seems no need
to use it for tupdescs that are generated in the executor, since they'll go
away during plan shutdown by virtue of being in the per-query memory context.
Neil Conway and Tom Lane
2006-06-16 18:42:24 +00:00
Tom Lane 5de0cbdf0c Revert sampling patch for EXPLAIN ANALYZE; it turns out to be too unreliable
because node timing is much less predictable than the patch expects.  I kept
the API change for InstrStopNode, however.
2006-06-09 19:30:56 +00:00
Tom Lane a18ebc5541 Code review for EXPLAIN patch. Fix some typos, make it behave sanely
across multiple loops, get rid of the shaky assumption that exactly one
tuple is returned per node iteration.
2006-05-30 19:24:25 +00:00
Bruce Momjian 87bd07d979 Make EXPLAIN sampling smarter, to avoid excessive sampling delay.
Martijn van Oosterhout
2006-05-30 14:01:58 +00:00
Tom Lane 798e63ffb0 Remove CXT_printf/CXT1_printf macros. If anyone had found them to be of
any use in the past many years, we'd have made some effort to include
them in all executor node types; but in fact they were only in
nodeAppend.c and nodeIndexscan.c, up until I copied nodeIndexscan.c's
occurrence into the new bitmap node types.  Remove some other unused
macros in execdebug.h, too.  Some day the whole header probably ought to
go away in favor of better-designed facilities.
2006-05-23 15:21:52 +00:00
Bruce Momjian f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane 2c0ef9777c Extend the ExecInitNode API so that plan nodes receive a set of flag
bits indicating which optional capabilities can actually be exercised
at runtime.  This will allow Sort and Material nodes, and perhaps later
other nodes, to avoid unnecessary overhead in common cases.
This commit just adds the infrastructure and arranges to pass the correct
flag values down to plan nodes; none of the actual optimizations are here
yet.  I'm committing this separately in case anyone wants to measure the
added overhead.  (It should be negligible.)

Simon Riggs and Tom Lane
2006-02-28 04:10:28 +00:00
Tom Lane 3a0a16cb7e Allow row comparisons to be used as indexscan qualifications.
This completes the project to upgrade our handling of row comparisons.
2006-01-25 20:29:24 +00:00
Tom Lane 25b9b1b042 Repair "Halloween problem" in EvalPlanQual: a tuple that's been inserted by
our own command (or more generally, xmin = our xact and cmin >= current
command ID) should not be seen as good.  Else we may try to update rows
we already updated.  This error was inserted last August while fixing the
even bigger problem that the old coding wouldn't see *any* tuples inserted
by our own transaction as good.  Per report from Euler Taveira de Oliveira.
2006-01-12 21:48:53 +00:00
Tom Lane a98871b7ac Tweak indexscan machinery to avoid taking an AccessShareLock on an index
if we already have a stronger lock due to the index's table being the
update target table of the query.  Same optimization I applied earlier
at the table level.  There doesn't seem to be much interest in the more
radical idea of not locking indexes at all, so do what we can ...
2005-12-03 05:51:03 +00:00
Tom Lane d780f07ac1 Adjust scan plan nodes to avoid getting an extra AccessShareLock on a
relation if it's already been locked by execMain.c as either a result
relation or a FOR UPDATE/SHARE relation.  This avoids an extra trip to
the shared lock manager state.  Per my suggestion yesterday.
2005-12-02 20:03:42 +00:00
Tom Lane 290166f934 Teach planner and executor to handle ScalarArrayOpExpr as an indexable
qualification when the underlying operator is indexable and useOr is true.
That is, indexkey op ANY (ARRAY[...]) is effectively translated into an
OR combination of one indexscan for each array element.  This only works
for bitmap index scans, of course, since regular indexscans no longer
support OR'ing of scans.  There are still some loose ends to clean up
before changing 'x IN (list)' to translate as a ScalarArrayOpExpr;
for instance predtest.c ought to be taught about it.  But this gets the
basic functionality in place.
2005-11-25 19:47:50 +00:00
Tom Lane 4dd2048a47 Get rid of ExecAssignResultTypeFromOuterPlan() and make all plan node types
generate their output tuple descriptors from their target lists (ie, using
ExecAssignResultTypeFromTL()).  We long ago fixed things so that all node
types have minimally valid tlists, so there's no longer any good reason to
have two different ways of doing it.  This change is needed to fix bug
reported by Hayden James: the fix of 2005-11-03 to emit the correct column
names after optimizing away a SubqueryScan node didn't work if the new
top-level plan node used ExecAssignResultTypeFromOuterPlan to generate its
tupdesc, since the next plan node down won't have the correct column labels.
2005-11-23 20:27:58 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane 1b61ee3c69 _SPI_execute_plan failed to return result tuple table to caller in
the ProcessUtility case, resulting in an intratransaction memory leak
if a utility command actually did return any tuples, as reported by
Dmitry Karasik.  Fix this and also make the behavior more consistent
for cases involving nested SPI operations and multiple query trees,
by ensuring that we store the state locally until it is ready to be
returned to the caller.
2005-10-01 18:43:19 +00:00
Tom Lane f57e3f4cf3 Repair problems with VACUUM destroying t_ctid chains too soon, and with
insufficient paranoia in code that follows t_ctid links.  (We must do both
because even with VACUUM doing it properly, the intermediate state with
a dangling t_ctid link is visible concurrently during lazy VACUUM, and
could be seen afterwards if either type of VACUUM crashes partway through.)
Also try to improve documentation about what's going on.  Patch is a bit
bulky because passing the XMAX information around required changing the
APIs of some low-level heapam.c routines, but it's not conceptually very
complicated.  Per trouble report from Teodor and subsequent analysis.
This needs to be back-patched, but I'll do that after 8.1 beta is out.
2005-08-20 00:40:32 +00:00
Tom Lane 184e7a73a5 Revise nodeMergejoin in light of example provided by Guillaume Smet.
When one side of the join has a NULL, we don't want to uselessly try
to match it against every remaining tuple of the other side.  While
at it, rewrite the comparison machinery to avoid multiple evaluations
of the left and right input expressions and to use a btree comparator
where available, instead of double operator calls.  Also revise the
state machine to eliminate redundant comparisons and hopefully make it
more readable too.
2005-05-13 21:20:16 +00:00
Neil Conway f478856c7f Change SPI functions to use a `long' when specifying the number of tuples
to produce when running the executor. This is consistent with the internal
executor APIs (such as ExecutorRun), which also use a long for this purpose.
It also allows FETCH_ALL to be passed -- since FETCH_ALL is defined as
LONG_MAX, this wouldn't have worked on platforms where int and long are of
different sizes. Per report from Tzahi Fadida.
2005-05-02 00:37:07 +00:00
Tom Lane 5b05185262 Remove support for OR'd indexscans internal to a single IndexScan plan
node, as this behavior is now better done as a bitmap OR indexscan.
This allows considerable simplification in nodeIndexscan.c itself as
well as several planner modules concerned with indexscan plan generation.
Also we can improve the sharing of code between regular and bitmap
indexscans, since they are now working with nigh-identical Plan nodes.
2005-04-25 01:30:14 +00:00
Tom Lane 4a8c5d0375 Create executor and planner-backend support for decoupled heap and index
scans, using in-memory tuple ID bitmaps as the intermediary.  The planner
frontend (path creation and cost estimation) is not there yet, so none
of this code can be executed.  I have tested it using some hacked planner
code that is far too ugly to see the light of day, however.  Committing
now so that the bulk of the infrastructure changes go in before the tree
drifts under me.
2005-04-19 22:35:18 +00:00
Tom Lane d8b1bf4791 Create a new 'MultiExecProcNode' call API for plan nodes that don't
return just a single tuple at a time.  Currently the only such node
type is Hash, but I expect we will soon have indexscans that can return
tuple bitmaps.  A side benefit is that EXPLAIN ANALYZE now shows the
correct tuple count for a Hash node.
2005-04-16 20:07:35 +00:00
Tom Lane 47888fe842 First phase of OUT-parameters project. We can now define and use SQL
functions with OUT parameters.  The various PLs still need work, as does
pg_dump.  Rudimentary docs and regression tests included.
2005-03-31 22:46:33 +00:00
Neil Conway 4f6f5db474 Add SPI_getnspname(), including documentation. 2005-03-29 02:53:53 +00:00
Tom Lane adb1a6e95b Improve EXPLAIN ANALYZE to show the time spent in each trigger when
executing a statement that fires triggers.  Formerly this time was
included in "Total runtime" but not otherwise accounted for.
As a side benefit, we avoid re-opening relations when firing non-deferred
AFTER triggers, because the trigger code can re-use the main executor's
ResultRelInfo data structure.
2005-03-25 21:58:00 +00:00
Tom Lane 9e0dd84596 On Windows, use QueryPerformanceCounter instead of gettimeofday for
EXPLAIN ANALYZE instrumentation.  Magnus Hagander
2005-03-20 22:27:52 +00:00
Tom Lane f97aebd162 Revise TupleTableSlot code to avoid unnecessary construction and disassembly
of tuples when passing data up through multiple plan nodes.  A slot can now
hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting
of Datum/isnull arrays.  Upper plan levels can usually just copy the Datum
arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr()
calls to extract the data again.  This work extends Atsushi Ogawa's earlier
patch, which provided the key idea of adding Datum arrays to TupleTableSlots.
(I believe however that something like this was foreseen way back in Berkeley
days --- see the old comment on ExecProject.)  A test case involving many
levels of join of fairly wide tables (about 80 columns altogether) showed
about 3x overall speedup, though simple queries will probably not be
helped very much.

I have also duplicated some code in heaptuple.c in order to provide versions
of heap_formtuple and friends that use "bool" arrays to indicate null
attributes, instead of the old convention of "char" arrays containing either
'n' or ' '.  This provides a better match to the convention used by
ExecEvalExpr.  While I have not made a concerted effort to get rid of uses
of the old routines, I think they should be deprecated and eventually removed.
2005-03-16 21:38:10 +00:00
Tom Lane a9b05bdc83 Avoid O(N^2) overhead in repeated nocachegetattr calls when columns of
a tuple are being accessed via ExecEvalVar and the attcacheoff shortcut
isn't usable (due to nulls and/or varlena columns).  To do this, cache
Datums extracted from a tuple in the associated TupleTableSlot.
Also some code cleanup in and around the TupleTable handling.
Atsushi Ogawa with some kibitzing by Tom Lane.
2005-03-14 04:41:13 +00:00
Tom Lane 849074f9ae Revise hash join code so that we can increase the number of batches
on-the-fly, and thereby avoid blowing out memory when the planner has
underestimated the hash table size.  Hash join will now obey the
work_mem limit with some faithfulness.  Per my recent proposal
(hash aggregate part isn't done yet though).
2005-03-06 22:15:05 +00:00
Tom Lane 0bf2587df4 Improve planner's estimation of the space needed for HashAgg plans:
look at the actual aggregate transition datatypes and the actual overhead
needed by nodeAgg.c, instead of using pessimistic round numbers.
Per a discussion with Michael Tiemann.
2005-01-28 19:34:28 +00:00
Bruce Momjian 2daed8c5b3 Update copyrights that were missed. 2005-01-01 05:43:09 +00:00
PostgreSQL Daemon 2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Tom Lane 7efa8411cc Rethink plpgsql's way of handling SPI execution during an exception block.
We don't really want to start a new SPI connection, just keep using the old
one; otherwise we have memory management problems as illustrated by
John Kennedy's bug report of today.  This requires a bit of a hack to
ensure the SPI stack state is properly restored, but then again what we
were doing before was a hack too, strictly speaking.  Add a regression
test to cover this case.
2004-11-16 18:10:16 +00:00
Tom Lane a8487e15ed Fix problems with SQL functions returning rowtypes that have dropped
columns.  The returned tuple needs to have appropriate NULL columns
inserted so that it actually matches the declared rowtype.  It seemed
convenient to use a JunkFilter for this, so I made some cleanups and
simplifications in the JunkFilter code to allow it to support this
additional functionality.  (That in turn exposed a latent bug in
nodeAppend.c, which is that it was returning a tuple slot whose
descriptor didn't match its data.)  Also, move check_sql_fn_retval
out of pg_proc.c and into functions.c, where it seems to more naturally
belong.
2004-10-07 18:38:51 +00:00
Bruce Momjian a5d7ba773d Adjust comments previously moved to column 1 by pgident. 2004-10-07 15:21:58 +00:00
Neil Conway be8eafa09d ExecProcAppend() wasn't called ExecAppend() because the latter name was
formerly used in execMain. Since that is no longer the case, this patch
renames ExecProcAppend() to ExecAppend() for the sake of consistency.
2004-09-24 01:36:37 +00:00
Tom Lane 9fcbe2af11 Arrange for hash join to skip scanning the outer relation if it detects
that the inner one is completely empty.  Per recent discussion.  Also some
cosmetic cleanups in nearby code.
2004-09-22 19:13:52 +00:00
Tom Lane 8f9f198603 Restructure subtransaction handling to reduce resource consumption,
as per recent discussions.  Invent SubTransactionIds that are managed like
CommandIds (ie, counter is reset at start of each top transaction), and
use these instead of TransactionIds to keep track of subtransaction status
in those modules that need it.  This means that a subtransaction does not
need an XID unless it actually inserts/modifies rows in the database.
Accordingly, don't assign it an XID nor take a lock on the XID until it
tries to do that.  This saves a lot of overhead for subtransactions that
are only used for error recovery (eg plpgsql exceptions).  Also, arrange
to release a subtransaction's XID lock as soon as the subtransaction
exits, in both the commit and abort cases.  This avoids holding many
unique locks after a long series of subtransactions.  The price is some
additional overhead in XactLockTableWait, but that seems acceptable.
Finally, restructure the state machine in xact.c to have a more orthogonal
set of states for subtransactions.
2004-09-16 16:58:44 +00:00
Tom Lane b2c4071299 Redesign query-snapshot timing so that volatile functions in READ COMMITTED
mode see a fresh snapshot for each command in the function, rather than
using the latest interactive command's snapshot.  Also, suppress fresh
snapshots as well as CommandCounterIncrement inside STABLE and IMMUTABLE
functions, instead using the snapshot taken for the most closely nested
regular query.  (This behavior is only sane for read-only functions, so
the patch also enforces that such functions contain only SELECT commands.)
As per my proposal of 6-Sep-2004; I note that I floated essentially the
same proposal on 19-Jun-2002, but that discussion tailed off without any
action.  Since 8.0 seems like the right place to be taking possibly
nontrivial backwards compatibility hits, let's get it done now.
2004-09-13 20:10:13 +00:00