Commit Graph

243 Commits

Author SHA1 Message Date
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Tom Lane c6e3ac11b6 Create a "sort support" interface API for faster sorting.
This patch creates an API whereby a btree index opclass can optionally
provide non-SQL-callable support functions for sorting.  In the initial
patch, we only use this to provide a directly-callable comparator function,
which can be invoked with a bit less overhead than the traditional
SQL-callable comparator.  While that should be of value in itself, the real
reason for doing this is to provide a datatype-extensible framework for
more aggressive optimizations, as in Peter Geoghegan's recent work.

Robert Haas and Tom Lane
2011-12-07 00:19:39 -05:00
Tom Lane a0185461dd Rearrange the implementation of index-only scans.
This commit changes index-only scans so that data is read directly from the
index tuple without first generating a faux heap tuple.  The only immediate
benefit is that indexes on system columns (such as OID) can be used in
index-only scans, but this is necessary infrastructure if we are ever to
support index-only scans on expression indexes.  The executor is now ready
for that, though the planner still needs substantial work to recognize
the possibility.

To do this, Vars in index-only plan nodes have to refer to index columns
not heap columns.  I introduced a new special varno, INDEX_VAR, to mark
such Vars to avoid confusion.  (In passing, this commit renames the two
existing special varnos to OUTER_VAR and INNER_VAR.)  This allows
ruleutils.c to handle them with logic similar to what we use for subplan
reference Vars.

Since index-only scans are now fundamentally different from regular
indexscans so far as their expression subtrees are concerned, I also chose
to change them to have their own plan node type (and hence, their own
executor source file).
2011-10-11 14:21:30 -04:00
Tom Lane a2822fb933 Support index-only scans using the visibility map to avoid heap fetches.
When a btree index contains all columns required by the query, and the
visibility map shows that all tuples on a target heap page are
visible-to-all, we don't need to fetch that heap page.  This patch depends
on the previous patches that made the visibility map reliable.

There's a fair amount left to do here, notably trying to figure out a less
chintzy way of estimating the cost of an index-only scan, but the core
functionality seems ready to commit.

Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
2011-10-07 20:14:13 -04:00
Tom Lane f197272365 Make EXPLAIN ANALYZE report the numbers of rows rejected by filter steps.
This provides information about the numbers of tuples that were visited
but not returned by table scans, as well as the numbers of join tuples
that were considered and discarded within a join plan node.

There is still some discussion going on about the best way to report counts
for outer-join situations, but I think most of what's in the patch would
not change if we revise that, so I'm going to go ahead and commit it as-is.

Documentation changes to follow (they weren't in the submitted patch
either).

Marko Tiikkaja, reviewed by Marc Cousin, somewhat revised by Tom
2011-09-22 11:30:11 -04:00
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Tom Lane b33f78df17 Fix trigger WHEN conditions when both BEFORE and AFTER triggers exist.
Due to tuple-slot mismanagement, evaluation of WHEN conditions for AFTER
ROW UPDATE triggers could crash if there had been a BEFORE ROW trigger
fired for the same update.  Fix by not trying to overload the use of
estate->es_trig_tuple_slot.  Per report from Yoran Heling.

Back-patch to 9.0, when trigger WHEN conditions were introduced.
2011-08-21 18:15:55 -04:00
Alvaro Herrera b93f5a5673 Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.h
This lets us stop including rel.h into execnodes.h, which is a widely
used header.
2011-07-04 14:35:58 -04:00
Tom Lane d64713df7e Pass collations to functions in FunctionCallInfoData, not FmgrInfo.
Since collation is effectively an argument, not a property of the function,
FmgrInfo is really the wrong place for it; and this becomes critical in
cases where a cached FmgrInfo is used for varying purposes that might need
different collation settings.  Fix by passing it in FunctionCallInfoData
instead.  In particular this allows a clean fix for bug #5970 (record_cmp
not working).  This requires touching a bit more code than the original
method, but nobody ever thought that collations would not be an invasive
patch...
2011-04-12 19:19:24 -04:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane a874fe7b4c Refactor the executor's API to support data-modifying CTEs better.
The originally committed patch for modifying CTEs didn't interact well
with EXPLAIN, as noted by myself, and also had corner-case problems with
triggers, as noted by Dean Rasheed.  Those problems show it is really not
practical for ExecutorEnd to call any user-defined code; so split the
cleanup duties out into a new function ExecutorFinish, which must be called
between the last ExecutorRun call and ExecutorEnd.  Some Asserts have been
added to these functions to help verify correct usage.

It is no longer necessary for callers of the executor to call
AfterTriggerBeginQuery/AfterTriggerEndQuery for themselves, as this is now
done by ExecutorStart/ExecutorFinish respectively.  If you really need to
suppress that and do it for yourself, pass EXEC_FLAG_SKIP_TRIGGERS to
ExecutorStart.

Also, refactor portal commit processing to allow for the possibility that
PortalDrop will invoke user-defined code.  I think this is not actually
necessary just yet, since the portal-execution-strategy logic forces any
non-pure-SELECT query to be run to completion before we will consider
committing.  But it seems like good future-proofing.
2011-02-27 13:44:12 -05:00
Tom Lane 389af95155 Support data-modifying commands (INSERT/UPDATE/DELETE) in WITH.
This patch implements data-modifying WITH queries according to the
semantics that the updates all happen with the same command counter value,
and in an unspecified order.  Therefore one WITH clause can't see the
effects of another, nor can the outer query see the effects other than
through the RETURNING values.  And attempts to do conflicting updates will
have unpredictable results.  We'll need to document all that.

This commit just fixes the code; documentation updates are waiting on
author.

Marko Tiikkaja and Hitoshi Harada
2011-02-25 18:58:02 -05:00
Tom Lane bb74240794 Implement an API to let foreign-data wrappers actually be functional.
This commit provides the core code and documentation needed.  A contrib
module test case will follow shortly.

Shigeru Hanada, Jan Urbanski, Heikki Linnakangas
2011-02-20 00:18:14 -05:00
Tom Lane e617f0d7e4 Fix improper matching of resjunk column names for FOR UPDATE in subselect.
Flattening of subquery range tables during setrefs.c could lead to the
rangetable indexes in PlanRowMark nodes not matching up with the column
names previously assigned to the corresponding resjunk ctid (resp. tableoid
or wholerow) columns.  Typical symptom would be either a "cannot extract
system attribute from virtual tuple" error or an Assert failure.  This
wasn't a problem before 9.0 because we didn't support FOR UPDATE below the
top query level, and so the final flattening could never renumber an RTE
that was relevant to FOR UPDATE.  Fix by using a plan-tree-wide unique
number for each PlanRowMark to label the associated resjunk columns, so
that the number need not change during flattening.

Per report from David Johnston (though I'm darned if I can see how this got
past initial testing of the relevant code).  Back-patch to 9.0.
2011-02-09 23:27:42 -05:00
Tom Lane d487afbb81 Fix PlanRowMark/ExecRowMark structures to handle inheritance correctly.
In an inherited UPDATE/DELETE, each target table has its own subplan,
because it might have a column set different from other targets.  This
means that the resjunk columns we add to support EvalPlanQual might be
at different physical column numbers in each subplan.  The EvalPlanQual
rewrite I did for 9.0 failed to account for this, resulting in possible
misbehavior or even crashes during concurrent updates to the same row,
as seen in a recent report from Gordon Shannon.  Revise the data structure
so that we track resjunk column numbers separately for each subplan.

I also chose to move responsibility for identifying the physical column
numbers back to executor startup, instead of assuming that numbers derived
during preprocess_targetlist would stay valid throughout subsequent
massaging of the plan.  That's a bit slower, so we might want to consider
undoing it someday; but it would complicate the patch considerably and
didn't seem justifiable in a bug fix that has to be back-patched to 9.0.
2011-01-12 20:47:02 -05:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Tom Lane 7b46401557 Move symbols for ExecMergeJoin's state machine into nodeMergejoin.c.
There's no reason for these values to be known anywhere else.  After
doing this, executor/execdefs.h is vestigial and can be removed.
2010-12-30 22:12:40 -05:00
Tom Lane f4e4b32743 Support RIGHT and FULL OUTER JOIN in hash joins.
This is advantageous first because it allows us to hash the smaller table
regardless of the outer-join type, and second because hash join can be more
flexible than merge join in dealing with arbitrary join quals in a FULL
join.  For merge join all the join quals have to be mergejoinable, but hash
join will work so long as there's at least one hashjoinable qual --- the
others can be any condition.  (This is true essentially because we don't
keep per-inner-tuple match flags in merge join, while hash join can do so.)

To do this, we need a has-it-been-matched flag for each tuple in the
hashtable, not just one for the current outer tuple.  The key idea that
makes this practical is that we can store the match flag in the tuple's
infomask, since there are lots of bits there that are of no interest for a
MinimalTuple.  So we aren't increasing the size of the hashtable at all for
the feature.

To write this without turning the hash code into even more of a pile of
spaghetti than it already was, I rewrote ExecHashJoin in a state-machine
style, similar to ExecMergeJoin.  Other than that decision, it was pretty
straightforward.
2010-12-30 20:26:08 -05:00
Tom Lane d583f10b7e Create core infrastructure for KNNGIST.
This is a heavily revised version of builtin_knngist_core-0.9.  The
ordering operators are no longer mixed in with actual quals, which would
have confused not only humans but significant parts of the planner.
Instead, ordering operators are carried separately throughout planning and
execution.

Since the API for ambeginscan and amrescan functions had to be changed
anyway, this commit takes the opportunity to rationalize that a bit.
RelationGetIndexScan no longer forces a premature index_rescan call;
instead, callers of index_beginscan must call index_rescan too.  Aside from
making the AM-side initialization logic a bit less peculiar, this has the
advantage that we do not make a useless extra am_rescan call when there are
runtime key values.  AMs formerly could not assume that the key values
passed to amrescan were actually valid; now they can.

Teodor Sigaev and Tom Lane
2010-12-02 20:51:37 -05:00
Tom Lane 0811ff2063 Avoid using a local FunctionCallInfoData struct in ExecMakeFunctionResult
and related routines.

We already had a redundant FunctionCallInfoData struct in FuncExprState,
but were using that copy only in set-returning-function cases, to avoid
keeping function evaluation state in the expression tree for the benefit
of plpgsql's "simple expression" logic.  But of course that didn't work
anyway.  Given the recent fixes in plpgsql there is no need to have two
separate behaviors here.  Getting rid of the local FunctionCallInfoData
structs should make things a little faster (because we don't need to do
InitFunctionCallInfoData each time), and it also makes for a noticeable
reduction in stack space consumption during recursive calls.
2010-11-01 13:54:21 -04:00
Itagaki Takahiro bf76ad07fe Fix typos "are are". 2010-10-26 17:15:17 +09:00
Tom Lane 11cad29c91 Support MergeAppend plans, to allow sorted output from append relations.
This patch eliminates the former need to sort the output of an Append scan
when an ordered scan of an inheritance tree is wanted.  This should be
particularly useful for fast-start cases such as queries with LIMIT.

Original patch by Greg Stark, with further hacking by Hans-Jurgen Schonig,
Robert Haas, and Tom Lane.
2010-10-14 16:57:57 -04:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Tom Lane 133924e13e Fix potential failure when hashing the output of a subplan that produces
a pass-by-reference datatype with a nontrivial projection step.
We were using the same memory context for the projection operation as for
the temporary context used by the hashtable routines in execGrouping.c.
However, the hashtable routines feel free to reset their temp context at
any time, which'd lead to destroying input data that was still needed.
Report and diagnosis by Tao Ma.

Back-patch to 8.1, where the problem was introduced by the changes that
allowed us to work with "virtual" tuples instead of materializing intermediate
tuple values everywhere.  The earlier code looks quite similar, but it doesn't
suffer the problem because the data gets copied into another context as a
result of having to materialize ExecProject's output tuple.
2010-07-28 04:50:50 +00:00
Bruce Momjian 65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Tom Lane ec4be2ee68 Extend the set of frame options supported for window functions.
This patch allows the frame to start from CURRENT ROW (in either RANGE or
ROWS mode), and it also adds support for ROWS n PRECEDING and ROWS n FOLLOWING
start and end points.  (RANGE value PRECEDING/FOLLOWING isn't there yet ---
the grammar works, but that's all.)

Hitoshi Harada, reviewed by Pavel Stehule
2010-02-12 17:33:21 +00:00
Tom Lane 90f4c2d960 Add support for doing FULL JOIN ON FALSE. While this is really a rather
peculiar variant of UNION ALL, and so wouldn't likely get written directly
as-is, it's possible for it to arise as a result of simplification of
less-obviously-silly queries.  In particular, now that we can do flattening
of subqueries that have constant outputs and are underneath an outer join,
it's possible for the case to result from simplification of queries of the
type exhibited in bug #5263.  Back-patch to 8.4 to avoid a functionality
regression for this type of query.
2010-01-05 23:25:36 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Tom Lane 7839d35991 Add an "argisrow" field to NullTest nodes, following a plan made way back in
8.2beta but never carried out.  This avoids repetitive tests of whether the
argument is of scalar or composite type.  Also, be a bit more paranoid about
composite arguments in some places where we previously weren't checking.
2010-01-01 23:03:10 +00:00
Robert Haas cddca5ec13 Add an EXPLAIN (BUFFERS) option to show buffer-usage statistics.
This patch also removes buffer-usage statistics from the track_counts
output, since this (or the global server statistics) is deemed to be a better
interface to this information.

Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
2009-12-15 04:57:48 +00:00
Tom Lane 0cb65564e5 Add exclusion constraints, which generalize the concept of uniqueness to
support any indexable commutative operator, not just equality.  Two rows
violate the exclusion constraint if "row1.col OP row2.col" is TRUE for
each of the columns in the constraint.

Jeff Davis, reviewed by Robert Haas
2009-12-07 05:22:23 +00:00
Tom Lane 7fc0f06221 Add a WHEN clause to CREATE TRIGGER, allowing a boolean expression to be
checked to determine whether the trigger should be fired.

For BEFORE triggers this is mostly a matter of spec compliance; but for AFTER
triggers it can provide a noticeable performance improvement, since queuing of
a deferred trigger event and re-fetching of the row(s) at end of statement can
be short-circuited if the trigger does not need to be fired.

Takahiro Itagaki, reviewed by KaiGai Kohei.
2009-11-20 20:38:12 +00:00
Tom Lane 9f2ee8f287 Re-implement EvalPlanQual processing to improve its performance and eliminate
a lot of strange behaviors that occurred in join cases.  We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries.  If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row.  The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.

Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested.  To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param.  Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.

This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE.  This is needed to avoid the
duplicate-output-tuple problem.  It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
2009-10-26 02:26:45 +00:00
Tom Lane 0adaf4cb31 Move the handling of SELECT FOR UPDATE locking and rechecking out of
execMain.c and into a new plan node type LockRows.  Like the recent change
to put table updating into a ModifyTable plan node, this increases planning
flexibility by allowing the operations to occur below the top level of the
plan tree.  It's necessary in any case to restore the previous behavior of
having FOR UPDATE locking occur before ModifyTable does.

This partially refactors EvalPlanQual to allow multiple rows-under-test
to be inserted into the EPQ machinery before starting an EPQ test query.
That isn't sufficient to fix EPQ's general bogosity in the face of plans
that return multiple rows per test row, though.  Since this patch is
mostly about getting some plan node infrastructure in place and not about
fixing ten-year-old bugs, I will leave EPQ improvements for another day.

Another behavioral change that we could now think about is doing FOR UPDATE
before LIMIT, but that too seems like it should be treated as a followon
patch.
2009-10-12 18:10:51 +00:00
Tom Lane 8a5849b7ff Split the processing of INSERT/UPDATE/DELETE operations out of execMain.c.
They are now handled by a new plan node type called ModifyTable, which is
placed at the top of the plan tree.  In itself this change doesn't do much,
except perhaps make the handling of RETURNING lists and inherited UPDATEs a
tad less klugy.  But it is necessary preparation for the intended extension of
allowing RETURNING queries inside WITH.

Marko Tiikkaja
2009-10-10 01:43:50 +00:00
Tom Lane f92e8a4b5e Replace the array-style TupleTable data structure with a simple List of
TupleTableSlot nodes.  This eliminates the need to count in advance
how many Slots will be needed, which seems more than worth the small
increase in the amount of palloc traffic during executor startup.

The ExecCountSlots infrastructure is now all dead code, but I'll remove it
in a separate commit for clarity.

Per a comment from Robert Haas.
2009-09-27 20:09:58 +00:00
Tom Lane c38b75947e Tweak ExecIndexEvalRuntimeKeys to forcibly detoast any toasted comparison
values before they get passed to the index access method.  This avoids
repeated detoastings that will otherwise ensue as the comparison value
is examined by various index support functions.  We have seen a couple of
reports of cases where repeated detoastings result in an order-of-magnitude
slowdown, so it seems worth adding a bit of extra logic to prevent this.

I had previously proposed trying to avoid duplicate detoastings in general,
but this fix takes care of what seems the most important case in practice
with very little effort or risk.

Back-patch to 8.4 so that the PostGIS folk won't have to wait a year to
have this fix in a production release.  (The issue exists further back,
of course, but the code's diverged enough to make backpatching further a
higher-risk action.  Also it appears that the possible gains may be limited
in prior releases because of different handling of lossy operators.)
2009-08-23 18:26:08 +00:00
Tom Lane dcb2bda9b7 Improve plpgsql's ability to cope with rowtypes containing dropped columns,
by supporting conversions in places that used to demand exact rowtype match.

Since this issue is certain to come up elsewhere (in fact, already has,
in ExecEvalConvertRowtype), factor out the support code into new core
functions for tuple conversion.  I chose to put these in a new source
file since heaptuple.c is already overly long.

Heavily revised version of a patch by Pavel Stehule.
2009-08-06 20:44:32 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Peter Eisentraut 77d67a4a3b XMLATTRIBUTES() should send the attribute values through
map_sql_value_to_xml_value() instead of directly through the data type output
function.  This is per SQL standard, and consistent with XMLELEMENT().
2009-04-08 21:51:38 +00:00
Tom Lane 85369f888e Refactor ExecProject and associated routines so that fast-path code is used
for simple Var targetlist entries all the time, even when there are other
entries that are not simple Vars.  Also, ensure that we prefetch attributes
(with slot_getsomeattrs) for all Vars in the targetlist, even those buried
within expressions.  In combination these changes seem to significantly
reduce the runtime for cases where tlists are mostly but not exclusively
Vars.  Per my proposal of yesterday.
2009-04-02 22:39:30 +00:00
Tom Lane 596efd27ed Optimize multi-batch hash joins when the outer relation has a nonuniform
distribution, by creating a special fast path for the (first few) most common
values of the outer relation.  Tuples having hashvalues matching the MCVs
are effectively forced to be in the first batch, so that we never write
them out to the batch temp files.

Bryce Cutt and Ramon Lawrence, with some editorialization by me.
2009-03-21 00:04:40 +00:00
Tom Lane b7b8f0b609 Implement prefetching via posix_fadvise() for bitmap index scans. A new
GUC variable effective_io_concurrency controls how many concurrent block
prefetch requests will be issued.

(The best way to handle this for plain index scans is still under debate,
so that part is not applied yet --- tgl)

Greg Stark
2009-01-12 05:10:45 +00:00
Tom Lane 43a57cf365 Revise the TIDBitmap API to support multiple concurrent iterations over a
bitmap.  This is extracted from Greg Stark's posix_fadvise patch; it seems
worth committing separately, since it's potentially useful independently of
posix_fadvise.
2009-01-10 21:08:36 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Tom Lane 8e8854daa2 Add some basic support for window frame clauses to the window-functions
patch.  This includes the ability to force the frame to cover the whole
partition, and the ability to make the frame end exactly on the current row
rather than its last ORDER BY peer.  Supporting any more of the full SQL
frame-clause syntax will require nontrivial hacking on the window aggregate
code, so it'll have to wait for 8.5 or beyond.
2008-12-31 00:08:39 +00:00
Tom Lane 95b07bc7f5 Support window functions a la SQL:2008.
Hitoshi Harada, with some kibitzing from Heikki and Tom.
2008-12-28 18:54:01 +00:00
Tom Lane 18004101ac Modify UPDATE/DELETE WHERE CURRENT OF to use the FOR UPDATE infrastructure to
locate the target row, if the cursor was declared with FOR UPDATE or FOR
SHARE.  This approach is more flexible and reliable than digging through the
plan tree; for instance it can cope with join cursors.  But we still provide
the old code for use with non-FOR-UPDATE cursors.  Per gripe from Robert Haas.
2008-11-16 17:34:28 +00:00
Tom Lane 0656ed3daa Make SELECT FOR UPDATE/SHARE work on inheritance trees, by having the plan
return the tableoid as well as the ctid for any FOR UPDATE targets that
have child tables.  All child tables are listed in the ExecRowMark list,
but the executor just skips the ones that didn't produce the current row.

Curiously, this longstanding restriction doesn't seem to have been documented
anywhere; so no doc changes.
2008-11-15 19:43:47 +00:00
Tom Lane 9b46abb7c4 Allow SQL-language functions to return the output of an INSERT/UPDATE/DELETE
RETURNING clause, not just a SELECT as formerly.

A side effect of this patch is that when a set-returning SQL function is used
in a FROM clause, performance is improved because the output is collected into
a tuplestore within the function, rather than using the less efficient
value-per-call mechanism.
2008-10-31 19:37:56 +00:00