Commit Graph

172 Commits

Author SHA1 Message Date
Tom Lane a2e923a652 Fix dynahash.c to suppress hash bucket splits while a hash_seq_search() scan
is in progress on the same hashtable.  This seems the least invasive way to
fix the recently-recognized problem that a split could cause the scan to
visit entries twice or (with much lower probability) miss them entirely.
The only field-reported problem caused by this is the "failed to re-find
shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky,
which was caused by multiply visited entries.  However, it seems certain
that mdsync() is vulnerable to missing required fsync's due to missed
entries, and I am fearful that RelationCacheInitializePhase2() might be at
risk as well.  Because of that and the generalized hazard presented by this
bug, back-patch all the supported branches.

Along the way, fix pg_prepared_statement() and pg_cursor() to not assume
that the hashtables they are examining will stay static between calls.
This is risky regardless of the newly noted dynahash problem, because
hash_seq_search() has never promised to cope with deletion of table entries
other than the just-returned one.  There may be no bug here because the only
supported way to call these functions is via ExecMakeTableFunctionResult()
which will cycle them to completion before doing anything very interesting,
but it seems best to get rid of the assumption.  This affects 8.2 and HEAD
only, since those functions weren't there earlier.
2007-04-26 23:24:46 +00:00
Tom Lane bf94076348 Fix array coercion expressions to ensure that the correct volatility is
seen by code inspecting the expression.  The best way to do this seems
to be to drop the original representation as a function invocation, and
instead make a special expression node type that represents applying
the element-type coercion function to each array element.  In this way
the element function is exposed and will be checked for volatility.
Per report from Guillaume Smet.
2007-03-27 23:21:12 +00:00
Tom Lane c7ff7663e4 Get rid of the separate EState for subplans, and just let them share the
parent query's EState.  Now that there's a single flat rangetable for both
the main plan and subplans, there's no need anymore for a separate EState,
and removing it allows cleaning up some crufty code in nodeSubplan.c and
nodeSubqueryscan.c.  Should be a tad faster too, although any difference
will probably be hard to measure.  This is the last bit of subsidiary
mop-up work from changing to a flat rangetable.
2007-02-27 01:11:26 +00:00
Tom Lane eab6b8b27e Turn the rangetable used by the executor into a flat list, and avoid storing
useless substructure for its RangeTblEntry nodes.  (I chose to keep using the
same struct node type and just zero out the link fields for unneeded info,
rather than making a separate ExecRangeTblEntry type --- it seemed too
fragile to have two different rangetable representations.)

Along the way, put subplans into a list in the toplevel PlannedStmt node,
and have SubPlan nodes refer to them by list index instead of direct pointers.
Vadim wanted to do that years ago, but I never understood what he was on about
until now.  It makes things a *whole* lot more robust, because we can stop
worrying about duplicate processing of subplans during expression tree
traversals.  That's been a constant source of bugs, and it's finally gone.

There are some consequent simplifications yet to be made, like not using
a separate EState for subplans in the executor, but I'll tackle that later.
2007-02-22 22:00:26 +00:00
Tom Lane 9cbd0c155d Remove the Query structure from the executor's API. This allows us to stop
storing mostly-redundant Query trees in prepared statements, portals, etc.
To replace Query, a new node type called PlannedStmt is inserted by the
planner at the top of a completed plan tree; this carries just the fields of
Query that are still needed at runtime.  The statement lists kept in portals
etc. now consist of intermixed PlannedStmt and bare utility-statement nodes
--- no Query.  This incidentally allows us to remove some fields from Query
and Plan nodes that shouldn't have been there in the first place.

Still to do: simplify the execution-time range table; at the moment the
range table passed to the executor still contains Query trees for subqueries.

initdb forced due to change of stored rules.
2007-02-20 17:32:18 +00:00
Tom Lane ab05eedecc Add support for cross-type hashing in hashed subplans (hashed IN/NOT IN cases
that aren't turned into true joins).  Since this is the last missing bit of
infrastructure, go ahead and fill out the hash integer_ops and float_ops
opfamilies with cross-type operators.  The operator family project is now
DONE ... er, except for documentation ...
2007-02-06 02:59:15 +00:00
Bruce Momjian 29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Tom Lane 0cbc5b1ed4 Fix failure due to accessing an already-freed tuple descriptor in a plan
involving HashAggregate over SubqueryScan (this is the known case, there
may well be more).  The bug is only latent in releases before 8.2 since they
didn't try to access tupletable slots' descriptors during ExecDropTupleTable.
The least bogus fix seems to be to make subqueries share the parent query's
memory context, so that tupdescs they create will have the same lifespan as
those of the parent query.  There are comments in the code envisioning going
even further by not having a separate child EState at all, but that will
require rethinking executor access to range tables, which I don't want to
tackle right now.  Per bug report from Jean-Pierre Pelletier.
2006-12-26 21:37:20 +00:00
Tom Lane c957c0bac7 Code review for XML patch. Instill a bit of sanity in the location of
the XmlExpr code in various lists, use a representation that has some hope
of reverse-listing correctly (though it's still a de-escaping function
shy of correctness), generally try to make it look more like Postgres
coding conventions.
2006-12-24 00:29:20 +00:00
Peter Eisentraut 8c1de5fb00 Initial SQL/XML support: xml data type and initial set of functions. 2006-12-21 16:05:16 +00:00
Tom Lane 8dcc8e3761 Refactor ExecGetJunkAttribute to avoid searching for junk attributes
by name on each and every row processed.  Profiling suggests this may
buy a percent or two for simple UPDATE scenarios, which isn't huge,
but when it's so easy to get ...
2006-12-04 02:06:55 +00:00
Tom Lane f213131f20 Fix IS NULL and IS NOT NULL tests on row-valued expressions to conform to
the SQL spec, viz IS NULL is true if all the row's fields are null, IS NOT
NULL is true if all the row's fields are not null.  The former coding got
this right for a limited number of cases with IS NULL (ie, those where it
could disassemble a ROW constructor at parse time), but was entirely wrong
for IS NOT NULL.  Per report from Teodor.

I desisted from changing the behavior for arrays, since on closer inspection
it's not clear that there's any support for that in the SQL spec.  This
probably needs more consideration.
2006-09-28 20:51:43 +00:00
Tom Lane e093dcdd28 Add the ability to create indexes 'concurrently', that is, without
blocking concurrent writes to the table.  Greg Stark, with a little help
from Tom Lane.
2006-08-25 04:06:58 +00:00
Tom Lane 7a3e30e608 Add INSERT/UPDATE/DELETE RETURNING, with basic docs and regression tests.
plpgsql support to come later.  Along the way, convert execMain's
SELECT INTO support into a DestReceiver, in order to eliminate some ugly
special cases.

Jonah Harris and Tom Lane
2006-08-12 02:52:06 +00:00
Tom Lane c68489863c Fix domain_in() bug exhibited by Darcy Buskermolen. The idea of an EState
that's shorter-lived than the expression state being evaluated in it really
doesn't work :-( --- we end up with fn_extra caches getting deleted while
still in use.  Rather than abandon the notion of caching expression state
across domain_in calls altogether, I chose to make domain_in a bit cozier
with ExprContext.  All we really need for evaluating variable-free
expressions is an ExprContext, not an EState, so I invented the notion of a
"standalone" ExprContext.  domain_in can prevent resource leakages by doing
a ReScanExprContext on this rather than having to free it entirely; so we
can make the ExprContext have the same lifespan (and particularly the same
per_query memory context) as the expression state structs.
2006-08-04 21:33:36 +00:00
Tom Lane 0dfb595d7a Arrange for ValuesScan to keep per-sublist expression eval state in a
temporary context that can be reset when advancing to the next sublist.
This is faster and more thorough at recovering space than the previous
method; moreover it will do the right thing if something in the sublist
tries to register an expression context callback.
2006-08-02 18:58:21 +00:00
Joe Conway 9caafda579 Add support for multi-row VALUES clauses as part of INSERT statements
(e.g. "INSERT ... VALUES (...), (...), ...") and elsewhere as allowed
by the spec. (e.g. similar to a FROM clause subselect). initdb required.
Joe Conway and Tom Lane.
2006-08-02 01:59:48 +00:00
Tom Lane 108fe47301 Aggregate functions now support multiple input arguments. I also took
the opportunity to treat COUNT(*) as a zero-argument aggregate instead
of the old hack that equated it to COUNT(1); this is materially cleaner
(no more weird ANYOID cases) and ought to be at least a tiny bit faster.
Original patch by Sergey Koposov; review, documentation, simple regression
tests, pg_dump and psql support by moi.
2006-07-27 19:52:07 +00:00
Bruce Momjian 085e559654 Change LIMIT/OFFSET to use int8
Dhanaraj M
2006-07-26 00:34:48 +00:00
Bruce Momjian a22d76d96a Allow include files to compile own their own.
Strip unused include files out unused include files, and add needed
includes to C files.

The next step is to remove unused include files in C files.
2006-07-13 16:49:20 +00:00
Tom Lane 485375a1c9 Fix hash aggregation to suppress unneeded columns from being stored in
tuple hash table entries.  This addresses the problem previously noted
that use of a 'physical tlist' in the input scan node could bloat the
hash table entries far beyond what the planner expects.  It's a better
answer than my previous thought of undoing the physical tlist optimization,
because we can also remove columns that are needed to compute the aggregate
functions but aren't part of the grouping column set.
2006-06-28 19:40:52 +00:00
Tom Lane cfc710312e Adjust TupleHashTables to use MinimalTuple format for contained tuples. 2006-06-28 17:05:49 +00:00
Tom Lane 986085a7f0 Improve the representation of FOR UPDATE/FOR SHARE so that we can
support both FOR UPDATE and FOR SHARE in one command, as well as both
NOWAIT and normal WAIT behavior.  The more general code is actually
simpler and cleaner.
2006-04-30 18:30:40 +00:00
Bruce Momjian f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane d2c555ee53 Teach nodeSort and nodeMaterial to optimize out unnecessary overhead
when the passed-down eflags indicate they can.
Simon Riggs and Tom Lane
2006-02-28 05:48:44 +00:00
Tom Lane 6e07709760 Implement SQL-compliant treatment of row comparisons for < <= > >= cases
(previously we only did = and <> correctly).  Also, allow row comparisons
with any operators that are in btree opclasses, not only those with these
specific names.  This gets rid of a whole lot of indefensible assumptions
about the behavior of particular operators based on their names ... though
it's still true that IN and NOT IN expand to "= ANY".  The patch adds a
RowCompareExpr expression node type, and makes some changes in the
representation of ANY/ALL/ROWCOMPARE SubLinks so that they can share code
with RowCompareExpr.

I have not yet done anything about making RowCompareExpr an indexable
operator, but will look at that soon.

initdb forced due to changes in stored rules.
2005-12-28 01:30:02 +00:00
Tom Lane d780f07ac1 Adjust scan plan nodes to avoid getting an extra AccessShareLock on a
relation if it's already been locked by execMain.c as either a result
relation or a FOR UPDATE/SHARE relation.  This avoids an extra trip to
the shared lock manager state.  Per my suggestion yesterday.
2005-12-02 20:03:42 +00:00
Tom Lane 4ab76b1c20 Tweak hash join code to use an additional heuristic for deciding whether
it's worth probing the outer relation for emptiness before building the
hash table.  To wit, if we're rescanning a join previously performed,
remember whether we found it nonempty the previous time, and don't bother
with the probe if it was nonempty.  This buys back the performance lost
in examples like Mario Weilguni's.
2005-11-28 23:46:03 +00:00
Tom Lane da27c0a1ef Teach tid-scan code to make use of "ctid = ANY (array)" clauses, so that
"ctid IN (list)" will still work after we convert IN to ScalarArrayOpExpr.
Make some minor efficiency improvements while at it, such as ensuring that
multiple TIDs are fetched in physical heap order.  And fix EXPLAIN so that
it shows what's really going on for a TID scan.
2005-11-26 22:14:57 +00:00
Tom Lane 70f1482de3 Change seqscan logic so that we check visibility of all tuples on a page
when we first read the page, rather than checking them one at a time.
This allows us to take and release the buffer content lock just once
per page, instead of once per tuple.  Since it's a shared lock the
contention penalty for holding the lock longer shouldn't be too bad.
We can safely do this only when using an MVCC snapshot; else the
assumption that visibility won't change over time is uncool.  Therefore
there are now two code paths depending on the snapshot type.  I also
made the same change in nodeBitmapHeapscan.c, where it can be done always
because we only support MVCC snapshots for bitmap scans anyway.
Also make some incidental cleanups in the APIs of these functions.
Per a suggestion from Qingqing Zhou.
2005-11-26 03:03:07 +00:00
Tom Lane 290166f934 Teach planner and executor to handle ScalarArrayOpExpr as an indexable
qualification when the underlying operator is indexable and useOr is true.
That is, indexkey op ANY (ARRAY[...]) is effectively translated into an
OR combination of one indexscan for each array element.  This only works
for bitmap index scans, of course, since regular indexscans no longer
support OR'ing of scans.  There are still some loose ends to clean up
before changing 'x IN (list)' to translate as a ScalarArrayOpExpr;
for instance predtest.c ought to be taught about it.  But this gets the
basic functionality in place.
2005-11-25 19:47:50 +00:00
Bruce Momjian 436a2956d8 Re-run pgindent, fixing a problem where comment lines after a blank
comment line where output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.
2005-11-22 18:17:34 +00:00
Tom Lane 76ce39e386 Prevent ExecInsert() and ExecUpdate() from scribbling on the result tuple
slot of the topmost plan node when a trigger returns a modified tuple.
These appear to be the only places where a plan node's caller did not
treat the result slot as read-only, which is an assumption that nodeUnique
makes as of 8.1.  Fixes trigger-vs-DISTINCT bug reported by Frank van Vugt.
2005-11-14 17:42:55 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane e990b9ce23 The original patch to avoid building a hash join's hashtable when the
outer relation is empty did not work, per test case from Patrick Welche.
It tried to use nodeHashjoin.c's high-level mechanisms for fetching an
outer-relation tuple, but that code expected the hash table to be filled
already.  As patched, the code failed in corner cases such as having no
outer-relation tuples for the first hash batch.  Revert and rewrite.
2005-09-25 19:37:35 +00:00
Tom Lane 2a4fad1a0e Add NOWAIT option to SELECT FOR UPDATE/SHARE.
Original patch by Hans-Juergen Schoenig, revisions by Karel Zak
and Tom Lane.
2005-08-01 20:31:16 +00:00
Tom Lane 943b396245 Add Oracle-compatible GREATEST and LEAST functions. Pavel Stehule 2005-06-26 22:05:42 +00:00
Tom Lane b95ae32b41 Avoid WAL-logging individual tuple insertions during CREATE TABLE AS
(a/k/a SELECT INTO).  Instead, flush and fsync the whole relation before
committing.  We do still need the WAL log when PITR is active, however.
Simon Riggs and Tom Lane.
2005-06-20 18:37:02 +00:00
Neil Conway c119c5bd49 Change the implementation of hash join to attempt to avoid unnecessary
work if either of the join relations are empty. The logic is:

(1) if the inner relation's startup cost is less than the outer
    relation's startup cost and this is not an outer join, read
    a single tuple from the inner relation via ExecHash()
      - if NULL, we're done

(2) read a single tuple from the outer relation
      - if NULL, we're done

(3) build the hash table on the inner relation
      - if hash table is empty and this is not an outer join,
        we're done

(4) otherwise, do hash join as usual

The implementation uses the new MultiExecProcNode API, per a
suggestion from Tom: invoking ExecHash() now produces the first
tuple from the Hash node's child node, whereas MultiExecHash()
builds the hash table.

I had to put in a bit of a kludge to get the row count returned
for EXPLAIN ANALYZE to be correct: since ExecHash() is invoked to
return a tuple, and then MultiExecHash() is invoked, we would
return one too many tuples to EXPLAIN ANALYZE. I hacked around
this by just manually detecting this situation and subtracting 1
from the EXPLAIN ANALYZE row count.
2005-06-15 07:27:44 +00:00
Tom Lane fabef3044a Minor refactoring to eliminate duplicate code and make startup a
tad faster.
2005-05-14 21:29:23 +00:00
Tom Lane 184e7a73a5 Revise nodeMergejoin in light of example provided by Guillaume Smet.
When one side of the join has a NULL, we don't want to uselessly try
to match it against every remaining tuple of the other side.  While
at it, rewrite the comparison machinery to avoid multiple evaluations
of the left and right input expressions and to use a btree comparator
where available, instead of double operator calls.  Also revise the
state machine to eliminate redundant comparisons and hopefully make it
more readable too.
2005-05-13 21:20:16 +00:00
Tom Lane db70a31294 Adjust nodeBitmapIndexscan to keep the target index opened from plan
startup to end, rather than re-opening it in each MultiExecBitmapIndexScan
call.  I had foolishly thought that opening/closing wouldn't be much
more expensive than a rescan call, but that was sheer brain fade.

This seems to fix about half of the performance lossage reported by
Sergey Koposov.  I'm still not sure where the other half went.
2005-05-05 03:37:23 +00:00
Tom Lane bedb78d386 Implement sharable row-level locks, and use them for foreign key references
to eliminate unnecessary deadlocks.  This commit adds SELECT ... FOR SHARE
paralleling SELECT ... FOR UPDATE.  The implementation uses a new SLRU
data structure (managed much like pg_subtrans) to represent multiple-
transaction-ID sets.  When more than one transaction is holding a shared
lock on a particular row, we create a MultiXactId representing that set
of transactions and store its ID in the row's XMAX.  This scheme allows
an effectively unlimited number of row locks, just as we did before,
while not costing any extra overhead except when a shared lock actually
has to be shared.   Still TODO: use the regular lock manager to control
the grant order when multiple backends are waiting for a row lock.

Alvaro Herrera and Tom Lane.
2005-04-28 21:47:18 +00:00
Tom Lane 5b05185262 Remove support for OR'd indexscans internal to a single IndexScan plan
node, as this behavior is now better done as a bitmap OR indexscan.
This allows considerable simplification in nodeIndexscan.c itself as
well as several planner modules concerned with indexscan plan generation.
Also we can improve the sharing of code between regular and bitmap
indexscans, since they are now working with nigh-identical Plan nodes.
2005-04-25 01:30:14 +00:00
Tom Lane 186655e9a5 Adjust nodeBitmapIndexscan.c to not keep the index open across calls,
but just to open and close it during MultiExecBitmapIndexScan.  This
avoids acquiring duplicate resources (eg, multiple locks on the same
relation) in a tree with many bitmap scans.  Also, don't bother to
lock the parent heap at all here, since we must be underneath a
BitmapHeapScan node that will be holding a suitable lock.
2005-04-24 18:16:38 +00:00
Tom Lane 9d64632144 Minor performance improvement: avoid unnecessary creation/unioning of
bitmaps for multiple indexscans.  Instead just let each indexscan add
TIDs directly into the BitmapOr node's result bitmap.
2005-04-20 15:48:36 +00:00
Tom Lane 4a8c5d0375 Create executor and planner-backend support for decoupled heap and index
scans, using in-memory tuple ID bitmaps as the intermediary.  The planner
frontend (path creation and cost estimation) is not there yet, so none
of this code can be executed.  I have tested it using some hacked planner
code that is far too ugly to see the light of day, however.  Committing
now so that the bulk of the infrastructure changes go in before the tree
drifts under me.
2005-04-19 22:35:18 +00:00
Tom Lane adb1a6e95b Improve EXPLAIN ANALYZE to show the time spent in each trigger when
executing a statement that fires triggers.  Formerly this time was
included in "Total runtime" but not otherwise accounted for.
As a side benefit, we avoid re-opening relations when firing non-deferred
AFTER triggers, because the trigger code can re-use the main executor's
ResultRelInfo data structure.
2005-03-25 21:58:00 +00:00
Tom Lane f97aebd162 Revise TupleTableSlot code to avoid unnecessary construction and disassembly
of tuples when passing data up through multiple plan nodes.  A slot can now
hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting
of Datum/isnull arrays.  Upper plan levels can usually just copy the Datum
arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr()
calls to extract the data again.  This work extends Atsushi Ogawa's earlier
patch, which provided the key idea of adding Datum arrays to TupleTableSlots.
(I believe however that something like this was foreseen way back in Berkeley
days --- see the old comment on ExecProject.)  A test case involving many
levels of join of fairly wide tables (about 80 columns altogether) showed
about 3x overall speedup, though simple queries will probably not be
helped very much.

I have also duplicated some code in heaptuple.c in order to provide versions
of heap_formtuple and friends that use "bool" arrays to indicate null
attributes, instead of the old convention of "char" arrays containing either
'n' or ' '.  This provides a better match to the convention used by
ExecEvalExpr.  While I have not made a concerted effort to get rid of uses
of the old routines, I think they should be deprecated and eventually removed.
2005-03-16 21:38:10 +00:00
Tom Lane 849074f9ae Revise hash join code so that we can increase the number of batches
on-the-fly, and thereby avoid blowing out memory when the planner has
underestimated the hash table size.  Hash join will now obey the
work_mem limit with some faithfulness.  Per my recent proposal
(hash aggregate part isn't done yet though).
2005-03-06 22:15:05 +00:00