Commit Graph

130 Commits

Author SHA1 Message Date
Tom Lane 53e757689c Make NestLoop plan nodes pass outer-relation variables into their inner
relation using the general PARAM_EXEC executor parameter mechanism, rather
than the ad-hoc kluge of passing the outer tuple down through ExecReScan.
The previous method was hard to understand and could never be extended to
handle parameters coming from multiple join levels.  This patch doesn't
change the set of possible plans nor have any significant performance effect,
but it's necessary infrastructure for future generalization of the concept
of an inner indexscan plan.

ExecReScan's second parameter is now unused, so it's removed.
2010-07-12 17:01:06 +00:00
Bruce Momjian 65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Robert Haas e26c539e9f Wrap calls to SearchSysCache and related functions using macros.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4).  This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.

Design and review by Tom Lane.
2010-02-14 18:42:19 +00:00
Robert Haas 42a8ab0a14 Augment EXPLAIN output with more details on Hash nodes.
We show the number of buckets, the number of batches (and also the original
number if it has changed), and the peak space used by the hash table.  Minor
executor changes to track peak space used.
2010-02-01 15:43:36 +00:00
Tom Lane 40608e7f94 When estimating the selectivity of an inequality "column > constant" or
"column < constant", and the comparison value is in the first or last
histogram bin or outside the histogram entirely, try to fetch the actual
column min or max value using an index scan (if there is an index on the
column).  If successful, replace the lower or upper histogram bound with
that value before carrying on with the estimate.  This limits the
estimation error caused by moving min/max values when the comparison
value is close to the min or max.  Per a complaint from Josh Berkus.

It is tempting to consider using this mechanism for mergejoinscansel as well,
but that would inject index fetches into main-line join estimation not just
endpoint cases.  I'm refraining from that until we can get a better handle
on the costs of doing this type of lookup.
2010-01-04 02:44:40 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Tom Lane 649b5ec7c8 Add the ability to store inheritance-tree statistics in pg_statistic,
and teach ANALYZE to compute such stats for tables that have subclasses.
Per my proposal of yesterday.

autovacuum still needs to be taught about running ANALYZE on parent tables
when their subclasses change, but the feature is useful even without that.
2009-12-29 20:11:45 +00:00
Tom Lane 8442317beb Make the overflow guards in ExecChooseHashTableSize be more protective.
The original coding ensured nbuckets and nbatch didn't exceed INT_MAX,
which while not insane on its own terms did nothing to protect subsequent
code like "palloc(nbatch * sizeof(BufFile *))".  Since enormous join size
estimates might well be planner error rather than reality, it seems best
to constrain the initial sizes to be not more than work_mem/sizeof(pointer),
thus ensuring the allocated arrays don't exceed work_mem.  We will allow
nbatch to get bigger than that during subsequent ExecHashIncreaseNumBatches
calls, but we should still guard against integer overflow in those palloc
requests.  Per bug #5145 from Bernt Marius Johnsen.

Although the given test case only seems to fail back to 8.2, previous
releases have variants of this issue, so patch all supported branches.
2009-10-30 20:58:45 +00:00
Tom Lane 421d7d8edb Remove no-longer-needed ExecCountSlots infrastructure. 2009-09-27 21:10:53 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Bruce Momjian 0e550ff617 Revert DTrace patch from Robert Lor 2009-04-02 20:59:10 +00:00
Bruce Momjian 227f817c1f Add support for additional DTrace probes.
Robert Lor
2009-04-02 19:14:34 +00:00
Tom Lane 596efd27ed Optimize multi-batch hash joins when the outer relation has a nonuniform
distribution, by creating a special fast path for the (first few) most common
values of the outer relation.  Tuples having hashvalues matching the MCVs
are effectively forced to be in the first batch, so that we never write
them out to the batch temp files.

Bryce Cutt and Ramon Lawrence, with some editorialization by me.
2009-03-21 00:04:40 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane 24ee8af573 Rework temp_tablespaces patch so that temp tablespaces are assigned separately
for each temp file, rather than once per sort or hashjoin; this allows
spreading the data of a large sort or join across multiple tablespaces.
(I remain dubious that this will make any difference in practice, but certain
people insisted.)  Arrange to cache the results of parsing the GUC variable
instead of recomputing from scratch on every demand, and push usage of the
cache down to the bottommost fd.c level.
2007-06-07 19:19:57 +00:00
Tom Lane acfce502ba Create a GUC parameter temp_tablespaces that allows selection of the
tablespace(s) in which to store temp tables and temporary files.  This is a
list to allow spreading the load across multiple tablespaces (a random list
element is chosen each time a temp object is to be created).  Temp files are
not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace
directories.

Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
2007-06-03 17:08:34 +00:00
Tom Lane bd2c980b22 Buy back some of the cycles spent in more-expensive hash functions by
selecting power-of-2, rather than prime, numbers of buckets in hash joins.
If the hash functions are doing their jobs properly by making all hash bits
equally random, this is good enough, and it saves expensive integer division
and modulus operations.
2007-06-01 17:38:44 +00:00
Tom Lane 3c5985b473 Fix bug I introduced in recent patch to make hash joins discard null tuples
immediately: ExecHashGetHashValue failed to restore the caller's memory
context before taking the failure exit.
2007-02-22 22:49:27 +00:00
Tom Lane a635c08fa1 Add support for cross-type hashing in hash index searches and hash joins.
Hashing for aggregation purposes still needs work, so it's not time to
mark any cross-type operators as hashable for general use, but these cases
work if the operators are so marked by hand in the system catalogs.
2007-01-30 01:33:36 +00:00
Tom Lane b39e91501c Improve hash join to discard input tuples immediately if they can't
match because they contain a null join key (and the join operator is
known strict).  Improves performance significantly when the inner
relation contains a lot of nulls, as per bug #2930.
2007-01-28 23:21:26 +00:00
Bruce Momjian 29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Bruce Momjian 03c2e5924e Add additional includes needed on some platforms. 2006-07-14 04:44:46 +00:00
Bruce Momjian fad1ea86bd Move math.h after postgresql.h 2006-07-13 20:14:12 +00:00
Bruce Momjian a22d76d96a Allow include files to compile own their own.
Strip unused include files out unused include files, and add needed
includes to C files.

The next step is to remove unused include files in C files.
2006-07-13 16:49:20 +00:00
Tom Lane 69d0a15e2a Convert hash join code to use MinimalTuple format in tuple hash table
and batch files.  Should reduce memory and I/O demands for such joins.
2006-06-27 21:31:20 +00:00
Bruce Momjian 87bd07d979 Make EXPLAIN sampling smarter, to avoid excessive sampling delay.
Martijn van Oosterhout
2006-05-30 14:01:58 +00:00
Tom Lane 798e63ffb0 Remove CXT_printf/CXT1_printf macros. If anyone had found them to be of
any use in the past many years, we'd have made some effort to include
them in all executor node types; but in fact they were only in
nodeAppend.c and nodeIndexscan.c, up until I copied nodeIndexscan.c's
occurrence into the new bitmap node types.  Remove some other unused
macros in execdebug.h, too.  Some day the whole header probably ought to
go away in favor of better-designed facilities.
2006-05-23 15:21:52 +00:00
Bruce Momjian f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane 2c0ef9777c Extend the ExecInitNode API so that plan nodes receive a set of flag
bits indicating which optional capabilities can actually be exercised
at runtime.  This will allow Sort and Material nodes, and perhaps later
other nodes, to avoid unnecessary overhead in common cases.
This commit just adds the infrastructure and arranges to pass the correct
flag values down to plan nodes; none of the actual optimizations are here
yet.  I'm committing this separately in case anyone wants to measure the
added overhead.  (It should be negligible.)

Simon Riggs and Tom Lane
2006-02-28 04:10:28 +00:00
Tom Lane 4dd2048a47 Get rid of ExecAssignResultTypeFromOuterPlan() and make all plan node types
generate their output tuple descriptors from their target lists (ie, using
ExecAssignResultTypeFromTL()).  We long ago fixed things so that all node
types have minimally valid tlists, so there's no longer any good reason to
have two different ways of doing it.  This change is needed to fix bug
reported by Hayden James: the fix of 2005-11-03 to emit the correct column
names after optimizing away a SubqueryScan node didn't work if the new
top-level plan node used ExecAssignResultTypeFromOuterPlan to generate its
tupdesc, since the next plan node down won't have the correct column labels.
2005-11-23 20:27:58 +00:00
Bruce Momjian 436a2956d8 Re-run pgindent, fixing a problem where comment lines after a blank
comment line where output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.
2005-11-22 18:17:34 +00:00
Tom Lane dd218ae7b0 Remove the t_datamcxt field of HeapTupleData. This was introduced for
the convenience of tuptoaster.c and is no longer needed, so may as well
get rid of some small amount of overhead.
2005-11-20 19:49:08 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane e990b9ce23 The original patch to avoid building a hash join's hashtable when the
outer relation is empty did not work, per test case from Patrick Welche.
It tried to use nodeHashjoin.c's high-level mechanisms for fetching an
outer-relation tuple, but that code expected the hash table to be filled
already.  As patched, the code failed in corner cases such as having no
outer-relation tuples for the first hash batch.  Revert and rewrite.
2005-09-25 19:37:35 +00:00
Neil Conway c119c5bd49 Change the implementation of hash join to attempt to avoid unnecessary
work if either of the join relations are empty. The logic is:

(1) if the inner relation's startup cost is less than the outer
    relation's startup cost and this is not an outer join, read
    a single tuple from the inner relation via ExecHash()
      - if NULL, we're done

(2) read a single tuple from the outer relation
      - if NULL, we're done

(3) build the hash table on the inner relation
      - if hash table is empty and this is not an outer join,
        we're done

(4) otherwise, do hash join as usual

The implementation uses the new MultiExecProcNode API, per a
suggestion from Tom: invoking ExecHash() now produces the first
tuple from the Hash node's child node, whereas MultiExecHash()
builds the hash table.

I had to put in a bit of a kludge to get the row count returned
for EXPLAIN ANALYZE to be correct: since ExecHash() is invoked to
return a tuple, and then MultiExecHash() is invoked, we would
return one too many tuples to EXPLAIN ANALYZE. I hacked around
this by just manually detecting this situation and subtracting 1
from the EXPLAIN ANALYZE row count.
2005-06-15 07:27:44 +00:00
Tom Lane d8b1bf4791 Create a new 'MultiExecProcNode' call API for plan nodes that don't
return just a single tuple at a time.  Currently the only such node
type is Hash, but I expect we will soon have indexscans that can return
tuple bitmaps.  A side benefit is that EXPLAIN ANALYZE now shows the
correct tuple count for a Hash node.
2005-04-16 20:07:35 +00:00
Neil Conway aeb502346b Minor code cleanup: ExecHash() was returning a null TupleTableSlot, and an
old comment in the code claimed that this was necessary. Since it is not
actually necessary any more, it is clearer to remove the comment and
just return NULL instead -- the return value of ExecHash() is not used.
2005-03-31 02:02:52 +00:00
Tom Lane f97aebd162 Revise TupleTableSlot code to avoid unnecessary construction and disassembly
of tuples when passing data up through multiple plan nodes.  A slot can now
hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting
of Datum/isnull arrays.  Upper plan levels can usually just copy the Datum
arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr()
calls to extract the data again.  This work extends Atsushi Ogawa's earlier
patch, which provided the key idea of adding Datum arrays to TupleTableSlots.
(I believe however that something like this was foreseen way back in Berkeley
days --- see the old comment on ExecProject.)  A test case involving many
levels of join of fairly wide tables (about 80 columns altogether) showed
about 3x overall speedup, though simple queries will probably not be
helped very much.

I have also duplicated some code in heaptuple.c in order to provide versions
of heap_formtuple and friends that use "bool" arrays to indicate null
attributes, instead of the old convention of "char" arrays containing either
'n' or ' '.  This provides a better match to the convention used by
ExecEvalExpr.  While I have not made a concerted effort to get rid of uses
of the old routines, I think they should be deprecated and eventually removed.
2005-03-16 21:38:10 +00:00
Tom Lane dffbbb3e55 Forgot that I had intended to replace division by masking in hash calculation. 2005-03-13 19:59:40 +00:00
Tom Lane 849074f9ae Revise hash join code so that we can increase the number of batches
on-the-fly, and thereby avoid blowing out memory when the planner has
underestimated the hash table size.  Hash join will now obey the
work_mem limit with some faithfulness.  Per my recent proposal
(hash aggregate part isn't done yet though).
2005-03-06 22:15:05 +00:00
PostgreSQL Daemon 2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Tom Lane 9fcbe2af11 Arrange for hash join to skip scanning the outer relation if it detects
that the inner one is completely empty.  Per recent discussion.  Also some
cosmetic cleanups in nearby code.
2004-09-22 19:13:52 +00:00
Bruce Momjian da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Neil Conway 72b6ad6313 Use the new List API function names throughout the backend, and disable the
list compatibility API by default. While doing this, I decided to keep
the llast() macro around and introduce llast_int() and llast_oid() variants.
2004-05-30 23:40:41 +00:00
Neil Conway d0b4399d81 Reimplement the linked list data structure used throughout the backend.
In the past, we used a 'Lispy' linked list implementation: a "list" was
merely a pointer to the head node of the list. The problem with that
design is that it makes lappend() and length() linear time. This patch
fixes that problem (and others) by maintaining a count of the list
length and a pointer to the tail node along with each head node pointer.
A "list" is now a pointer to a structure containing some meta-data
about the list; the head and tail pointers in that structure refer
to ListCell structures that maintain the actual linked list of nodes.

The function names of the list API have also been changed to, I hope,
be more logically consistent. By default, the old function names are
still available; they will be disabled-by-default once the rest of
the tree has been updated to use the new API names.
2004-05-26 04:41:50 +00:00
Tom Lane c1352052ef Replace the switching function ExecEvalExpr() with a macro that jumps
directly to the appropriate per-node execution function, using a function
pointer stored by ExecInitExpr.  This speeds things up by eliminating one
level of function call.  The function-pointer technique also enables further
small improvements such as only making one-time tests once (and then
changing the function pointer).  Overall this seems to gain about 10%
on evaluation of simple expressions, which isn't earthshaking but seems
a worthwhile gain for a relatively small hack.  Per recent discussion
on pghackers.
2004-03-17 01:02:24 +00:00
Tom Lane 391c3811a2 Rename SortMem and VacuumMem to work_mem and maintenance_work_mem.
Make btree index creation and initial validation of foreign-key constraints
use maintenance_work_mem rather than work_mem as their memory limit.
Add some code to guc.c to allow these variables to be referenced by their
old names in SHOW and SET commands, for backwards compatibility.
2004-02-03 17:34:04 +00:00
PostgreSQL Daemon 969685ad44 $Header: -> $PostgreSQL Changes ... 2003-11-29 19:52:15 +00:00