Commit Graph

1215 Commits

Author SHA1 Message Date
Robert Haas
40b9b95769 New GUC, track_iotiming, to track I/O timings.
Currently, the only way to see the numbers this gathers is via
EXPLAIN (ANALYZE, BUFFERS), but the plan is to add visibility through
the stats collector and pg_stat_statements in subsequent patches.

Ants Aasma, reviewed by Greg Smith, with some further changes by me.
2012-03-27 14:55:02 -04:00
Peter Eisentraut
0e85abd658 Clean up compiler warnings from unused variables with asserts disabled
For those variables only used when asserts are enabled, use a new
macro PG_USED_FOR_ASSERTS_ONLY, which expands to
__attribute__((unused)) when asserts are not enabled.
2012-03-21 23:33:10 +02:00
Tom Lane
9dbf2b7d75 Restructure SELECT INTO's parsetree representation into CreateTableAsStmt.
Making this operation look like a utility statement seems generally a good
idea, and particularly so in light of the desire to provide command
triggers for utility statements.  The original choice of representing it as
SELECT with an IntoClause appendage had metastasized into rather a lot of
places, unfortunately, so that this patch is a great deal more complicated
than one might at first expect.

In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS
subcommands required restructuring some EXPLAIN-related APIs.  Add-on code
that calls ExplainOnePlan or ExplainOneUtility, or uses
ExplainOneQuery_hook, will need adjustment.

Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO,
which formerly were accepted though undocumented, are no longer accepted.
The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE.
The CREATE RULE case doesn't seem to have much real-world use (since the
rule would work only once before failing with "table already exists"),
so we'll not bother with that one.

Both SELECT INTO and CREATE TABLE AS still return a command tag of
"SELECT nnnn".  There was some discussion of returning "CREATE TABLE nnnn",
but for the moment backwards compatibility wins the day.

Andres Freund and Tom Lane
2012-03-19 21:38:12 -04:00
Robert Haas
2254367435 Make EXPLAIN (BUFFERS) track blocks dirtied, as well as those written.
Also expose the new counters through pg_stat_statements.

Patch by me.  Review by Fujii Masao and Greg Smith.
2012-02-22 20:33:05 -05:00
Tom Lane
398f70ec07 Preserve column names in the execution-time tupledesc for a RowExpr.
The hstore and json datatypes both have record-conversion functions that
pay attention to column names in the composite values they're handed.
We used to not worry about inserting correct field names into tuple
descriptors generated at runtime, but given these examples it seems
useful to do so.  Observe the nicer-looking results in the regression
tests whose results changed.

catversion bump because there is a subtle change in requirements for stored
rule parsetrees: RowExprs from ROW() constructs now have to include field
names.

Andrew Dunstan and Tom Lane
2012-02-14 17:34:56 -05:00
Robert Haas
af7914c662 Add TIMING option to EXPLAIN, to allow eliminating of timing overhead.
Sometimes it may be useful to get actual row counts out of EXPLAIN
(ANALYZE) without paying the cost of timing every node entry/exit.
With this patch, you can say EXPLAIN (ANALYZE, TIMING OFF) to get that.

Tomas Vondra, reviewed by Eric Theise, with minor doc changes by me.
2012-02-07 11:23:04 -05:00
Tom Lane
9bff0780cf Allow SQL-language functions to reference parameters by name.
Matthew Draper, reviewed by Hitoshi Harada
2012-02-04 19:23:49 -05:00
Tom Lane
ad10853b30 Assorted comment fixes, mostly just typos, but some obsolete statements.
YAMAMOTO Takashi
2012-01-29 19:23:56 -05:00
Tom Lane
7c1719bc68 Fix handling of data-modifying CTE subplans in EvalPlanQual.
We can't just skip initializing such subplans, because the referencing CTE
node will expect to find the subplan available when it initializes.  That
in turn means that ExecInitModifyTable must allow the case (which actually
it needed to do anyway, since there's no guarantee that ModifyTable is
exactly at the top of the CTE plan tree).  So move the complaint about not
being allowed in EvalPlanQual mode to execution instead of initialization.
Testing turned up yet another problem, which is that we'd try to
re-initialize the result relation's index list, leading to leaks and
dangling pointers.

Per report from Phil Sorber.  Back-patch to 9.1 where data-modifying CTEs
were introduced.
2012-01-28 17:43:57 -05:00
Robert Haas
9f9135d129 Instrument index-only scans to count heap fetches performed.
Patch by me; review by Tom Lane, Jeff Davis, and Peter Geoghegan.
2012-01-25 20:41:52 -05:00
Robert Haas
1575fbcb79 Prevent adding relations to a concurrently dropped schema.
In the previous coding, it was possible for a relation to be created
via CREATE TABLE, CREATE VIEW, CREATE SEQUENCE, CREATE FOREIGN TABLE,
etc.  in a schema while that schema was meanwhile being concurrently
dropped.  This led to a pg_class entry with an invalid relnamespace
value.  The same problem could occur if a relation was moved using
ALTER .. SET SCHEMA while the target schema was being concurrently
dropped.  This patch prevents both of those scenarios by locking the
schema to which the relation is being added using AccessShareLock,
which conflicts with the AccessExclusiveLock taken by DROP.

As a desirable side effect, this also prevents the use of CREATE OR
REPLACE VIEW to queue for an AccessExclusiveLock on a relation on which
you have no rights: that will now fail immediately with a permissions
error, before trying to obtain a lock.

We need similar protection for all other object types, but as everything
other than relations uses a slightly different set of code paths, I'm
leaving that for a separate commit.

Original complaint (as far as I could find) about CREATE by Nikhil
Sontakke; risk for ALTER .. SET SCHEMA pointed out by Tom Lane;
further details by Dan Farina; patch by me; review by Hitoshi Harada.
2012-01-16 09:49:34 -05:00
Tom Lane
dfd26f9c5f Make executor's SELECT INTO code save and restore original tuple receiver.
As previously coded, the QueryDesc's dest pointer was left dangling
(pointing at an already-freed receiver object) after ExecutorEnd.  It's a
bit astonishing that it took us this long to notice, and I'm not sure that
the known problem case with SQL functions is the only one.  Fix it by
saving and restoring the original receiver pointer, which seems the most
bulletproof way of ensuring any related bugs are also covered.

Per bug #6379 from Paul Ramsey.  Back-patch to 8.4 where the current
handling of SELECT INTO was introduced.
2012-01-04 18:30:55 -05:00
Bruce Momjian
e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Robert Haas
d573e239f0 Take fewer snapshots.
When a PORTAL_ONE_SELECT query is executed, we can opportunistically
reuse the parse/plan shot for the execution phase.  This cuts down the
number of snapshots per simple query from 2 to 1 for the simple
protocol, and 3 to 2 for the extended protocol.  Since we are only
reusing a snapshot taken early in the processing of the same protocol
message, the change shouldn't be user-visible, except that the remote
possibility of the planning and execution snapshots being different is
eliminated.

Note that this change does not make it safe to assume that the parse/plan
snapshot will certainly be reused; that will currently only happen if
PortalStart() decides to use the PORTAL_ONE_SELECT strategy.  It might
be worth trying to provide some stronger guarantees here in the future,
but for now we don't.

Patch by me; review by Dimitri Fontaine.
2011-12-21 09:16:55 -05:00
Peter Eisentraut
729205571e Add support for privileges on types
This adds support for the more or less SQL-conforming USAGE privilege
on types and domains.  The intent is to be able restrict which users
can create dependencies on types, which restricts the way in which
owners can alter types.

reviewed by Yeb Havinga
2011-12-20 00:05:19 +02:00
Andrew Dunstan
0f44335122 Miscellaneous cleanup to silence compiler warnings seen on Mingw.
Remove some dead code, conditionally declare some items or call
some code, and fix one or two declarations.
2011-12-10 18:15:15 -05:00
Tom Lane
c6e3ac11b6 Create a "sort support" interface API for faster sorting.
This patch creates an API whereby a btree index opclass can optionally
provide non-SQL-callable support functions for sorting.  In the initial
patch, we only use this to provide a directly-callable comparator function,
which can be invoked with a bit less overhead than the traditional
SQL-callable comparator.  While that should be of value in itself, the real
reason for doing this is to provide a datatype-extensible framework for
more aggressive optimizations, as in Peter Geoghegan's recent work.

Robert Haas and Tom Lane
2011-12-07 00:19:39 -05:00
Tom Lane
f225e4bc54 When a row fails a not-null constraint, show row's contents in errdetail.
Simple extension of previous patch for CHECK constraints.
2011-11-29 18:29:18 -05:00
Tom Lane
f1e13001b2 When a row fails a CHECK constraint, show row's contents in errdetail.
This should make it easier to identify which row is problematic when an
insert or update is processing many rows.

The formatting is similar to that for unique-index violation messages,
except that we limit field widths to 64 bytes since otherwise the message
could get unreasonably long.  (In particular, there's currently no attempt
to quote or escape field values that contain commas etc.)

Jan Kundrát, reviewed by Royce Ausburn, somewhat rewritten by me.
2011-11-29 15:02:49 -05:00
Tom Lane
9ed439a9c0 Fix unsupported options in CREATE TABLE ... AS EXECUTE.
The WITH [NO] DATA option was not supported, nor the ability to specify
replacement column names; the former limitation wasn't even documented, as
per recent complaint from Naoya Anzai.  Fix by moving the responsibility
for supporting these options into the executor.  It actually takes less
code this way ...

catversion bump due to change in representation of IntoClause, which might
affect stored rules.
2011-11-24 23:21:45 -05:00
Robert Haas
f1b4aa2a84 Check for INSERT privileges in SELECT INTO / CREATE TABLE AS.
In the normal course of events, this matters only if ALTER DEFAULT
PRIVILEGES has been used to revoke default INSERT permission.  Whether
or not the new behavior is more or less likely to be what the user wants
when dealing only with the built-in privilege facilities is arguable,
but it's clearly better when using a loadable module such as sepgsql
that may use the hook in ExecCheckRTPerms to enforce additional
permissions checks.

KaiGai Kohei, reviewed by Albe Laurenz
2011-11-22 16:16:26 -05:00
Heikki Linnakangas
4429f6a9e3 Support range data types.
Selectivity estimation functions are missing for some range type operators,
which is a TODO.

Jeff Davis
2011-11-03 13:42:15 +02:00
Tom Lane
7e3bf99baa Fix handling of PlaceHolderVars in nestloop parameter management.
If we use a PlaceHolderVar from the outer relation in an inner indexscan,
we need to reference the PlaceHolderVar as such as the value to be passed
in from the outer relation.  The previous code effectively tried to
reconstruct the PHV from its component expression, which doesn't work since
(a) the Vars therein aren't necessarily bubbled up far enough, and (b) it
would be the wrong semantics anyway because of the possibility that the PHV
is supposed to have gone to null at some point before the current join.
Point (a) led to "variable not found in subplan target list" planner
errors, but point (b) would have led to silently wrong answers.
Per report from Roger Niederland.
2011-11-03 00:50:58 -04:00
Tom Lane
336c1d7a51 Avoid assuming that index-only scan data matches the index's rowtype.
In general the data returned by an index-only scan should have the
datatypes originally computed by FormIndexDatum.  If the index opclasses
use "storage" datatypes different from their input datatypes, the scan
tuple will not have the same rowtype attributed to the index; but we had
a hard-wired assumption that that was true in nodeIndexonlyscan.c.  We'd
already hacked around the issue for the one case where the types are
different in btree indexes (btree name_ops), but this would definitely
come back to bite us if we ever implement index-only scans in GiST.

To fix, require the index AM to explicitly provide the tupdesc for the
tuple it is returning.  btree can just pass back the index's tupdesc, but
GiST will have to work harder when and if it supports index-only scans.

I had previously proposed fixing this by allowing the index AM to fill the
scan tuple slot directly; but on reflection that seemed like a module
layering violation, since TupleTableSlots are creatures of the executor.
At least in the btree case, it would also be less efficient, since the
tuple deconstruction work would occur even for rows later found to be
invisible to the scan's snapshot.
2011-10-16 19:15:04 -04:00
Tom Lane
9e8da0f757 Teach btree to handle ScalarArrayOpExpr quals natively.
This allows "indexedcol op ANY(ARRAY[...])" conditions to be used in plain
indexscans, and particularly in index-only scans.
2011-10-16 15:39:24 -04:00
Tom Lane
cb6771fb32 Generate index-only scan tuple descriptor from the plan node's indextlist.
Dept. of second thoughts: as long as we've got that tlist hanging around
anyway, we can apply ExecTypeFromTL to it to get a suitable descriptor for
the ScanTupleSlot.  This is a nicer solution than the previous one because
it eliminates some hard-wired knowledge about btree name_ops, and because
it avoids the somewhat shaky assumption that we needn't set up the scan
tuple descriptor in EXPLAIN_ONLY mode.  It doesn't change what actually
happens at run-time though, and I'm still a bit nervous about that.
2011-10-11 18:12:57 -04:00
Tom Lane
a0185461dd Rearrange the implementation of index-only scans.
This commit changes index-only scans so that data is read directly from the
index tuple without first generating a faux heap tuple.  The only immediate
benefit is that indexes on system columns (such as OID) can be used in
index-only scans, but this is necessary infrastructure if we are ever to
support index-only scans on expression indexes.  The executor is now ready
for that, though the planner still needs substantial work to recognize
the possibility.

To do this, Vars in index-only plan nodes have to refer to index columns
not heap columns.  I introduced a new special varno, INDEX_VAR, to mark
such Vars to avoid confusion.  (In passing, this commit renames the two
existing special varnos to OUTER_VAR and INNER_VAR.)  This allows
ruleutils.c to handle them with logic similar to what we use for subplan
reference Vars.

Since index-only scans are now fundamentally different from regular
indexscans so far as their expression subtrees are concerned, I also chose
to change them to have their own plan node type (and hence, their own
executor source file).
2011-10-11 14:21:30 -04:00
Tom Lane
cbfa92c23c Improve index-only scans to avoid repeated access to the index page.
We copy all the matched tuples off the page during _bt_readpage, instead of
expensively re-locking the page during each subsequent tuple fetch.  This
costs a bit more local storage, but not more than 2*BLCKSZ worth, and the
reduction in LWLock traffic is certainly worth that.  What's more, this
lets us get rid of the API wart in the original patch that said an index AM
could randomly decline to supply an index tuple despite having asserted
pg_am.amcanreturn.  That will be important for future improvements in the
index-only-scan feature, since the executor will now be able to rely on
having the index data available.
2011-10-09 00:21:08 -04:00
Tom Lane
a2822fb933 Support index-only scans using the visibility map to avoid heap fetches.
When a btree index contains all columns required by the query, and the
visibility map shows that all tuples on a target heap page are
visible-to-all, we don't need to fetch that heap page.  This patch depends
on the previous patches that made the visibility map reliable.

There's a fair amount left to do here, notably trying to figure out a less
chintzy way of estimating the cost of an index-only scan, but the core
functionality seems ready to commit.

Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
2011-10-07 20:14:13 -04:00
Robert Haas
821fd903f9 Update obsolete comments.
This was partially fixed by 57fdb2b0d8,
back in 2005, but it missed a couple of spots.

YAMAMOTO Takashi
2011-09-26 13:12:22 -04:00
Tom Lane
f197272365 Make EXPLAIN ANALYZE report the numbers of rows rejected by filter steps.
This provides information about the numbers of tuples that were visited
but not returned by table scans, as well as the numbers of join tuples
that were considered and discarded within a join plan node.

There is still some discussion going on about the best way to report counts
for outer-join situations, but I think most of what's in the patch would
not change if we revise that, so I'm going to go ahead and commit it as-is.

Documentation changes to follow (they weren't in the submitted patch
either).

Marko Tiikkaja, reviewed by Marc Cousin, somewhat revised by Tom
2011-09-22 11:30:11 -04:00
Tom Lane
e6faf910d7 Redesign the plancache mechanism for more flexibility and efficiency.
Rewrite plancache.c so that a "cached plan" (which is rather a misnomer
at this point) can support generation of custom, parameter-value-dependent
plans, and can make an intelligent choice between using custom plans and
the traditional generic-plan approach.  The specific choice algorithm
implemented here can probably be improved in future, but this commit is
all about getting the mechanism in place, not the policy.

In addition, restructure the API to greatly reduce the amount of extraneous
data copying needed.  The main compromise needed to make that possible was
to split the initial creation of a CachedPlanSource into two steps.  It's
worth noting in particular that SPI_saveplan is now deprecated in favor of
SPI_keepplan, which accomplishes the same end result with zero data
copying, and no need to then spend even more cycles throwing away the
original SPIPlan.  The risk of long-term memory leaks while manipulating
SPIPlans has also been greatly reduced.  Most of this improvement is based
on use of the recently-added MemoryContextSetParent primitive.
2011-09-16 00:43:52 -04:00
Tom Lane
1609797c25 Clean up the #include mess a little.
walsender.h should depend on xlog.h, not vice versa.  (Actually, the
inclusion was circular until a couple hours ago, which was even sillier;
but Bruce broke it in the expedient rather than logically correct
direction.)  Because of that poor decision, plus blind application of
pgrminclude, we had a situation where half the system was depending on
xlog.h to include such unrelated stuff as array.h and guc.h.  Clean up
the header inclusion, and manually revert a lot of what pgrminclude had
done so things build again.

This episode reinforces my feeling that pgrminclude should not be run
without adult supervision.  Inclusion changes in header files in particular
need to be reviewed with great care.  More generally, it'd be good if we
had a clearer notion of module layering to dictate which headers can sanely
include which others ... but that's a big task for another day.
2011-09-04 01:13:16 -04:00
Tom Lane
b3aaf9081a Rearrange planner to save the whole PlannerInfo (subroot) for a subquery.
Formerly, set_subquery_pathlist and other creators of plans for subqueries
saved only the rangetable and rowMarks lists from the lower-level
PlannerInfo.  But there's no reason not to remember the whole PlannerInfo,
and indeed this turns out to simplify matters in a number of places.

The immediate reason for doing this was so that the subroot will still be
accessible when we're trying to extract column statistics out of an
already-planned subquery.  But now that I've done it, it seems like a good
code-beautification effort in its own right.

I also chose to get rid of the transient subrtable and subrowmark fields in
SubqueryScan nodes, in favor of having setrefs.c look up the subquery's
RelOptInfo.  That required changing all the APIs in setrefs.c to pass
PlannerInfo not PlannerGlobal, which was a large but quite mechanical
transformation.

One side-effect not foreseen at the beginning is that this finally broke
inheritance_planner's assumption that replanning the same subquery RTE N
times would necessarily give interchangeable results each time.  That
assumption was always pretty risky, but now we really have to make a
separate RTE for each instance so that there's a place to carry the
separate subroots.
2011-09-03 15:36:24 -04:00
Bruce Momjian
6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Tom Lane
b33f78df17 Fix trigger WHEN conditions when both BEFORE and AFTER triggers exist.
Due to tuple-slot mismanagement, evaluation of WHEN conditions for AFTER
ROW UPDATE triggers could crash if there had been a BEFORE ROW trigger
fired for the same update.  Fix by not trying to overload the use of
estate->es_trig_tuple_slot.  Per report from Yoran Heling.

Back-patch to 9.0, when trigger WHEN conditions were introduced.
2011-08-21 18:15:55 -04:00
Heikki Linnakangas
89df948ec2 Avoid integer overflow when LIMIT + OFFSET >= 2^63.
This fixes bug #6139 reported by Hitoshi Harada.
2011-08-02 10:47:17 +03:00
Alvaro Herrera
b93f5a5673 Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.h
This lets us stop including rel.h into execnodes.h, which is a widely
used header.
2011-07-04 14:35:58 -04:00
Robert Haas
5da79169d3 Fix bugs in relpersistence handling during table creation.
Unlike the relistemp field which it replaced, relpersistence must be
set correctly quite early during the table creation process, as we
rely on it quite early on for a number of purposes, including security
checks.  Normally, this is set based on whether the user enters CREATE
TABLE, CREATE UNLOGGED TABLE, or CREATE TEMPORARY TABLE, but a
relation may also be made implicitly temporary by creating it in
pg_temp.  This patch fixes the handling of that case, and also
disables creation of unlogged tables in temporary tablespace (such
table indeed skip WAL-logging, but we reject an explicit
specification) and creation of relations in the temporary schemas of
other sessions (which is not very sensible, and didn't work right
anyway).

Report by Amit Khandekar.
2011-07-03 17:34:47 -04:00
Heikki Linnakangas
cd70dd6bef Move the PredicateLockRelation() call from nodeSeqscan.c to heapam.c. It's
more consistent that way, since all the other PredicateLock* calls are
made in various heapam.c and index AM functions. The call in nodeSeqscan.c
was unnecessarily aggressive anyway, there's no need to try to lock the
relation every time a tuple is fetched, it's enough to do it once.

This has the user-visible effect that if a seq scan is initialized in the
executor, but never executed, we now acquire the predicate lock on the heap
relation anyway. We could avoid that by taking the lock on the first
heap_getnext() call instead, but it doesn't seem worth the trouble given
that it feels more natural to do it in heap_beginscan().

Also, remove the retail PredicateLockTuple() calls from heap_getnext(). In
a seqscan, started with heap_begin(), we're holding a whole-relation
predicate lock on the heap so there's no need to lock the tuples
individually.

Kevin Grittner and me
2011-06-29 21:57:43 +03:00
Heikki Linnakangas
d9fe63acb0 Grab predicate locks on matching tuples in a lossy bitmap heap scan.
Non-lossy case was already handled correctly.

Kevin Grittner
2011-06-29 21:50:42 +03:00
Robert Haas
4da99ea423 Avoid having two copies of the HOT-chain search logic.
It's been like this since HOT was originally introduced, but the logic
is complex enough that this is a recipe for bugs, as we've already
found out with SSI.  So refactor heap_hot_search_buffer() so that it
can satisfy the needs of index_getnext(), and make index_getnext() use
that rather than duplicating the logic.

This change was originally proposed by Heikki Linnakangas as part of a
larger refactoring oriented towards allowing index-only scans.  I
extracted and adjusted this part, since it seems to have independent
merit.  Review by Jeff Davis.
2011-06-27 10:27:17 -04:00
Alvaro Herrera
a40a5d9468 Remove extra copying of TupleDescs for heap_create_with_catalog
Some callers were creating copies of tuple descriptors to pass to that
function, stating in code comments that it was necessary because it
modified the passed descriptor.  Code inspection reveals this not to be
true, and indeed not all callers are passing copies in the first place.
So remove the extra ones and the misleading comments about this behavior
as well.
2011-06-20 10:50:23 -04:00
Tom Lane
307a4c2cbb Remove another no-longer-needed inclusion of predicate.h. 2011-06-16 11:43:30 -04:00
Heikki Linnakangas
0a0e2b52a5 Make non-MVCC snapshots exempt from predicate locking. Scans with non-MVCC
snapshots, like in REINDEX, are basically non-transactional operations. The
DDL operation itself might participate in SSI, but there's separate
functions for that.

Kevin Grittner and Dan Ports, with some changes by me.
2011-06-15 12:11:18 +03:00
Bruce Momjian
6560407c7d Pgindent run before 9.1 beta2. 2011-06-09 14:32:50 -04:00
Tom Lane
21538377ee Disallow SELECT FOR UPDATE/SHARE on sequences.
We can't allow this because such an operation stores its transaction XID
into the sequence tuple's xmax.  Because VACUUM doesn't process sequences
(and we don't want it to start doing so), such an xmax value won't get
frozen, meaning it will eventually refer to nonexistent pg_clog storage,
and even wrap around completely.  Since the row lock is ignored by nextval
and setval, the usefulness of the operation is highly debatable anyway.
Per reports of trouble with pgpool 3.0, which had ill-advisedly started
using such commands as a form of locking.

In HEAD, also disallow SELECT FOR UPDATE/SHARE on toast tables.  Although
this does work safely given the current implementation, there seems no
good reason to allow it.  I refrained from changing that behavior in
back branches, however.
2011-06-02 14:46:15 -04:00
Tom Lane
0c99d41ec8 Allow hash joins to be interrupted while searching hash table for match.
Per experimentation with a recent example, in which unreasonable amounts
of time could elapse before the backend would respond to a query-cancel.

This might be something to back-patch, but the patch doesn't apply cleanly
because this code was rewritten for 9.1.  Given the lack of field
complaints I won't bother for now.

Cédric Villemain
2011-06-01 17:01:59 -04:00
Tom Lane
299d171652 Install defenses against overflow in BuildTupleHashTable().
The planner can sometimes compute very large values for numGroups, and in
cases where we have no alternative to building a hashtable, such a value
will get fed directly to BuildTupleHashTable as its nbuckets parameter.
There were two ways in which that could go bad.  First, BuildTupleHashTable
declared the parameter as "int" but most callers were passing "long"s,
so on 64-bit machines undetected overflow could occur leading to a bogus
negative value.  The obvious fix for that is to change the parameter to
"long", which is what I've done in HEAD.  In the back branches that seems a
bit risky, though, since third-party code might be calling this function.
So for them, just put in a kluge to treat negative inputs as INT_MAX.
Second, hash_create can go nuts with extremely large requested table sizes
(notably, my_log2 becomes an infinite loop for inputs larger than
LONG_MAX/2).  What seems most appropriate to avoid that is to bound the
initial table size request to work_mem.

This fixes bug #6035 reported by Daniel Schreiber.  Although the reported
case only occurs back to 8.4 since it involves WITH RECURSIVE, I think
it's a good idea to install the defenses in all supported branches.
2011-05-23 12:52:46 -04:00
Heikki Linnakangas
0319da638f Reset per-tuple memory context between every row in a scan node, even when
there's no quals or projections. Currently this only matters for foreign
scans, as none of the other scan nodes litter the per-tuple memory context
when there's no quals or projections.
2011-05-21 14:30:11 -04:00