Fix all the standard PLs to be able to return tuples from FOO_RETURNING
statements as well as utility statements that return tuples. Also,
fix oversight that SPI_processed wasn't set for a utility statement
returning tuples. Per recent discussion.
cannot assume that there's exactly one Query in the Portal, as we can for
ONE_SELECT mode, because non-SELECT queries might have extra queries added
during rule rewrites. Fix things up so that we'll use ONE_RETURNING mode
when a Portal contains one primary (canSetTag) query and that query has
a RETURNING list. This appears to be a second showstopper reason for running
the Portal to completion before we start to hand anything back --- we want
to be sure that the rule-added queries get run too.
_SPI_execute_plan's return code should reflect the type of the query
that is marked canSetTag, not necessarily the last one in the list.
This is arguably a bug fix, but I'm hesitant to back-patch it because
it's the sort of subtle change that might break someone's code, and it's
best not to do that kind of thing in point releases.
merely a matter of fixing the error check, since the underlying Portal
infrastructure already handles it. This in turn allows these statements
to be used in some existing plpgsql and plperl contexts, such as a
plpgsql FOR loop. Also, do some marginal code cleanup in places that
were being sloppy about distinguishing SELECT from SELECT INTO.
plpgsql support to come later. Along the way, convert execMain's
SELECT INTO support into a DestReceiver, in order to eliminate some ugly
special cases.
Jonah Harris and Tom Lane
o print user name for all
o print portal name if defined for all
o print query for all
o reduce log_statement header to single keyword
o print bind parameters as DETAIL if text mode
that's shorter-lived than the expression state being evaluated in it really
doesn't work :-( --- we end up with fn_extra caches getting deleted while
still in use. Rather than abandon the notion of caching expression state
across domain_in calls altogether, I chose to make domain_in a bit cozier
with ExprContext. All we really need for evaluating variable-free
expressions is an ExprContext, not an EState, so I invented the notion of a
"standalone" ExprContext. domain_in can prevent resource leakages by doing
a ReScanExprContext on this rather than having to free it entirely; so we
can make the ExprContext have the same lifespan (and particularly the same
per_query memory context) as the expression state structs.
temporary context that can be reset when advancing to the next sublist.
This is faster and more thorough at recovering space than the previous
method; moreover it will do the right thing if something in the sublist
tries to register an expression context callback.
(e.g. "INSERT ... VALUES (...), (...), ...") and elsewhere as allowed
by the spec. (e.g. similar to a FROM clause subselect). initdb required.
Joe Conway and Tom Lane.
(table or index) before trying to open its relcache entry. This fixes
race conditions in which someone else commits a change to the relation's
catalog entries while we are in process of doing relcache load. Problems
of that ilk have been reported sporadically for years, but it was not
really practical to fix until recently --- for instance, the recent
addition of WAL-log support for in-place updates helped.
Along the way, remove pg_am.amconcurrent: all AMs are now expected to support
concurrent update.
created in the bootstrap phase proper, rather than added after-the-fact
by initdb. This is cleaner than before because it allows us to retire the
undocumented ALTER TABLE ... CREATE TOAST TABLE command, but the real reason
I'm doing it is so that toast tables of shared catalogs will now have
predetermined OIDs. This will allow a reasonably clean solution to the
problem of locking tables before we load their relcache entries, to appear
in a forthcoming patch.
the opportunity to treat COUNT(*) as a zero-argument aggregate instead
of the old hack that equated it to COUNT(1); this is materially cleaner
(no more weird ANYOID cases) and ought to be at least a tiny bit faster.
Original patch by Sergey Koposov; review, documentation, simple regression
tests, pg_dump and psql support by moi.
eliminate unnecessary code, force initdb because stored rules change
(limit nodes are now supposed to be int8 not int4 expressions).
Update comments and error messages, which still all said 'integer'.
discussion (including making def_arg allow reserved words), add missed
opt_definition for UNIQUE case. Put the reloptions support code in a less
random place (I chose to make a new file access/common/reloptions.c).
Eliminate header inclusion creep. Make the index options functions safely
user-callable (seems like client apps might like to be able to test validity
of options before trying to make an index). Reduce overhead for normal case
with no options by allowing rd_options to be NULL. Fix some unmaintainably
klugy code, including getting rid of Natts_pg_class_fixed at long last.
Some stylistic cleanup too, and pay attention to keeping comments in sync
with code.
Documentation still needs work, though I did fix the omissions in
catalogs.sgml and indexam.sgml.
tuple hash table entries. This addresses the problem previously noted
that use of a 'physical tlist' in the input scan node could bloat the
hash table entries far beyond what the planner expects. It's a better
answer than my previous thought of undoing the physical tlist optimization,
because we can also remove columns that are needed to compute the aggregate
functions but aren't part of the grouping column set.
per-tuple space overhead for sorts in memory. I chose to replace the
previous patch that tried to write out the bare minimum amount of data
when sorting on disk; instead, just dump the MinimalTuples as-is. This
wastes 3 to 10 bytes per tuple depending on architecture and null-bitmap
length, but the simplification in the writetup/readtup routines seems
worth it.
tuples with less header overhead than a regular HeapTuple, per my
recent proposal. Teach TupleTableSlot code how to deal with these.
As proof of concept, change tuplestore.c to store MinimalTuples instead
of HeapTuples. Future patches will expand the concept to other places
where it is useful.
aggregates. We just disallowed that, and AFAICS there should be no other
cases where direct (non-aggregated) references to input columns are allowed
in a query with aggregation and no GROUP BY.
by creating a reference-count mechanism, similar to what we did a long time
ago for catcache entries. The back branches have an ugly solution involving
lots of extra copies, but this way is more efficient. Reference counting is
only applied to tupdescs that are actually in caches --- there seems no need
to use it for tupdescs that are generated in the executor, since they'll go
away during plan shutdown by virtue of being in the per-query memory context.
Neil Conway and Tom Lane
it is just the total time to do INSTR_TIME_SET_CURRENT(), and not any of
the other code involved in InstrStartNode/InstrStopNode. Even though I
fear we may end up reverting this patch altogether, we may as well have
the most correct version in our CVS archive.
any use in the past many years, we'd have made some effort to include
them in all executor node types; but in fact they were only in
nodeAppend.c and nodeIndexscan.c, up until I copied nodeIndexscan.c's
occurrence into the new bitmap node types. Remove some other unused
macros in execdebug.h, too. Some day the whole header probably ought to
go away in favor of better-designed facilities.
support both FOR UPDATE and FOR SHARE in one command, as well as both
NOWAIT and normal WAIT behavior. The more general code is actually
simpler and cleaner.
not named ones, and replace linear searches of the list with array indexing.
The named-parameter support has been dead code for many years anyway,
and recent profiling suggests that the searching was costing a noticeable
amount of performance for complex queries.
functions are not strict, they will be called (passing a NULL first parameter)
during any attempt to input a NULL value of their datatype. Currently, all
our input functions are strict and so this commit does not change any
behavior. However, this will make it possible to build domain input functions
that centralize checking of domain constraints, thereby closing numerous holes
in our domain support, as per previous discussion.
While at it, I took the opportunity to introduce convenience functions
InputFunctionCall, OutputFunctionCall, etc to use in code that calls I/O
functions. This eliminates a lot of grotty-looking casts, but the main
motivation is to make it easier to grep for these places if we ever need
to touch them again.
2005-05-13. When we find that a new inner tuple can't possibly match any
outer tuple (because it contains a NULL), we can't immediately skip the
tuple when we are in NEXTINNER state. Doing so can lead to emitting
multiple copies of the tuple in FillInner mode, because we may rescan the
tuple after returning to a previous marked tuple. Instead, proceed to
NEXTOUTER state the same as we used to do. After we've found that there's
no need to return to the marked position, we can go to SKIPINNER_ADVANCE
state instead of SKIP_TEST when the inner tuple is unmatchable; this
preserves the performance improvement. Per bug report from Bruce.
I also made a couple of cosmetic code rearrangements and added a regression
test for the problem.
The original coding stored the raw parser output (ColumnDef and TypeName
nodes) which was ugly, bulky, and wrong because it failed to create any
dependency on the referenced datatype --- and in fact would not track type
renamings and suchlike. Instead store a list of column type OIDs in the
RTE.
Also fix up general failure of recordDependencyOnExpr to do anything sane
about recording dependencies on datatypes. While there are many cases where
there will be an indirect dependency (eg if an operator returns a datatype,
the dependency on the operator is enough), we do have to record the datatype
as a separate dependency in examples like CoerceToDomain.
initdb forced because of change of stored rules.
during parse analysis, not only errors detected in the flex/bison stages.
This is per my earlier proposal. This commit includes all the basic
infrastructure, but locations are only tracked and reported for errors
involving column references, function calls, and operators. More could
be done later but this seems like a good set to start with. I've also
moved the ReportSyntaxErrorPosition logic out of psql and into libpq,
which should make it available to more people --- even within psql this
is an improvement because warnings weren't handled by ReportSyntaxErrorPosition.
bits indicating which optional capabilities can actually be exercised
at runtime. This will allow Sort and Material nodes, and perhaps later
other nodes, to avoid unnecessary overhead in common cases.
This commit just adds the infrastructure and arranges to pass the correct
flag values down to plan nodes; none of the actual optimizations are here
yet. I'm committing this separately in case anyone wants to measure the
added overhead. (It should be negligible.)
Simon Riggs and Tom Lane
each tuple, as per my proposal of several days ago. Also, clean up
sort memory management by keeping all working data in a separate memory
context, and refine the handling of low-memory conditions.
possible ScanDirection alternatives rather than magic numbers
(-1, 0, 1). Also, use the ScanDirection macros in a few places
rather than directly checking whether `dir == ForwardScanDirection'
and the like. Per patch from James William Pye. His patch also
changed ScanDirection to be a "char" rather than an enum, which
I haven't applied.
relations: fix the executor so that we can have an Append plan on the
inside of a nestloop and still pass down outer index keys to index scans
within the Append, then generate such plans as if they were regular
inner indexscans. This avoids the need to evaluate the outer relation
multiple times.
cursors. Patch from Joachim Wieland, review and ediorialization by Neil
Conway. The view lists cursors defined by DECLARE CURSOR, using SPI, or
via the Bind message of the frontend/backend protocol. This means the
view does not list the unnamed portal or the portal created to implement
EXECUTE. Because we do list SPI portals, there might be more rows in
this view than you might expect if you are using SPI implicitly (e.g.
via a procedural language).
Per recent discussion on -hackers, the query string included in the
view for cursors defined by DECLARE CURSOR is based on
debug_query_string. That means it is not accurate if multiple queries
separated by semicolons are submitted as one query string. However,
there doesn't seem a trivial fix for that: debug_query_string
is better than nothing. I also changed SPI_cursor_open() to include
the source text for the portal it creates: AFAICS there is no reason
not to do this.
Update the documentation and regression tests, bump the catversion.
isn't being used anywhere anymore, and there seems no point in a generic
index_keytest() routine when two out of three remaining access methods
aren't using it. Also, add a comment documenting a convention for
letting access methods define private flag bits in ScanKey sk_flags.
There are no such flags at the moment but I'm thinking about changing
btree's handling of "required keys" to use flag bits in the keys
rather than a count of required key positions. Also, if some AM did
still want SK_NEGATE then it would be reasonable to treat it as a private
flag bit.
our own command (or more generally, xmin = our xact and cmin >= current
command ID) should not be seen as good. Else we may try to update rows
we already updated. This error was inserted last August while fixing the
even bigger problem that the old coding wouldn't see *any* tuples inserted
by our own transaction as good. Per report from Euler Taveira de Oliveira.
rather than "return expr;" -- the latter style is used in most of the
tree. I kept the parentheses when they were necessary or useful because
the return expression was complex.
(previously we only did = and <> correctly). Also, allow row comparisons
with any operators that are in btree opclasses, not only those with these
specific names. This gets rid of a whole lot of indefensible assumptions
about the behavior of particular operators based on their names ... though
it's still true that IN and NOT IN expand to "= ANY". The patch adds a
RowCompareExpr expression node type, and makes some changes in the
representation of ANY/ALL/ROWCOMPARE SubLinks so that they can share code
with RowCompareExpr.
I have not yet done anything about making RowCompareExpr an indexable
operator, but will look at that soon.
initdb forced due to changes in stored rules.
if we already have a stronger lock due to the index's table being the
update target table of the query. Same optimization I applied earlier
at the table level. There doesn't seem to be much interest in the more
radical idea of not locking indexes at all, so do what we can ...
relation if it's already been locked by execMain.c as either a result
relation or a FOR UPDATE/SHARE relation. This avoids an extra trip to
the shared lock manager state. Per my suggestion yesterday.
child plan nodes until we have acquired lock on the relation to scan.
The relative order of initialization of plan nodes isn't real important in
other cases, but it's critical here because one is supposed to lock a
relation before its indexes, not vice versa. The original coding was at
least vulnerable to deadlock against DROP INDEX, and perhaps worse things.
it's worth probing the outer relation for emptiness before building the
hash table. To wit, if we're rescanning a join previously performed,
remember whether we found it nonempty the previous time, and don't bother
with the probe if it was nonempty. This buys back the performance lost
in examples like Mario Weilguni's.
one child or the other had a problem: they did not leave the node in a
state that ExecReScanHashJoin would understand. In particular it would
tend to fail to reset the child plans when needed. Per report from
Mario Weilguni.
"ctid IN (list)" will still work after we convert IN to ScalarArrayOpExpr.
Make some minor efficiency improvements while at it, such as ensuring that
multiple TIDs are fetched in physical heap order. And fix EXPLAIN so that
it shows what's really going on for a TID scan.
when we first read the page, rather than checking them one at a time.
This allows us to take and release the buffer content lock just once
per page, instead of once per tuple. Since it's a shared lock the
contention penalty for holding the lock longer shouldn't be too bad.
We can safely do this only when using an MVCC snapshot; else the
assumption that visibility won't change over time is uncool. Therefore
there are now two code paths depending on the snapshot type. I also
made the same change in nodeBitmapHeapscan.c, where it can be done always
because we only support MVCC snapshots for bitmap scans anyway.
Also make some incidental cleanups in the APIs of these functions.
Per a suggestion from Qingqing Zhou.
qualification when the underlying operator is indexable and useOr is true.
That is, indexkey op ANY (ARRAY[...]) is effectively translated into an
OR combination of one indexscan for each array element. This only works
for bitmap index scans, of course, since regular indexscans no longer
support OR'ing of scans. There are still some loose ends to clean up
before changing 'x IN (list)' to translate as a ScalarArrayOpExpr;
for instance predtest.c ought to be taught about it. But this gets the
basic functionality in place.
a TupleTableSlot: instead of calling ExecClearTuple, inline the needed
operations, so that we can avoid redundant steps. In particular, when
the old and new tuples are both on the same disk page, avoid releasing
and re-acquiring the buffer pin --- this saves work in both the bufmgr
and ResourceOwner modules. To make this improvement actually useful,
partially revert a change I made on 2004-04-21 that caused SeqNext
et al to call ExecClearTuple before ExecStoreTuple. The motivation
for that, to avoid grabbing the BufMgrLock separately for releasing
the old buffer and grabbing the new one, no longer applies. My
profiling says that this saves about 5% of the CPU time for an
all-in-memory seqscan.
generate their output tuple descriptors from their target lists (ie, using
ExecAssignResultTypeFromTL()). We long ago fixed things so that all node
types have minimally valid tlists, so there's no longer any good reason to
have two different ways of doing it. This change is needed to fix bug
reported by Hayden James: the fix of 2005-11-03 to emit the correct column
names after optimizing away a SubqueryScan node didn't work if the new
top-level plan node used ExecAssignResultTypeFromOuterPlan to generate its
tupdesc, since the next plan node down won't have the correct column labels.
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
tuple in-place, but instead passes back an all-new tuple structure if
any changes are needed. This is a much cleaner and more robust solution
for the bug discovered by Alexey Beschiokov; accordingly, revert the
quick hack I installed yesterday.
With this change, HeapTupleData.t_datamcxt is no longer needed; will
remove it in a separate commit in HEAD only.
doing heap_insert or heap_update, wipe out any extracted fields in
the TupleTableSlot containing the tuple, because they might not be valid
anymore if tuptoaster.c changed the tuple. Safe because slot must be
in the materialized state, but mighty ugly --- find a better answer!
functionality, but I still need to make another pass looking at places
that incidentally use arrays (such as ACL manipulation) to make sure they
are null-safe. Contrib needs work too.
I have not changed the behaviors that are still under discussion about
array comparison and what to do with lower bounds.
slot of the topmost plan node when a trigger returns a modified tuple.
These appear to be the only places where a plan node's caller did not
treat the result slot as read-only, which is an assumption that nodeUnique
makes as of 8.1. Fixes trigger-vs-DISTINCT bug reported by Frank van Vugt.
generated from subquery outputs: use the type info stored in the Var
itself. To avoid making ExecEvalVar and slot_getattr more complex
and slower, I split out the whole-row case into a separate ExecEval routine.
type ID information even when it's a record type. This is needed to
handle whole-row Vars referencing subquery outputs. Per example from
Richard Huxton.
generated by bitmap index scans. Along the way, simplify and speed up
the code for counting sequential and index scans; it was both confusing
and inefficient to be taking care of that in the per-tuple loops, IMHO.
initdb forced because of internal changes in pg_stat view definitions.
the ProcessUtility case, resulting in an intratransaction memory leak
if a utility command actually did return any tuples, as reported by
Dmitry Karasik. Fix this and also make the behavior more consistent
for cases involving nested SPI operations and multiple query trees,
by ensuring that we store the state locally until it is ready to be
returned to the caller.
outer relation is empty did not work, per test case from Patrick Welche.
It tried to use nodeHashjoin.c's high-level mechanisms for fetching an
outer-relation tuple, but that code expected the hash table to be filled
already. As patched, the code failed in corner cases such as having no
outer-relation tuples for the first hash batch. Revert and rewrite.
the parent table, even if the command that creates them is executed by
someone else (such as a superuser or a member of the owning role).
Per gripe from Michael Fuhr.
insufficient paranoia in code that follows t_ctid links. (We must do both
because even with VACUUM doing it properly, the intermediate state with
a dangling t_ctid link is visible concurrently during lazy VACUUM, and
could be seen afterwards if either type of VACUUM crashes partway through.)
Also try to improve documentation about what's going on. Patch is a bit
bulky because passing the XMAX information around required changing the
APIs of some low-level heapam.c routines, but it's not conceptually very
complicated. Per trouble report from Teodor and subsequent analysis.
This needs to be back-patched, but I'll do that after 8.1 beta is out.
and pg_auth_members. There are still many loose ends to finish in this
patch (no documentation, no regression tests, no pg_dump support for
instance). But I'm going to commit it now anyway so that Alvaro can
make some progress on shared dependencies. The catalog changes should
be pretty much done.
(a/k/a SELECT INTO). Instead, flush and fsync the whole relation before
committing. We do still need the WAL log when PITR is active, however.
Simon Riggs and Tom Lane.
work if either of the join relations are empty. The logic is:
(1) if the inner relation's startup cost is less than the outer
relation's startup cost and this is not an outer join, read
a single tuple from the inner relation via ExecHash()
- if NULL, we're done
(2) read a single tuple from the outer relation
- if NULL, we're done
(3) build the hash table on the inner relation
- if hash table is empty and this is not an outer join,
we're done
(4) otherwise, do hash join as usual
The implementation uses the new MultiExecProcNode API, per a
suggestion from Tom: invoking ExecHash() now produces the first
tuple from the Hash node's child node, whereas MultiExecHash()
builds the hash table.
I had to put in a bit of a kludge to get the row count returned
for EXPLAIN ANALYZE to be correct: since ExecHash() is invoked to
return a tuple, and then MultiExecHash() is invoked, we would
return one too many tuples to EXPLAIN ANALYZE. I hacked around
this by just manually detecting this situation and subtracting 1
from the EXPLAIN ANALYZE row count.
this in turn causes CREATE TABLE AS in plpgsql to set ROW_COUNT.
This is how it behaved before 7.4; I had unintentionally changed the
behavior in a bit of sloppy micro-optimization.
spotted by Qingqing Zhou. The HASH_ENTER action now automatically
fails with elog(ERROR) on out-of-memory --- which incidentally lets
us eliminate duplicate error checks in quite a bunch of places. If
you really need the old return-NULL-on-out-of-memory behavior, you
can ask for HASH_ENTER_NULL. But there is now an Assert in that path
checking that you aren't hoping to get that behavior in a palloc-based
hash table.
Along the way, remove the old HASH_FIND_SAVE/HASH_REMOVE_SAVED actions,
which were not being used anywhere anymore, and were surely too ugly
and unsafe to want to see revived again.
aren't doing anything useful (ie, neither selection nor projection).
Also, extend to SubqueryScan the hacks already in place to avoid
unnecessary ExecProject calls when the result would just be the same
tuple the subquery already delivered. This saves some overhead in
UNION and other set operations, as well as avoiding overhead for
unflatten-able subqueries. Per example from Sokolov Yura.
in an inconsistent state. (This is only latent because in reality
ExecSeqRestrPos is dead code at the moment ... but someday maybe it won't
be.) Add some comments about what the API for plan node mark/restore
actually is, because it's not immediately obvious.
When one side of the join has a NULL, we don't want to uselessly try
to match it against every remaining tuple of the other side. While
at it, rewrite the comparison machinery to avoid multiple evaluations
of the left and right input expressions and to use a btree comparator
where available, instead of double operator calls. Also revise the
state machine to eliminate redundant comparisons and hopefully make it
more readable too.