/*-------------------------------------------------------------------------
 *
 * execUtils.c
 *	  miscellaneous executor utility routines
 *
 * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  src/backend/executor/execUtils.c
 *
 *-------------------------------------------------------------------------
 */
/*
 * INTERFACE ROUTINES
 *		CreateExecutorState		Create/delete executor working state
 *		FreeExecutorState
 *		CreateExprContext
 *		CreateStandaloneExprContext
 *		FreeExprContext
 *		ReScanExprContext
 *
 *		ExecAssignExprContext	Common code for plan node init routines.
 *		etc
 *
 *		ExecOpenScanRelation	Common code for scan node init routines.
 *
 *		ExecInitRangeTable		Set up executor's range-table-related data.
 *
 *		ExecGetRangeTableRelation		Fetch Relation for a rangetable entry.
 *
 *		executor_errposition	Report syntactic position of an error.
 *
 *		RegisterExprContextCallback		Register function shutdown callback
 *		UnregisterExprContextCallback	Deregister function shutdown callback
 *
 *		GetAttributeByName		Runtime extraction of columns from tuples.
 *		GetAttributeByNum
 *
 *	 NOTES
 *		This file has traditionally been the place to stick misc.
 *		executor support stuff that doesn't really go anyplace else.
 */

#include "postgres.h"

#include "access/parallel.h"
#include "access/relscan.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/transam.h"
#include "executor/executor.h"
#include "jit/jit.h"
#include "mb/pg_wchar.h"
#include "nodes/nodeFuncs.h"
#include "parser/parsetree.h"
#include "partitioning/partdesc.h"
#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/typcache.h"


static bool tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc);
static void ShutdownExprContext(ExprContext *econtext, bool isCommit);

/* ----------------------------------------------------------------
 *				 Executor state and memory management functions
 * ----------------------------------------------------------------
 */

/* ----------------
 *		CreateExecutorState
 *
 *		Create and initialize an EState node, which is the root of
 *		working storage for an entire Executor invocation.
 *
 * Principally, this creates the per-query memory context that will be
 * used to hold all working data that lives till the end of the query.
 * Note that the per-query context will become a child of the caller's
 * CurrentMemoryContext.
 * ----------------
 */
EState *
CreateExecutorState(void)
{
	EState	   *estate;
	MemoryContext qcontext;
	MemoryContext oldcontext;

	/*
	 * Create the per-query context for this Executor run.
	 */
	qcontext = AllocSetContextCreate(CurrentMemoryContext,
									 "ExecutorState",
									 ALLOCSET_DEFAULT_SIZES);

	/*
	 * Make the EState node within the per-query context.  This way, we don't
	 * need a separate pfree() operation for it at shutdown.
	 */
	oldcontext = MemoryContextSwitchTo(qcontext);

	estate = makeNode(EState);

	/*
	 * Initialize all fields of the Executor State structure
	 */
	estate->es_direction = ForwardScanDirection;
	estate->es_snapshot = InvalidSnapshot;	/* caller must initialize this */
	estate->es_crosscheck_snapshot = InvalidSnapshot;	/* no crosscheck */
	estate->es_range_table = NIL;
	estate->es_range_table_size = 0;
	estate->es_relations = NULL;
	estate->es_rowmarks = NULL;
	estate->es_plannedstmt = NULL;

	estate->es_junkFilter = NULL;

	estate->es_output_cid = (CommandId) 0;

	estate->es_result_relations = NULL;
	estate->es_num_result_relations = 0;
	estate->es_result_relation_info = NULL;

	estate->es_root_result_relations = NULL;
	estate->es_num_root_result_relations = 0;

	estate->es_tuple_routing_result_relations = NIL;

	estate->es_trig_target_relations = NIL;

	estate->es_param_list_info = NULL;
	estate->es_param_exec_vals = NULL;

	estate->es_queryEnv = NULL;

	estate->es_query_cxt = qcontext;

	estate->es_tupleTable = NIL;

	estate->es_processed = 0;

	estate->es_top_eflags = 0;
	estate->es_instrument = 0;
	estate->es_finished = false;

	estate->es_exprcontexts = NIL;

	estate->es_subplanstates = NIL;

	estate->es_auxmodifytables = NIL;

	estate->es_per_tuple_exprcontext = NULL;

	estate->es_sourceText = NULL;

	estate->es_use_parallel_mode = false;

	estate->es_jit_flags = 0;
	estate->es_jit = NULL;

	/*
	 * Return the executor state structure
	 */
	MemoryContextSwitchTo(oldcontext);

	return estate;
}
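
/*
 * Illustrative sketch, not part of the original file: the typical
 * lifecycle of an EState used for one-off expression evaluation (the
 * same shape as the planner's constant-folding code).  The function
 * name and arguments are hypothetical; datumCopy() would additionally
 * require "utils/datum.h".  Guarded out since it is an example only.
 */
#ifdef NOT_USED
static Datum
example_eval_expr(Expr *expr, bool typByVal, int typLen, bool *isNull)
{
	EState	   *estate = CreateExecutorState();
	ExprState  *exprstate;
	Datum		result;

	/* compile the expression tree; no parent plan node here */
	exprstate = ExecInitExpr(expr, NULL);

	/* evaluate it in the EState's per-output-tuple context */
	result = ExecEvalExprSwitchContext(exprstate,
									   GetPerTupleExprContext(estate),
									   isNull);

	/* copy a pass-by-ref result out before its memory is released */
	if (!*isNull)
		result = datumCopy(result, typByVal, typLen);

	/* runs remaining shutdown callbacks, then frees all working memory */
	FreeExecutorState(estate);

	return result;
}
#endif							/* NOT_USED */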

/* ----------------
 *		FreeExecutorState
 *
 *		Release an EState along with all remaining working storage.
 *
 * Note: this is not responsible for releasing non-memory resources, such as
 * open relations or buffer pins.  But it will shut down any still-active
 * ExprContexts within the EState and deallocate associated JITed expressions.
 * That is sufficient cleanup for situations where the EState has only been
 * used for expression evaluation, and not to run a complete Plan.
 *
 * This can be called in any memory context ... so long as it's not one
 * of the ones to be freed.
 * ----------------
 */
void
FreeExecutorState(EState *estate)
{
	/*
	 * Shut down and free any remaining ExprContexts.  We do this explicitly
	 * to ensure that any remaining shutdown callbacks get called (since they
	 * might need to release resources that aren't simply memory within the
	 * per-query memory context).
	 */
	while (estate->es_exprcontexts)
	{
		/*
		 * XXX: seems there ought to be a faster way to implement this than
		 * repeated list_delete(), no?
		 */
		FreeExprContext((ExprContext *) linitial(estate->es_exprcontexts),
						true);
		/* FreeExprContext removed the list link for us */
	}

	/* release JIT context, if allocated */
	if (estate->es_jit)
	{
		jit_release_context(estate->es_jit);
		estate->es_jit = NULL;
	}

	/* release partition directory, if allocated */
	if (estate->es_partition_directory)
	{
		DestroyPartitionDirectory(estate->es_partition_directory);
		estate->es_partition_directory = NULL;
	}

	/*
	 * Free the per-query memory context, thereby releasing all working
	 * memory, including the EState node itself.
	 */
	MemoryContextDelete(estate->es_query_cxt);
}

/* ----------------
 *		CreateExprContext
 *
 *		Create a context for expression evaluation within an EState.
 *
 * An executor run may require multiple ExprContexts (we usually make one
 * for each Plan node, and a separate one for per-output-tuple processing
 * such as constraint checking).  Each ExprContext has its own "per-tuple"
 * memory context.
 *
 * Note we make no assumption about the caller's memory context.
 * ----------------
 */
ExprContext *
CreateExprContext(EState *estate)
{
	ExprContext *econtext;
	MemoryContext oldcontext;

	/* Create the ExprContext node within the per-query memory context */
	oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);

	econtext = makeNode(ExprContext);

	/* Initialize fields of ExprContext */
	econtext->ecxt_scantuple = NULL;
	econtext->ecxt_innertuple = NULL;
	econtext->ecxt_outertuple = NULL;

	econtext->ecxt_per_query_memory = estate->es_query_cxt;

	/*
	 * Create working memory for expression evaluation in this context.
	 */
	econtext->ecxt_per_tuple_memory =
		AllocSetContextCreate(estate->es_query_cxt,
							  "ExprContext",
							  ALLOCSET_DEFAULT_SIZES);

	econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
	econtext->ecxt_param_list_info = estate->es_param_list_info;

	econtext->ecxt_aggvalues = NULL;
	econtext->ecxt_aggnulls = NULL;

	econtext->caseValue_datum = (Datum) 0;
	econtext->caseValue_isNull = true;

	econtext->domainValue_datum = (Datum) 0;
	econtext->domainValue_isNull = true;

	econtext->ecxt_estate = estate;

	econtext->ecxt_callbacks = NULL;

	/*
	 * Link the ExprContext into the EState to ensure it is shut down when the
	 * EState is freed.  Because we use lcons(), shutdowns will occur in
	 * reverse order of creation, which may not be essential but can't hurt.
	 */
	estate->es_exprcontexts = lcons(econtext, estate->es_exprcontexts);

	MemoryContextSwitchTo(oldcontext);

	return econtext;
}
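
/*
 * Illustrative sketch, not part of the original file: how a plan node
 * typically uses its ExprContext.  Resetting the context between tuples
 * frees whatever evaluated expressions allocated for the previous tuple,
 * so per-tuple garbage cannot accumulate across a long scan.  "qual" and
 * "scanslot" are hypothetical inputs.
 */
#ifdef NOT_USED
static bool
example_check_qual(PlanState *node, ExprState *qual, TupleTableSlot *scanslot)
{
	ExprContext *econtext = node->ps_ExprContext;

	/* throw away pass-by-ref results from the previous cycle */
	ResetExprContext(econtext);

	/* point the context at the tuple to be tested */
	econtext->ecxt_scantuple = scanslot;

	/* any transient allocations made here go into per-tuple memory */
	return ExecQual(qual, econtext);
}
#endif							/* NOT_USED */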

/* ----------------
 *		CreateStandaloneExprContext
 *
 *		Create a context for standalone expression evaluation.
 *
 * An ExprContext made this way can be used for evaluation of expressions
 * that contain no Params, subplans, or Var references (it might work to
 * put tuple references into the scantuple field, but it seems unwise).
 *
 * The ExprContext struct is allocated in the caller's current memory
 * context, which also becomes its "per query" context.
 *
 * It is caller's responsibility to free the ExprContext when done,
 * or at least ensure that any shutdown callbacks have been called
 * (ReScanExprContext() is suitable).  Otherwise, non-memory resources
 * might be leaked.
 * ----------------
 */
ExprContext *
CreateStandaloneExprContext(void)
{
	ExprContext *econtext;

	/* Create the ExprContext node within the caller's memory context */
	econtext = makeNode(ExprContext);

	/* Initialize fields of ExprContext */
	econtext->ecxt_scantuple = NULL;
	econtext->ecxt_innertuple = NULL;
	econtext->ecxt_outertuple = NULL;

	econtext->ecxt_per_query_memory = CurrentMemoryContext;

	/*
	 * Create working memory for expression evaluation in this context.
	 */
	econtext->ecxt_per_tuple_memory =
		AllocSetContextCreate(CurrentMemoryContext,
							  "ExprContext",
							  ALLOCSET_DEFAULT_SIZES);

	econtext->ecxt_param_exec_vals = NULL;
	econtext->ecxt_param_list_info = NULL;

	econtext->ecxt_aggvalues = NULL;
	econtext->ecxt_aggnulls = NULL;

	econtext->caseValue_datum = (Datum) 0;
	econtext->caseValue_isNull = true;

	econtext->domainValue_datum = (Datum) 0;
	econtext->domainValue_isNull = true;

	econtext->ecxt_estate = NULL;

	econtext->ecxt_callbacks = NULL;

	return econtext;
}
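
/*
 * Illustrative sketch, not part of the original file: using a standalone
 * ExprContext as short-lived workspace outside any executor run.  The
 * helper name is hypothetical.  Per the header comment above, the caller
 * must ensure shutdown callbacks run; FreeExprContext() does that here
 * (ReScanExprContext() would suffice if the context were to be reused).
 */
#ifdef NOT_USED
static void
example_standalone_workspace(void)
{
	ExprContext *econtext = CreateStandaloneExprContext();
	MemoryContext oldcontext;

	oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);

	/* ... allocate and use transient data here ... */

	MemoryContextSwitchTo(oldcontext);

	/* run any registered callbacks and release all the memory */
	FreeExprContext(econtext, true);
}
#endif							/* NOT_USED */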

/* ----------------
 *		FreeExprContext
 *
 *		Free an expression context, including calling any remaining
 *		shutdown callbacks.
 *
 * Since we free the temporary context used for expression evaluation,
 * any previously computed pass-by-reference expression result will go away!
 *
 * If isCommit is false, we are being called in error cleanup, and should
 * not call callbacks but only release memory.  (It might be better to call
 * the callbacks and pass the isCommit flag to them, but that would require
 * more invasive code changes than currently seems justified.)
 *
 * Note we make no assumption about the caller's memory context.
 * ----------------
 */
void
FreeExprContext(ExprContext *econtext, bool isCommit)
{
	EState	   *estate;

	/* Call any registered callbacks */
	ShutdownExprContext(econtext, isCommit);
	/* And clean up the memory used */
	MemoryContextDelete(econtext->ecxt_per_tuple_memory);
	/* Unlink self from owning EState, if any */
	estate = econtext->ecxt_estate;
	if (estate)
		estate->es_exprcontexts = list_delete_ptr(estate->es_exprcontexts,
												  econtext);
	/* And delete the ExprContext node */
	pfree(econtext);
}

/*
 * ReScanExprContext
 *
 *		Reset an expression context in preparation for a rescan of its
 *		plan node.  This requires calling any registered shutdown callbacks,
 *		since any partially complete set-returning-functions must be canceled.
 *
 * Note we make no assumption about the caller's memory context.
 */
void
ReScanExprContext(ExprContext *econtext)
{
	/* Call any registered callbacks */
	ShutdownExprContext(econtext, true);
	/* And clean up the memory used */
	MemoryContextReset(econtext->ecxt_per_tuple_memory);
}

/*
 * Build a per-output-tuple ExprContext for an EState.
 *
 * This is normally invoked via GetPerTupleExprContext() macro,
 * not directly.
 */
ExprContext *
MakePerTupleExprContext(EState *estate)
{
	if (estate->es_per_tuple_exprcontext == NULL)
		estate->es_per_tuple_exprcontext = CreateExprContext(estate);

	return estate->es_per_tuple_exprcontext;
}
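
/*
 * Illustrative sketch, not part of the original file: call sites normally
 * reach the per-output-tuple context via the GetPerTupleExprContext() and
 * GetPerTupleMemoryContext() macros (which call MakePerTupleExprContext()
 * lazily), resetting it once per output tuple.
 */
#ifdef NOT_USED
static void
example_per_output_tuple(EState *estate)
{
	/* free whatever the previous output tuple left behind */
	ResetPerTupleExprContext(estate);

	/* short-lived work product goes into the per-tuple memory */
	(void) MemoryContextAlloc(GetPerTupleMemoryContext(estate), 64);
}
#endif							/* NOT_USED */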

/* ----------------------------------------------------------------
 *				 miscellaneous node-init support functions
 *
 * Note: all of these are expected to be called with CurrentMemoryContext
 * equal to the per-query memory context.
 * ----------------------------------------------------------------
 */

/* ----------------
 *		ExecAssignExprContext
 *
 *		This initializes the ps_ExprContext field.  It is only necessary
 *		to do this for nodes which use ExecQual or ExecProject
 *		because those routines require an econtext.  Other nodes that
 *		don't have to evaluate expressions don't need to do this.
 * ----------------
 */
void
ExecAssignExprContext(EState *estate, PlanState *planstate)
{
	planstate->ps_ExprContext = CreateExprContext(estate);
}

/* ----------------
 *		ExecGetResultType
 * ----------------
 */
TupleDesc
ExecGetResultType(PlanState *planstate)
{
	return planstate->ps_ResultTupleDesc;
}
|
|
|
|
|
Introduce notion of different types of slots (without implementing them).
Upcoming work intends to allow pluggable ways to introduce new ways of
storing table data. Accessing those table access methods from the
executor requires TupleTableSlots to be carry tuples in the native
format of such storage methods; otherwise there'll be a significant
conversion overhead.
Different access methods will require different data to store tuples
efficiently (just like virtual, minimal, heap already require fields
in TupleTableSlot). To allow that without requiring additional pointer
indirections, we want to have different structs (embedding
TupleTableSlot) for different types of slots. Thus different types of
slots are needed, which requires adapting creators of slots.
The slot that most efficiently can represent a type of tuple in an
executor node will often depend on the type of slot a child node
uses. Therefore we need to track the type of slot is returned by
nodes, so parent slots can create slots based on that.
Relatedly, JIT compilation of tuple deforming needs to know which type
of slot a certain expression refers to, so it can create an
appropriate deforming function for the type of tuple in the slot.
But not all nodes will only return one type of slot, e.g. an append
node will potentially return different types of slots for each of its
subplans.
Therefore add function that allows to query the type of a node's
result slot, and whether it'll always be the same type (whether it's
fixed). This can be queried using ExecGetResultSlotOps().
The scan, result, inner, outer type of slots are automatically
inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(),
left/right subtrees respectively. If that's not correct for a node,
that can be overwritten using new fields in PlanState.
This commit does not introduce the actually abstracted implementation
of different kind of TupleTableSlots, that will be left for a followup
commit. The different types of slots introduced will, for now, still
use the same backing implementation.
While this already partially invalidates the big comment in
tuptable.h, it seems to make more sense to update it later, when the
different TupleTableSlot implementations actually exist.
Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-16 07:00:30 +01:00
|
|
|
/*
 * ExecGetResultSlotOps - information about node's type of result slot
 */
const TupleTableSlotOps *
ExecGetResultSlotOps(PlanState *planstate, bool *isfixed)
{
    if (planstate->resultopsset && planstate->resultops)
    {
        if (isfixed)
            *isfixed = planstate->resultopsfixed;
        return planstate->resultops;
    }

    if (isfixed)
    {
        if (planstate->resultopsset)
            *isfixed = planstate->resultopsfixed;
        else if (planstate->ps_ResultTupleSlot)
            *isfixed = TTS_FIXED(planstate->ps_ResultTupleSlot);
        else
            *isfixed = false;
    }

    if (!planstate->ps_ResultTupleSlot)
        return &TTSOpsVirtual;

    return planstate->ps_ResultTupleSlot->tts_ops;
}
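
/*
 * A typical use is in a parent node's ExecInit routine, to adopt the
 * child's slot type for expression evaluation.  A minimal sketch, assuming
 * an Agg-style node whose outer subplan has already been initialized (the
 * surrounding node is illustrative, not prescriptive):
 *
 *      planstate->outerops =
 *          ExecGetResultSlotOps(outerPlanState(planstate),
 *                               &planstate->outeropsfixed);
 *      planstate->outeropsset = true;
 */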
/* ----------------
 *      ExecAssignProjectionInfo
 *
 * forms the projection information from the node's targetlist
 *
 * Notes for inputDesc are same as for ExecBuildProjectionInfo: supply it
 * for a relation-scan node, can pass NULL for upper-level nodes
 * ----------------
 */
void
ExecAssignProjectionInfo(PlanState *planstate,
                         TupleDesc inputDesc)
{
    planstate->ps_ProjInfo =
        ExecBuildProjectionInfo(planstate->plan->targetlist,
                                planstate->ps_ExprContext,
                                planstate->ps_ResultTupleSlot,
                                planstate,
                                inputDesc);
}
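
/*
 * Upper-level nodes that always project typically call this from their
 * ExecInit routine with a NULL inputDesc.  A sketch modeled on the Result
 * node (the state variable name is illustrative):
 *
 *      ExecAssignProjectionInfo(&resstate->ps, NULL);
 */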
/* ----------------
 *      ExecConditionalAssignProjectionInfo
 *
 * as ExecAssignProjectionInfo, but store NULL rather than building projection
 * info if no projection is required
 * ----------------
 */
void
ExecConditionalAssignProjectionInfo(PlanState *planstate, TupleDesc inputDesc,
                                    Index varno)
{
    if (tlist_matches_tupdesc(planstate,
                              planstate->plan->targetlist,
                              varno,
                              inputDesc))
    {
        planstate->ps_ProjInfo = NULL;
        planstate->resultopsset = planstate->scanopsset;
        planstate->resultopsfixed = planstate->scanopsfixed;
        planstate->resultops = planstate->scanops;
    }
    else
    {
        if (!planstate->ps_ResultTupleSlot)
        {
            ExecInitResultSlot(planstate, &TTSOpsVirtual);
            planstate->resultops = &TTSOpsVirtual;
            planstate->resultopsfixed = true;
            planstate->resultopsset = true;
        }
        ExecAssignProjectionInfo(planstate, inputDesc);
    }
}
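
/*
 * Scan nodes normally reach this via ExecAssignScanProjectionInfo(), whose
 * body is roughly the following sketch (see execScan.c for the real code):
 *
 *      Scan       *scan = (Scan *) node->ps.plan;
 *
 *      ExecConditionalAssignProjectionInfo(&node->ps,
 *                                          node->ss_ScanTupleSlot->tts_tupleDescriptor,
 *                                          scan->scanrelid);
 */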
static bool
tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc)
{
    int         numattrs = tupdesc->natts;
    int         attrno;
    ListCell   *tlist_item = list_head(tlist);

    /* Check the tlist attributes */
    for (attrno = 1; attrno <= numattrs; attrno++)
    {
        Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
        Var        *var;

        if (tlist_item == NULL)
            return false;       /* tlist too short */
        var = (Var *) ((TargetEntry *) lfirst(tlist_item))->expr;
        if (!var || !IsA(var, Var))
            return false;       /* tlist item not a Var */
        /* if these Asserts fail, planner messed up */
        Assert(var->varno == varno);
        Assert(var->varlevelsup == 0);
        if (var->varattno != attrno)
            return false;       /* out of order */
        if (att_tup->attisdropped)
            return false;       /* table contains dropped columns */
        if (att_tup->atthasmissing)
            return false;       /* table contains cols with missing values */

        /*
         * Note: usually the Var's type should match the tupdesc exactly, but
         * in situations involving unions of columns that have different
         * typmods, the Var may have come from above the union and hence have
         * typmod -1.  This is a legitimate situation since the Var still
         * describes the column, just not as exactly as the tupdesc does.  We
         * could change the planner to prevent it, but it'd then insert
         * projection steps just to convert from specific typmod to typmod
         * -1, which is pretty silly.
         */
        if (var->vartype != att_tup->atttypid ||
            (var->vartypmod != att_tup->atttypmod &&
             var->vartypmod != -1))
            return false;       /* type mismatch */

        tlist_item = lnext(tlist, tlist_item);
    }

    if (tlist_item)
        return false;           /* tlist too long */

    return true;
}
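
/*
 * Intuitively, tlist_matches_tupdesc succeeds only for a targetlist that
 * is exactly "Var 1, Var 2, ..., Var n" over the scan tuple, as in a plain
 * "SELECT * FROM tab"; any expression, reordered or dropped column, or
 * missing-value default forces a real projection step.
 */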
/* ----------------
 *      ExecFreeExprContext
 *
 * A plan node's ExprContext should be freed explicitly during executor
 * shutdown because there may be shutdown callbacks to call.  (Other resources
 * made by the above routines, such as projection info, don't need to be freed
 * explicitly because they're just memory in the per-query memory context.)
 *
 * However ... there is no particular need to do it during ExecEndNode,
 * because FreeExecutorState will free any remaining ExprContexts within
 * the EState.  Letting FreeExecutorState do it allows the ExprContexts to
 * be freed in reverse order of creation, rather than order of creation as
 * will happen if we delete them here, which saves O(N^2) work in the list
 * cleanup inside FreeExprContext.
 * ----------------
 */
void
ExecFreeExprContext(PlanState *planstate)
{
    /*
     * Per above discussion, don't actually delete the ExprContext. We do
     * unlink it from the plan node, though.
     */
    planstate->ps_ExprContext = NULL;
}
/* ----------------------------------------------------------------
 *                Scan node support
 * ----------------------------------------------------------------
 */

/* ----------------
 *      ExecAssignScanType
 * ----------------
 */
void
ExecAssignScanType(ScanState *scanstate, TupleDesc tupDesc)
{
    TupleTableSlot *slot = scanstate->ss_ScanTupleSlot;

    ExecSetSlotDescriptor(slot, tupDesc);
}
/* ----------------
 *      ExecCreateScanSlotFromOuterPlan
 * ----------------
 */
void
ExecCreateScanSlotFromOuterPlan(EState *estate,
                                ScanState *scanstate,
                                const TupleTableSlotOps *tts_ops)
{
    PlanState  *outerPlan;
    TupleDesc   tupDesc;

    outerPlan = outerPlanState(scanstate);
    tupDesc = ExecGetResultType(outerPlan);

    ExecInitScanTupleSlot(estate, scanstate, tupDesc, tts_ops);
}
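
/*
 * This is used by nodes whose "scan" tuple is really their subplan's
 * output, such as Material or Sort.  A sketch of such a node's ExecInit
 * routine (the state variable name is illustrative):
 *
 *      ExecCreateScanSlotFromOuterPlan(estate, &matstate->ss,
 *                                      &TTSOpsMinimalTuple);
 */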
/* ----------------------------------------------------------------
 *      ExecRelationIsTargetRelation
 *
 *      Detect whether a relation (identified by rangetable index)
 *      is one of the target relations of the query.
 *
 *      Note: This is currently no longer used in core.  We keep it around
 *      because FDWs may wish to use it to determine if their foreign table
 *      is a target relation.
 * ----------------------------------------------------------------
 */
bool
ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
{
    ResultRelInfo *resultRelInfos;
    int         i;

    resultRelInfos = estate->es_result_relations;
    for (i = 0; i < estate->es_num_result_relations; i++)
    {
        if (resultRelInfos[i].ri_RangeTableIndex == scanrelid)
            return true;
    }
    return false;
}
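
/*
 * An FDW might call this from its scan-initialization callback to learn
 * whether its foreign table is being modified.  A hypothetical sketch
 * ("node" is the ForeignScanState):
 *
 *      Scan       *scan = (Scan *) node->ss.ps.plan;
 *      bool        is_target;
 *
 *      is_target = ExecRelationIsTargetRelation(estate, scan->scanrelid);
 */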
/* ----------------------------------------------------------------
 *      ExecOpenScanRelation
 *
 *      Open the heap relation to be scanned by a base-level scan plan node.
 *      This should be called during the node's ExecInit routine.
 * ----------------------------------------------------------------
 */
Relation
ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
{
    Relation    rel;

    /* Open the relation. */
    rel = ExecGetRangeTableRelation(estate, scanrelid);

    /*
     * Complain if we're attempting a scan of an unscannable relation, except
     * when the query won't actually be run.  This is a slightly klugy place
     * to do this, perhaps, but there is no better place.
     */
    if ((eflags & (EXEC_FLAG_EXPLAIN_ONLY | EXEC_FLAG_WITH_NO_DATA)) == 0 &&
        !RelationIsScannable(rel))
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("materialized view \"%s\" has not been populated",
                        RelationGetRelationName(rel)),
                 errhint("Use the REFRESH MATERIALIZED VIEW command.")));

    return rel;
}
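
/*
 * A base-level scan node's ExecInit routine typically does something along
 * these lines (sketch modeled on ExecInitSeqScan):
 *
 *      scanstate->ss.ss_currentRelation =
 *          ExecOpenScanRelation(estate, scan->scanrelid, eflags);
 */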
/*
 * ExecInitRangeTable
 *      Set up executor's range-table-related data
 *
 * In addition to the range table proper, initialize arrays that are
 * indexed by rangetable index.
 */
void
ExecInitRangeTable(EState *estate, List *rangeTable)
{
    /* Remember the range table List as-is */
    estate->es_range_table = rangeTable;

    /* Set size of associated arrays */
    estate->es_range_table_size = list_length(rangeTable);

    /*
     * Allocate an array to store an open Relation corresponding to each
     * rangetable entry, and initialize entries to NULL.  Relations are
     * opened and stored here as needed.
     */
    estate->es_relations = (Relation *)
        palloc0(estate->es_range_table_size * sizeof(Relation));

    /*
     * es_rowmarks is also parallel to the es_range_table, but it's allocated
     * only if needed.
     */
    estate->es_rowmarks = NULL;
}
/*
 * ExecGetRangeTableRelation
 *      Open the Relation for a range table entry, if not already done
 *
 * The Relations will be closed again in ExecEndPlan().
 */
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
{
    Relation    rel;

    Assert(rti > 0 && rti <= estate->es_range_table_size);

    rel = estate->es_relations[rti - 1];
    if (rel == NULL)
    {
        /* First time through, so open the relation */
        RangeTblEntry *rte = exec_rt_fetch(rti, estate);

        Assert(rte->rtekind == RTE_RELATION);

        if (!IsParallelWorker())
        {
            /*
             * In a normal query, we should already have the appropriate
             * lock, but verify that through an Assert.  Since there's
             * already an Assert inside table_open that insists on holding
             * some lock, it seems sufficient to check this only when
             * rellockmode is higher than the minimum.
             */
            rel = table_open(rte->relid, NoLock);
            Assert(rte->rellockmode == AccessShareLock ||
                   CheckRelationLockedByMe(rel, rte->rellockmode, false));
        }
        else
        {
            /*
             * If we are a parallel worker, we need to obtain our own local
             * lock on the relation.  This ensures sane behavior in case the
             * parent process exits before we do.
             */
            rel = table_open(rte->relid, rte->rellockmode);
        }

        estate->es_relations[rti - 1] = rel;
    }

    return rel;
}
/*
 * UpdateChangedParamSet
 *      Add changed parameters to a plan node's chgParam set
 */
void
UpdateChangedParamSet(PlanState *node, Bitmapset *newchg)
{
    Bitmapset  *parmset;

    /*
     * The plan node only depends on params listed in its allParam set. Don't
     * include anything else into its chgParam set.
     */
    parmset = bms_intersect(node->plan->allParam, newchg);

    /*
     * Keep node->chgParam == NULL if there's not actually any members; this
     * allows the simplest possible tests in executor node files.
     */
    if (!bms_is_empty(parmset))
        node->chgParam = bms_join(node->chgParam, parmset);
    else
        bms_free(parmset);
}
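
/*
 * Rescan processing uses this to push a node's chgParam set down to its
 * children, roughly as in this sketch (see ExecReScan in execAmi.c for the
 * real code):
 *
 *      if (outerPlanState(node))
 *          UpdateChangedParamSet(outerPlanState(node), node->chgParam);
 */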
/*
 * executor_errposition
 *      Report an execution-time cursor position, if possible.
 *
 * This is expected to be used within an ereport() call.  The return value
 * is a dummy (always 0, in fact).
 *
 * The locations stored in parsetrees are byte offsets into the source string.
 * We have to convert them to 1-based character indexes for reporting to
 * clients.  (We do things this way to avoid unnecessary overhead in the
 * normal non-error case: computing character indexes would be much more
 * expensive than storing token offsets.)
 */
int
executor_errposition(EState *estate, int location)
{
    int         pos;

    /* No-op if location was not provided */
    if (location < 0)
        return 0;
    /* Can't do anything if source text is not available */
    if (estate == NULL || estate->es_sourceText == NULL)
        return 0;
    /* Convert offset to character number */
    pos = pg_mbstrlen_with_len(estate->es_sourceText, location) + 1;
    /* And pass it to the ereport mechanism */
    return errposition(pos);
}
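
/*
 * Intended usage is as an extra auxiliary call inside ereport(), e.g.
 * (sketch; the specific message is illustrative):
 *
 *      ereport(ERROR,
 *              (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 *               errmsg("set-valued function called in context that cannot accept a set"),
 *               executor_errposition(estate, exprLocation((Node *) expr))));
 */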
/*
 * Register a shutdown callback in an ExprContext.
 *
 * Shutdown callbacks will be called (in reverse order of registration)
 * when the ExprContext is deleted or rescanned.  This provides a hook
 * for functions called in the context to do any cleanup needed --- it's
 * particularly useful for functions returning sets.  Note that the
 * callback will *not* be called in the event that execution is aborted
 * by an error.
 */
void
RegisterExprContextCallback(ExprContext *econtext,
                            ExprContextCallbackFunction function,
                            Datum arg)
{
    ExprContext_CB *ecxt_callback;

    /* Save the info in appropriate memory context */
    ecxt_callback = (ExprContext_CB *)
        MemoryContextAlloc(econtext->ecxt_per_query_memory,
                           sizeof(ExprContext_CB));

    ecxt_callback->function = function;
    ecxt_callback->arg = arg;

    /* link to front of list for appropriate execution order */
    ecxt_callback->next = econtext->ecxt_callbacks;
    econtext->ecxt_callbacks = ecxt_callback;
}
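
/*
 * For example, a set-returning function that keeps private per-call state
 * might arrange for cleanup like this (sketch; "shutdown_my_state" and
 * "mystate" are illustrative names, not part of this file):
 *
 *      RegisterExprContextCallback(rsinfo->econtext,
 *                                  shutdown_my_state,
 *                                  PointerGetDatum(mystate));
 */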
/*
 * Deregister a shutdown callback in an ExprContext.
 *
 * Any list entries matching the function and arg will be removed.
 * This can be used if it's no longer necessary to call the callback.
 */
void
UnregisterExprContextCallback(ExprContext *econtext,
                              ExprContextCallbackFunction function,
                              Datum arg)
{
    ExprContext_CB **prev_callback;
    ExprContext_CB *ecxt_callback;

    prev_callback = &econtext->ecxt_callbacks;

    while ((ecxt_callback = *prev_callback) != NULL)
    {
        if (ecxt_callback->function == function && ecxt_callback->arg == arg)
        {
            *prev_callback = ecxt_callback->next;
            pfree(ecxt_callback);
        }
        else
            prev_callback = &ecxt_callback->next;
    }
}
/*
 * Call all the shutdown callbacks registered in an ExprContext.
 *
 * The callback list is emptied (important in case this is only a rescan
 * reset, and not deletion of the ExprContext).
 *
 * If isCommit is false, just clean the callback list but don't call 'em.
 * (See comment for FreeExprContext.)
 */
static void
ShutdownExprContext(ExprContext *econtext, bool isCommit)
{
    ExprContext_CB *ecxt_callback;
    MemoryContext oldcontext;

    /* Fast path in normal case where there's nothing to do. */
    if (econtext->ecxt_callbacks == NULL)
        return;

    /*
     * Call the callbacks in econtext's per-tuple context.  This ensures that
     * any memory they might leak will get cleaned up.
     */
    oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);

    /*
     * Call each callback function in reverse registration order.
     */
    while ((ecxt_callback = econtext->ecxt_callbacks) != NULL)
    {
        econtext->ecxt_callbacks = ecxt_callback->next;
        if (isCommit)
            ecxt_callback->function(ecxt_callback->arg);
        pfree(ecxt_callback);
    }

    MemoryContextSwitchTo(oldcontext);
}
/*
|
|
|
|
* GetAttributeByName
|
|
|
|
* GetAttributeByNum
|
|
|
|
*
|
|
|
|
* These functions return the value of the requested attribute
|
|
|
|
* out of the given tuple Datum.
|
|
|
|
* C functions which take a tuple as an argument are expected
|
|
|
|
* to use these. Ex: overpaid(EMP) might call GetAttributeByNum().
|
|
|
|
* Note: these are actually rather slow because they do a typcache
|
|
|
|
* lookup on each call.
|
|
|
|
*/
|
|
|
|
Datum
|
|
|
|
GetAttributeByName(HeapTupleHeader tuple, const char *attname, bool *isNull)
|
|
|
|
{
|
|
|
|
AttrNumber attrno;
|
|
|
|
Datum result;
|
|
|
|
Oid tupType;
|
|
|
|
int32 tupTypmod;
|
|
|
|
TupleDesc tupDesc;
|
|
|
|
HeapTupleData tmptup;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (attname == NULL)
|
|
|
|
elog(ERROR, "invalid attribute name");
|
|
|
|
|
|
|
|
if (isNull == NULL)
|
|
|
|
elog(ERROR, "a NULL isNull pointer was passed");
|
|
|
|
|
|
|
|
if (tuple == NULL)
|
|
|
|
{
|
|
|
|
/* Kinda bogus but compatible with old behavior... */
|
|
|
|
*isNull = true;
|
|
|
|
return (Datum) 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
tupType = HeapTupleHeaderGetTypeId(tuple);
|
|
|
|
tupTypmod = HeapTupleHeaderGetTypMod(tuple);
|
|
|
|
tupDesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
|
|
|
|
|
|
|
|
attrno = InvalidAttrNumber;
|
|
|
|
for (i = 0; i < tupDesc->natts; i++)
|
|
|
|
{
|
2017-08-20 20:19:07 +02:00
|
|
|
Form_pg_attribute att = TupleDescAttr(tupDesc, i);
|
|
|
|
|
|
|
|
if (namestrcmp(&(att->attname), attname) == 0)
|
Faster expression evaluation and targetlist projection.
This replaces the old, recursive tree-walk based evaluation, with
non-recursive, opcode dispatch based, expression evaluation.
Projection is now implemented as part of expression evaluation.
This both leads to significant performance improvements, and makes
future just-in-time compilation of expressions easier.
The speed gains primarily come from:
- non-recursive implementation reduces stack usage / overhead
- simple sub-expressions are implemented with a single jump, without
function calls
- sharing some state between different sub-expressions
- reduced amount of indirect/hard to predict memory accesses by laying
out operation metadata sequentially; including the avoidance of
nearly all of the previously used linked lists
- more code has been moved to expression initialization, avoiding
constant re-checks at evaluation time
Future just-in-time compilation (JIT) has become easier, as
demonstrated by released patches intended to be merged in a later
release, for primarily two reasons: Firstly, due to a stricter split
between expression initialization and evaluation, less code has to be
handled by the JIT. Secondly, due to the non-recursive nature of the
generated "instructions", less performance-critical code-paths can
easily be shared between interpreted and compiled evaluation.
The new framework allows for significant future optimizations. E.g.:
- basic infrastructure for to later reduce the per executor-startup
overhead of expression evaluation, by caching state in prepared
statements. That'd be helpful in OLTPish scenarios where
initialization overhead is measurable.
- optimizing the generated "code". A number of proposals for potential
work has already been made.
- optimizing the interpreter. Similarly a number of proposals have
been made here too.
The move of logic into the expression initialization step leads to some
backward-incompatible changes:
- Function permission checks are now done during expression
initialization, whereas previously they were done during
execution. In edge cases this can lead to errors being raised that
previously wouldn't have been, e.g. a NULL array being coerced to a
different array type previously didn't perform checks.
- The set of domain constraints to be checked, is now evaluated once
during expression initialization, previously it was re-built
every time a domain check was evaluated. For normal queries this
doesn't change much, but e.g. for plpgsql functions, which caches
ExprStates, the old set could stick around longer. The behavior
around might still change.
Author: Andres Freund, with significant changes by Tom Lane,
changes by Heikki Linnakangas
Reviewed-By: Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
2017-03-14 23:45:36 +01:00
|
|
|
{
|
2017-08-20 20:19:07 +02:00
|
|
|
attrno = att->attnum;
|
            break;
        }
    }

    if (attrno == InvalidAttrNumber)
        elog(ERROR, "attribute \"%s\" does not exist", attname);

    /*
     * heap_getattr needs a HeapTuple not a bare HeapTupleHeader.  We set all
     * the fields in the struct just in case user tries to inspect system
     * columns.
     */
    tmptup.t_len = HeapTupleHeaderGetDatumLength(tuple);
    ItemPointerSetInvalid(&(tmptup.t_self));
    tmptup.t_tableOid = InvalidOid;
    tmptup.t_data = tuple;

    result = heap_getattr(&tmptup,
                          attrno,
                          tupDesc,
                          isNull);

    ReleaseTupleDesc(tupDesc);

    return result;
}
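
/*
 * Usage sketch (illustration only, not part of the executor): C-language
 * SQL functions commonly call GetAttributeByName to extract a named column
 * from a composite-type argument.  The function name "c_overpaid" and the
 * column "salary" below are hypothetical.
 *
 *      PG_FUNCTION_INFO_V1(c_overpaid);
 *
 *      Datum
 *      c_overpaid(PG_FUNCTION_ARGS)
 *      {
 *          HeapTupleHeader t = PG_GETARG_HEAPTUPLEHEADER(0);
 *          int32       limit = PG_GETARG_INT32(1);
 *          bool        isnull;
 *          Datum       salary;
 *
 *          salary = GetAttributeByName(t, "salary", &isnull);
 *          if (isnull)
 *              PG_RETURN_BOOL(false);
 *          PG_RETURN_BOOL(DatumGetInt32(salary) > limit);
 *      }
 */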

Datum
GetAttributeByNum(HeapTupleHeader tuple,
                  AttrNumber attrno,
                  bool *isNull)
{
    Datum       result;
    Oid         tupType;
    int32       tupTypmod;
    TupleDesc   tupDesc;
    HeapTupleData tmptup;

    if (!AttributeNumberIsValid(attrno))
        elog(ERROR, "invalid attribute number %d", attrno);

    if (isNull == NULL)
        elog(ERROR, "a NULL isNull pointer was passed");

    if (tuple == NULL)
    {
        /* Kinda bogus but compatible with old behavior... */
        *isNull = true;
        return (Datum) 0;
    }

    tupType = HeapTupleHeaderGetTypeId(tuple);
    tupTypmod = HeapTupleHeaderGetTypMod(tuple);
    tupDesc = lookup_rowtype_tupdesc(tupType, tupTypmod);

    /*
     * heap_getattr needs a HeapTuple not a bare HeapTupleHeader.  We set all
     * the fields in the struct just in case user tries to inspect system
     * columns.
     */
    tmptup.t_len = HeapTupleHeaderGetDatumLength(tuple);
    ItemPointerSetInvalid(&(tmptup.t_self));
    tmptup.t_tableOid = InvalidOid;
    tmptup.t_data = tuple;

    result = heap_getattr(&tmptup,
                          attrno,
                          tupDesc,
                          isNull);

    ReleaseTupleDesc(tupDesc);

    return result;
}
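
/*
 * Usage sketch (illustration only): GetAttributeByNum is the positional
 * counterpart; attribute numbers are 1-based.  Assuming "t" is a
 * HeapTupleHeader for some composite value:
 *
 *      bool    isnull;
 *      Datum   first_col = GetAttributeByNum(t, 1, &isnull);
 */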

/*
 * Number of items in a tlist (including any resjunk items!)
 */
int
ExecTargetListLength(List *targetlist)
{
    /* This used to be more complex, but fjoins are dead */
    return list_length(targetlist);
}

/*
 * Number of items in a tlist, not including any resjunk items
 */
int
ExecCleanTargetListLength(List *targetlist)
{
    int         len = 0;
    ListCell   *tl;

    foreach(tl, targetlist)
    {
        TargetEntry *curTle = lfirst_node(TargetEntry, tl);
        if (!curTle->resjunk)
            len++;
    }
    return len;
}
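
/*
 * Illustration (hypothetical expressions expr1/expr2): the two length
 * functions differ only in their treatment of resjunk entries.  For a
 * targetlist with one real and one junk column:
 *
 *      List       *tlist = NIL;
 *
 *      tlist = lappend(tlist, makeTargetEntry(expr1, 1, "a", false));
 *      tlist = lappend(tlist, makeTargetEntry(expr2, 2, NULL, true));
 *
 *      Assert(ExecTargetListLength(tlist) == 2);
 *      Assert(ExecCleanTargetListLength(tlist) == 1);
 */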

/*
 * Return a relInfo's tuple slot for a trigger's OLD tuples.
 */
TupleTableSlot *
ExecGetTriggerOldSlot(EState *estate, ResultRelInfo *relInfo)
{
    if (relInfo->ri_TrigOldSlot == NULL)
    {
        Relation    rel = relInfo->ri_RelationDesc;
        MemoryContext oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);

        relInfo->ri_TrigOldSlot =
            ExecInitExtraTupleSlot(estate,
                                   RelationGetDescr(rel),
                                   table_slot_callbacks(rel));

        MemoryContextSwitchTo(oldcontext);
    }

    return relInfo->ri_TrigOldSlot;
}
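
/*
 * Usage sketch (illustration only): a caller such as trigger code fetches
 * the lazily-created slot, then stores the tuple it wants to hand to the
 * trigger; "relinfo" and "trigtuple" are assumed supplied by the caller.
 *
 *      TupleTableSlot *slot = ExecGetTriggerOldSlot(estate, relinfo);
 *
 *      ExecForceStoreHeapTuple(trigtuple, slot, false);
 */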

/*
 * Return a relInfo's tuple slot for a trigger's NEW tuples.
 */
TupleTableSlot *
ExecGetTriggerNewSlot(EState *estate, ResultRelInfo *relInfo)
{
    if (relInfo->ri_TrigNewSlot == NULL)
    {
        Relation    rel = relInfo->ri_RelationDesc;
        MemoryContext oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);

        relInfo->ri_TrigNewSlot =
            ExecInitExtraTupleSlot(estate,
                                   RelationGetDescr(rel),
                                   table_slot_callbacks(rel));

        MemoryContextSwitchTo(oldcontext);
    }

    return relInfo->ri_TrigNewSlot;
}

/*
 * Return a relInfo's tuple slot for processing returning tuples.
 */
TupleTableSlot *
ExecGetReturningSlot(EState *estate, ResultRelInfo *relInfo)
{
    if (relInfo->ri_ReturningSlot == NULL)
    {
        Relation    rel = relInfo->ri_RelationDesc;
        MemoryContext oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);

        relInfo->ri_ReturningSlot =
            ExecInitExtraTupleSlot(estate,
                                   RelationGetDescr(rel),
                                   table_slot_callbacks(rel));

        MemoryContextSwitchTo(oldcontext);
    }

    return relInfo->ri_ReturningSlot;
}
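
/*
 * Usage sketch (illustration only): code that projects RETURNING lists can
 * use this as scratch space for the result relation's tuple, e.g.
 *
 *      TupleTableSlot *slot = ExecGetReturningSlot(estate, resultRelInfo);
 *
 * Like the trigger slots above, the slot is created once in es_query_cxt
 * and reused for the rest of the query.
 */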