Commit Graph

1559 Commits

Author SHA1 Message Date
Robert Haas 691b8d5928 Allow for parallel execution whenever ExecutorRun() is done only once.
Previously, it was unsafe to execute a plan in parallel if
ExecutorRun() might be called with a non-zero row count.  However,
it's quite easy to fix things up so that we can support that case,
provided that it is known that we will never call ExecutorRun() a
second time for the same QueryDesc.  Add infrastructure to signal
this, and cross-checks to make sure that a caller who claims this is
true doesn't later reneg.

While that pattern never happens with queries received directly from a
client -- there's no way to know whether multiple Execute messages
will be sent unless the first one requests all the rows -- it's pretty
common for queries originating from procedural languages, which often
limit the result to a single tuple or to a user-specified number of
tuples.

This commit doesn't actually enable parallelism in any additional
cases, because currently none of the places that would be able to
benefit from this infrastructure pass CURSOR_OPT_PARALLEL_OK in the
first place, but it makes it much more palatable to pass
CURSOR_OPT_PARALLEL_OK in places where we currently don't, because it
eliminates some cases where we'd end up having to run the parallel
plan serially.

Patch by me, based on some ideas from Rafia Sabih and corrected by
Rafia Sabih based on feedback from Dilip Kumar and myself.

Discussion: http://postgr.es/m/CA+TgmobXEhvHbJtWDuPZM9bVSLiTj-kShxQJ2uM5GPDze9fRYA@mail.gmail.com
2017-03-23 13:14:36 -04:00
Robert Haas d3cc37f1d8 Don't scan partitioned tables.
Partitioned tables do not contain any data; only their unpartitioned
descendents need to be scanned.  However, the partitioned tables still
need to be locked, even though they're not scanned.  To make that
work, Append and MergeAppend relations now need to carry a list of
(unscanned) partitioned relations that must be locked, and InitPlan
must lock all partitioned result relations.

Aside from the obvious advantage of avoiding some work at execution
time, this has two other advantages.  First, it may improve the
planner's decision-making in some cases since the empty relation
might throw things off.  Second, it paves the way to getting rid of
the storage for partitioned tables altogether.

Amit Langote, reviewed by me.

Discussion: http://postgr.es/m/6837c359-45c4-8044-34d1-736756335a15@lab.ntt.co.jp
2017-03-21 09:48:04 -04:00
Peter Eisentraut a47b38c9ee Spelling fixes
From: Josh Soref <jsoref@gmail.com>
2017-03-14 12:58:39 -04:00
Noah Misch 3a0d473192 Use wrappers of PG_DETOAST_DATUM_PACKED() more.
This makes almost all core code follow the policy introduced in the
previous commit.  Specific decisions:

- Text search support functions with char* and length arguments, such as
  prsstart and lexize, may receive unaligned strings.  I doubt
  maintainers of non-core text search code will notice.

- Use plain VARDATA() on values detoasted or synthesized earlier in the
  same function.  Use VARDATA_ANY() on varlenas sourced outside the
  function, even if they happen to always have four-byte headers.  As an
  exception, retain the universal practice of using VARDATA() on return
  values of SendFunctionCall().

- Retain PG_GETARG_BYTEA_P() in pageinspect.  (Page images are too large
  for a one-byte header, so this misses no optimization.)  Sites that do
  not call get_page_from_raw() typically need the four-byte alignment.

- For now, do not change btree_gist.  Its use of four-byte headers in
  memory is partly entangled with storage of 4-byte headers inside
  GBT_VARKEY, on disk.

- For now, do not change gtrgm_consistent() or gtrgm_distance().  They
  incorporate the varlena header into a cache, and there are multiple
  credible implementation strategies to consider.
2017-03-12 19:35:34 -04:00
Tom Lane 5d3f7c57ab Remove dead code in nodeGatherMerge.c.
Coverity noted that the last line of gather_merge_getnext() was
unreachable, since each arm of the preceding "if" ends in a "return".
Drop it as an oversight.  In passing, improve some nearby comments.
2017-03-12 15:52:50 -04:00
Andres Freund ce38949ba2 Improve expression evaluation test coverage.
Upcoming patches are revamping expression evaluation significantly. It
therefore seems prudent to try to ensure that the coverage of the
existing evaluation code is high.

This commit adds coverage for the cases that can reasonably be
tested. There's still a bunch of unreachable error messages and such,
but otherwise this achieves nearly full regression test coverage (with
the exception of the unused GetAttributeByNum/GetAttributeByName).

Author: Andres Freund
Discussion: https://postgr.es/m/20170310194021.ek4bs4bl2khxkmll@alap3.anarazel.de
2017-03-11 15:41:34 -08:00
Robert Haas 355d3993c5 Add a Gather Merge executor node.
Like Gather, we spawn multiple workers and run the same plan in each
one; however, Gather Merge is used when each worker produces the same
output ordering and we want to preserve that output ordering while
merging together the streams of tuples from various workers.  (In a
way, Gather Merge is like a hybrid of Gather and MergeAppend.)

This works out to a win if it saves us from having to perform an
expensive Sort.  In cases where only a small amount of data would need
to be sorted, it may actually be faster to use a regular Gather node
and then sort the results afterward, because Gather Merge sometimes
needs to wait synchronously for tuples whereas a pure Gather generally
doesn't.  But if this avoids an expensive sort then it's a win.

Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro,
and Neha Sharma, and reviewed and revised by me.

Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com
2017-03-09 07:49:29 -05:00
Tom Lane 15d03e5976 Silence compiler warnings in BitmapHeapNext().
Same disease as 270d7dd8a5.
2017-03-08 12:43:39 -05:00
Robert Haas f35742ccb7 Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.

Dilip Kumar, with some corrections and cosmetic changes by me.  The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.

Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 12:05:43 -05:00
Alvaro Herrera fcec6caafa Support XMLTABLE query expression
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.

This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE.  (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).

The implemented syntax deviates slightly from what the standard
requires.  First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only.  Second, we don't
currently support a default namespace to be specified.

This implementation relies on a new executor node based on a hardcoded
method table.  (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)

Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
2017-03-08 12:40:26 -03:00
Robert Haas 09529a70bb Fix parallel index and index-only scans to fall back to serial.
Parallel executor nodes can't assume that parallel execution will
happen in every case where the plan calls for it, because it might
not work out that way.  However, parallel index scan and parallel
index-only scan failed to do the right thing here.  Repair.

Amit Kapila, per a report from me.

Discussion: http://postgr.es/m/CAA4eK1Kq5qb_u2AOoda5XBB91vVWz90w=LgtRLgsssriS8pVTw@mail.gmail.com
2017-03-08 08:15:24 -05:00
Robert Haas 98e6e89040 tidbitmap: Support shared iteration.
When a shared iterator is used, each call to tbm_shared_iterate()
returns a result that has not yet been returned to any process
attached to the shared iterator.  In other words, each cooperating
processes gets a disjoint subset of the full result set, but all
results are returned exactly once.

This is infrastructure for parallel bitmap heap scan.

Dilip Kumar.  The larger patch set of which this is a part has been
reviewed and tested by (at least) Andres Freund, Amit Khandekar,
Tushar Ahuja, Rafia Sabih, Haribabu Kommi, and Thomas Munro.

Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 08:09:38 -05:00
Robert Haas 5a73e17317 Improve error reporting for tuple-routing failures.
Currently, the whole row is shown without column names.  Instead,
adopt a style similar to _bt_check_unique() in ExecFindPartition()
and show the failing key: (key1, ...) = (val1, ...).

Amit Langote, per a complaint from Simon Riggs.  Reviewed by me;
I also adjusted the grammar in one of the comments.

Discussion: http://postgr.es/m/9f9dc7ae-14f0-4a25-5485-964d9bfc19bd@lab.ntt.co.jp
2017-03-03 09:09:52 +05:30
Robert Haas 9e0fe09fc5 Refactor bitmap heap scan in preparation for parallel support.
The final patch will be less messy if the prefetching support is
a bit better isolated, so do that.

Dilip Kumar, with some changes by me.  The larger patch set of which
this is a part has been reviewed and tested by (at least) Andres
Freund, Amit Khandekar, Tushar Ahuja, Rafia Sabih, Haribabu Kommi, and
Thomas Munro.
2017-03-02 18:47:40 +05:30
Peter Eisentraut eb75f4fced Use proper enum constants for LockWaitPolicy 2017-02-28 13:28:17 -05:00
Tom Lane 9b88f27cb4 Allow index AMs to return either HeapTuple or IndexTuple format during IOS.
Previously, only IndexTuple format was supported for the output data of
an index-only scan.  This is fine for btree, which is just returning a
verbatim index tuple anyway.  It's not so fine for SP-GiST, which can
return reconstructed data that's much larger than a page.

To fix, extend the index AM API so that index-only scan data can be
returned in either HeapTuple or IndexTuple format.  There's other ways
we could have done it, but this way avoids an API break for index AMs
that aren't concerned with the issue, and it costs little except a couple
more fields in IndexScanDescs.

I changed both GiST and SP-GiST to use the HeapTuple method.  I'm not
very clear on whether GiST can reconstruct data that's too large for an
IndexTuple, but that seems possible, and it's not much of a code change to
fix.

Per a complaint from Vik Fearing.  Reviewed by Jason Li.

Discussion: https://postgr.es/m/49527f79-530d-0bfe-3dad-d183596afa92@2ndquadrant.fr
2017-02-27 17:20:34 -05:00
Robert Haas a315b967cc Allow custom and foreign scans to have shutdown callbacks.
This is expected to be useful mostly when performing such scans in
parallel, because in that case it allows (in combination with commit
acf555bc53) nodes below a Gather to get
control just before the DSM segment goes away.

KaiGai Kohei, except that I rewrote the documentation.  Reviewed by
Claudio Freire.

Discussion: http://postgr.es/m/CADyhKSXJK0jUJ8rWv4AmKDhsUh124_rEn39eqgfC5D8fu6xVuw@mail.gmail.com
2017-02-26 13:41:12 +05:30
Tom Lane c29aff959d Consistently declare timestamp variables as TimestampTz.
Twiddle the replication-related code so that its timestamp variables
are declared TimestampTz, rather than the uninformative "int64" that
was previously used for meant-to-be-always-integer timestamps.
This resolves the int64-vs-TimestampTz declaration inconsistencies
introduced by commit 7c030783a, though in the opposite direction to
what was originally suggested.

This required including datatype/timestamp.h in a couple more places
than before.  I decided it would be a good idea to slim down that
header by not having it pull in <float.h> etc, as those headers are
no longer at all relevant to its purpose.  Unsurprisingly, a small number
of .c files turn out to have been depending on those inclusions, so add
them back in the .c files as needed.

Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
Discussion: https://postgr.es/m/27694.1487456324@sss.pgh.pa.us
2017-02-23 15:57:08 -05:00
Robert Haas 4c728f3829 Pass the source text for a parallel query to the workers.
With this change, you can see the query that a parallel worker is
executing in pg_stat_activity, and if the worker crashes you can
see what query it was executing when it crashed.

Rafia Sabih, reviewed by Kuntal Ghosh and Amit Kapila and slightly
revised by me.
2017-02-22 12:18:29 +05:30
Robert Haas acf555bc53 Shut down Gather's children before shutting down Gather itself.
It turns out that the original shutdown order here does not work well.
Multiple people attempting to develop further parallel query patches
have discovered that they need to do cleanup before the DSM goes away,
and you can't do that if the parent node gets cleaned up first.

Patch by me, reviewed by KaiGai Kohei and Dilip Kumar.

Discussion: http://postgr.es/m/CA+TgmoY6bOc1YnhcAQnMfCBDbsJzROQ3sYxSAL-SYB5tMJcTKg@mail.gmail.com
Discussion: http://postgr.es/m/9A28C8860F777E439AA12E8AEA7694F8012AEB82@BPXM15GP.gisp.nec.co.jp
Discussion: http://postgr.es/m/CA+TgmoYuPOc=+xrG1v0fCsoLbKAab9F1ddOeaaiLMzKOiBar1Q@mail.gmail.com
2017-02-22 08:08:07 +05:30
Robert Haas 0414b26bac Add optimizer and executor support for parallel index-only scans.
Commit 5262f7a4fc added similar support
for parallel index scans; this extends that work to index-only scans.
As with parallel index scans, this requires support from the index AM,
so currently parallel index-only scans will only be possible for btree
indexes.

Rafia Sabih, reviewed and tested by Rahila Syed, Tushar Ahuja,
and Amit Kapila

Discussion: http://postgr.es/m/CAOGQiiPEAs4C=TBp0XShxBvnWXuzGL2u++Hm1=qnCpd6_Mf8Fw@mail.gmail.com
2017-02-19 15:57:55 +05:30
Tom Lane f2ec57dee9 Make sure that hash join's bulk-tuple-transfer loops are interruptible.
The loops in ExecHashJoinNewBatch(), ExecHashIncreaseNumBatches(), and
ExecHashRemoveNextSkewBucket() are all capable of iterating over many
tuples without ever doing a CHECK_FOR_INTERRUPTS, so that the backend
might fail to respond to SIGINT or SIGTERM for an unreasonably long time.
Fix that.  In the case of ExecHashJoinNewBatch(), it seems useful to put
the added CHECK_FOR_INTERRUPTS into ExecHashJoinGetSavedTuple() rather
than directly in the loop, because that will also ensure that both
principal code paths through ExecHashJoinOuterGetTuple() will do a
CHECK_FOR_INTERRUPTS, which seems like a good idea to avoid surprises.

Back-patch to all supported branches.

Tom Lane and Thomas Munro

Discussion: https://postgr.es/m/6044.1487121720@sss.pgh.pa.us
2017-02-15 16:40:05 -05:00
Robert Haas 5262f7a4fc Add optimizer and executor support for parallel index scans.
In combination with 569174f1be, which
taught the btree AM how to perform parallel index scans, this allows
parallel index scan plans on btree indexes.  This infrastructure
should be general enough to support parallel index scans for other
index AMs as well, if someone updates them to support parallel
scans.

Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar
Ahuja, and Haribabu Kommi, and me.
2017-02-15 13:53:24 -05:00
Robert Haas 5e6d8d2bbb Allow parallel workers to execute subplans.
This doesn't do anything to make Param nodes anything other than
parallel-restricted, so this only helps with uncorrelated subplans,
and it's not necessarily very cheap because each worker will run the
subplan separately (just as a Hash Join will build a separate copy of
the hash table in each participating process), but it's a first step
toward supporting cases that are more likely to help in practice, and
is occasionally useful on its own.

Amit Kapila, reviewed and tested by Rafia Sabih, Dilip Kumar, and
me.

Discussion: http://postgr.es/m/CAA4eK1+e8Z45D2n+rnDMDYsVEb5iW7jqaCH_tvPMYau=1Rru9w@mail.gmail.com
2017-02-14 18:16:03 -05:00
Robert Haas 72257f9578 simplehash: Additional tweaks to make specifying an allocator work.
Even if we don't emit definitions for SH_ALLOCATE and SH_FREE, we
still need prototypes.  The user can't define them before including
simplehash.h because SH_TYPE isn't available yet.

For the allocator to be able to access private_data, it needs to
become an argument to SH_CREATE.  Previously we relied on callers
to set that after returning from SH_CREATE, but SH_CREATE calls
SH_ALLOCATE before returning.

Dilip Kumar, reviewed by me.
2017-02-09 14:59:57 -05:00
Tom Lane 86d911ec0f Allow index AMs to cache data across aminsert calls within a SQL command.
It's always been possible for index AMs to cache data across successive
amgettuple calls within a single SQL command: the IndexScanDesc.opaque
field is meant for precisely that.  However, no comparable facility
exists for amortizing setup work across successive aminsert calls.
This patch adds such a feature and teaches GIN, GIST, and BRIN to use it
to amortize catalog lookups they'd previously been doing on every call.
(The other standard index AMs keep everything they need in the relcache,
so there's little to improve there.)

For GIN, the overall improvement in a statement that inserts many rows
can be as much as 10%, though it seems a bit less for the other two.
In addition, this makes a really significant difference in runtime
for CLOBBER_CACHE_ALWAYS tests, since in those builds the repeated
catalog lookups are vastly more expensive.

The reason this has been hard up to now is that the aminsert function is
not passed any useful place to cache per-statement data.  What I chose to
do is to add suitable fields to struct IndexInfo and pass that to aminsert.
That's not widening the index AM API very much because IndexInfo is already
within the ken of ambuild; in fact, by passing the same info to aminsert
as to ambuild, this is really removing an inconsistency in the AM API.

Discussion: https://postgr.es/m/27568.1486508680@sss.pgh.pa.us
2017-02-09 11:52:12 -05:00
Robert Haas c3c4f6e174 Revise the way the element allocator for a simplehash is specified.
This method is more elegant and more efficient.

Per a suggestion from Andres Freund, who also briefly reviewed
the patch.
2017-02-07 17:10:08 -05:00
Robert Haas 565903af47 Allow the element allocator for a simplehash to be specified.
This is infrastructure for a pending patch to allow parallel bitmap
heap scans.

Dilip Kumar, reviewed (in earlier versions) by Andres Freund and
(more recently) by me.  Some further renaming by me, also.
2017-02-07 16:01:44 -05:00
Heikki Linnakangas 181bdb90ba Fix typos in comments.
Backpatch to all supported versions, where applicable, to make backpatching
of future fixes go more smoothly.

Josh Soref

Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
2017-02-06 11:33:58 +02:00
Robert Haas 6f4b4ceefa Remove redundant comment.
Rafia Sabih
2017-02-03 19:05:49 -05:00
Tom Lane 7afd56c3c6 Use castNode() in a bunch of statement-list-related code.
When I wrote commit ab1f0c822, I really missed the castNode() macro that
Peter E. had proposed shortly before.  This back-fills the uses I would
have put it to.  It's probably not all that significant, but there are
more assertions here than there were before, and conceivably they will
help catch any bugs associated with those representation changes.

I left behind a number of usages like "(Query *) copyObject(query_var)".
Those could have been converted as well, but Peter has proposed another
notational improvement that would handle copyObject cases automatically,
so I let that be for now.
2017-01-26 22:09:34 -05:00
Andres Freund 9ba8a9ce45 Use the new castNode() macro in a number of places.
This is far from a pervasive conversion, but it's a good starting
point.

Author: Peter Eisentraut, with some minor changes by me
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/c5d387d9-3440-f5e0-f9d4-71d53b9fbe52@2ndquadrant.com
2017-01-26 16:47:03 -08:00
Peter Eisentraut 3d9e73ea5f Update copyright years in some recently added files 2017-01-25 12:32:05 -05:00
Robert Haas 587cda35ca Fix things so that updatable views work with partitioned tables.
Previously, ExecInitModifyTable was missing handling for WITH CHECK
OPTION, and view_query_is_auto_updatable was missing handling for
RELKIND_PARTITIONED_TABLE.

Amit Langote, reviewed by me.
2017-01-24 15:46:50 -05:00
Robert Haas 132488bfee Set ecxt_scantuple correctly for tuple routing.
In 2ac3ef7a01, we changed things so that
it's possible for a different TupleTableSlot to be used for partitioned
tables at successively lower levels.  If we do end up changing the slot
from the original, we must update ecxt_scantuple to point to the new one
for partition key of the tuple to be computed correctly.

Reported by Rajkumar Raghuwanshi.  Patch by Amit Langote.

Discussion: http://postgr.es/m/CAKcux6%3Dm1qyqB2k6cjniuMMrYXb75O-MB4qGQMu8zg-iGGLjDw%40mail.gmail.com
2017-01-24 15:34:39 -05:00
Robert Haas 27cdb3414b Reindent table partitioning code.
We've accumulated quite a bit of stuff with which pgindent is not
quite happy in this code; clean it up to provide a less-annoying base
for future pgindent runs.
2017-01-24 10:20:02 -05:00
Tom Lane 0a8b9d3b2c Remove no-longer-needed loop in ExecGather().
Coverity complained quite properly that commit ea15e1867 had introduced
unreachable code into ExecGather(); to wit, it was no longer possible to
iterate the final for-loop more or less than once.  So remove the for().

In passing, clean up a couple of comments, and make better use of a local
variable.
2017-01-22 11:47:38 -05:00
Peter Eisentraut 665d1fad99 Logical replication
- Add PUBLICATION catalogs and DDL
- Add SUBSCRIPTION catalog and DDL
- Define logical replication protocol and output plugin
- Add logical replication workers

From: Petr Jelinek <petr@2ndquadrant.com>
Reviewed-by: Steve Singer <steve@ssinger.info>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Erik Rijkers <er@xs4all.nl>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
2017-01-20 09:04:49 -05:00
Andres Freund ea15e18677 Remove obsoleted code relating to targetlist SRF evaluation.
Since 69f4b9c plain expression evaluation (and thus normal projection)
can't return sets of tuples anymore. Thus remove code dealing with
that possibility.

This will require adjustments in external code using
ExecEvalExpr()/ExecProject() - that should neither be hard nor very
common.

Author: Andres Freund and Tom Lane
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-19 14:40:41 -08:00
Robert Haas 05bd889904 Fix RETURNING to work correctly with partition tuple routing.
In ExecInsert(), do not switch back to the root partitioned table
ResultRelInfo until after we finish ExecProcessReturning(), so that
RETURNING projection is done using the partition's descriptor.  For
the projection to work correctly, we must initialize the same for each
leaf partition during ModifyTableState initialization.

Amit Langote
2017-01-19 13:20:11 -05:00
Robert Haas 39162b2030 Fix failure to enforce partitioning contraint for internal partitions.
When a tuple is inherited into a partitioning root, no partition
constraints need to be enforced; when it is inserted into a leaf, the
parent's partitioning quals needed to be enforced.  The previous
coding got both of those cases right.  When a tuple is inserted into
an intermediate level of the partitioning hierarchy (i.e. a table
which is both a partition itself and in turn partitioned), it must
enforce the partitioning qual inherited from its parent.  That case
got overlooked; repair.

Amit Langote
2017-01-19 12:30:27 -05:00
Andres Freund 69f4b9c85f Move targetlist SRF handling from expression evaluation to new executor node.
Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT
generate_series(1,5)) so far was done in the expression evaluation (i.e.
ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code.

This meant that most executor nodes performing projection, and most
expression evaluation functions, had to deal with the possibility that an
evaluated expression could return a set of return values.

That's bad because it leads to repeated code in a lot of places. It also,
and that's my (Andres's) motivation, made it a lot harder to implement a
more efficient way of doing expression evaluation.

To fix this, introduce a new executor node (ProjectSet) that can evaluate
targetlists containing one or more SRFs. To avoid the complexity of the old
way of handling nested expressions returning sets (e.g. having to pass up
ExprDoneCond, and dealing with arguments to functions returning sets etc.),
those SRFs can only be at the top level of the node's targetlist.  The
planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is
only necessary in ProjectSet nodes and that SRFs are only present at the
top level of the node's targetlist. If there are nested SRFs the planner
creates multiple stacked ProjectSet nodes.  The ProjectSet nodes always get
input from an underlying node.

We also discussed and prototyped evaluating targetlist SRFs using ROWS
FROM(), but that turned out to be more complicated than we'd hoped.

While moving SRF evaluation to ProjectSet would allow to retain the old
"least common multiple" behavior when multiple SRFs are present in one
targetlist (i.e.  continue returning rows until all SRFs are at the end of
their input at the same time), we decided to instead only return rows till
all SRFs are exhausted, returning NULL for already exhausted ones.  We
deemed the previous behavior to be too confusing, unexpected and actually
not particularly useful.

As a side effect, the previously prohibited case of multiple set returning
arguments to a function, is now allowed. Not because it's particularly
desirable, but because it ends up working and there seems to be no argument
for adding code to prohibit it.

Currently the behavior for COALESCE and CASE containing SRFs has changed,
returning multiple rows from the expression, even when the SRF containing
"arm" of the expression is not evaluated. That's because the SRFs are
evaluated in a separate ProjectSet node.  As that's quite confusing, we're
likely to instead prohibit SRFs in those places.  But that's still being
discussed, and the code would reside in places not touched here, so that's
a task for later.

There's a lot of, now superfluous, code dealing with set return expressions
around. But as the changes to get rid of those are verbose largely boring,
it seems better for readability to keep the cleanup as a separate commit.

Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-18 13:40:27 -08:00
Peter Eisentraut 352a24a1f9 Generate fmgr prototypes automatically
Gen_fmgrtab.pl creates a new file fmgrprotos.h, which contains
prototypes for all functions registered in pg_proc.h.  This avoids
having to manually maintain these prototypes across a random variety of
header files.  It also automatically enforces a correct function
signature, and since there are warnings about missing prototypes, it
will detect functions that are defined but not registered in
pg_proc.h (or otherwise used).

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
2017-01-17 14:06:07 -05:00
Tom Lane ab1f0c8225 Change representation of statement lists, and add statement location info.
This patch makes several changes that improve the consistency of
representation of lists of statements.  It's always been the case
that the output of parse analysis is a list of Query nodes, whatever
the types of the individual statements in the list.  This patch brings
similar consistency to the outputs of raw parsing and planning steps:

* The output of raw parsing is now always a list of RawStmt nodes;
the statement-type-dependent nodes are one level down from that.

* The output of pg_plan_queries() is now always a list of PlannedStmt
nodes, even for utility statements.  In the case of a utility statement,
"planning" just consists of wrapping a CMD_UTILITY PlannedStmt around
the utility node.  This list representation is now used in Portal and
CachedPlan plan lists, replacing the former convention of intermixing
PlannedStmts with bare utility-statement nodes.

Now, every list of statements has a consistent head-node type depending
on how far along it is in processing.  This allows changing many places
that formerly used generic "Node *" pointers to use a more specific
pointer type, thus reducing the number of IsA() tests and casts needed,
as well as improving code clarity.

Also, the post-parse-analysis representation of DECLARE CURSOR is changed
so that it looks more like EXPLAIN, PREPARE, etc.  That is, the contained
SELECT remains a child of the DeclareCursorStmt rather than getting flipped
around to be the other way.  It's now true for both Query and PlannedStmt
that utilityStmt is non-null if and only if commandType is CMD_UTILITY.
That allows simplifying a lot of places that were testing both fields.
(I think some of those were just defensive programming, but in many places,
it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.)

Because PlannedStmt carries a canSetTag field, we're also able to get rid
of some ad-hoc rules about how to reconstruct canSetTag for a bare utility
statement; specifically, the assumption that a utility is canSetTag if and
only if it's the only one in its list.  While I see no near-term need for
relaxing that restriction, it's nice to get rid of the ad-hocery.

The API of ProcessUtility() is changed so that what it's passed is the
wrapper PlannedStmt not just the bare utility statement.  This will affect
all users of ProcessUtility_hook, but the changes are pretty trivial; see
the affected contrib modules for examples of the minimum change needed.
(Most compilers should give pointer-type-mismatch warnings for uncorrected
code.)

There's also a change in the API of ExplainOneQuery_hook, to pass through
cursorOptions instead of expecting hook functions to know what to pick.
This is needed because of the DECLARE CURSOR changes, but really should
have been done in 9.6; it's unlikely that any extant hook functions
know about using CURSOR_OPT_PARALLEL_OK.

Finally, teach gram.y to save statement boundary locations in RawStmt
nodes, and pass those through to Query and PlannedStmt nodes.  This allows
more intelligent handling of cases where a source query string contains
multiple statements.  This patch doesn't actually do anything with the
information, but a follow-on patch will.  (Passing this information through
cleanly is the true motivation for these changes; while I think this is all
good cleanup, it's unlikely we'd have bothered without this end goal.)

catversion bump because addition of location fields to struct Query
affects stored rules.

This patch is by me, but it owes a good deal to Fabien Coelho who did
a lot of preliminary work on the problem, and also reviewed the patch.

Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre
2017-01-14 16:02:35 -05:00
Tom Lane 75abb955df Throw suitable error for COPY TO STDOUT/FROM STDIN in a SQL function.
A client copy can't work inside a function because the FE/BE wire protocol
doesn't support nesting of a COPY operation within query results.  (Maybe
it could, but the protocol spec doesn't suggest that clients should support
this, and libpq for one certainly doesn't.)

In most PLs, this prohibition is enforced by spi.c, but SQL functions don't
use SPI.  A comparison of _SPI_execute_plan() and init_execution_state()
shows that rejecting client COPY is the only discrepancy in what they
allow, so there's no other similar bugs.

This is an astonishingly ancient oversight, so back-patch to all supported
branches.

Report: https://postgr.es/m/BY2PR05MB2309EABA3DEFA0143F50F0D593780@BY2PR05MB2309.namprd05.prod.outlook.com
2017-01-14 13:27:47 -05:00
Robert Haas 76568d3786 Fix incorrect function name in comment.
Amit Langote
2017-01-12 09:05:14 -05:00
Robert Haas 0355e6f310 Repair commit b81b5a96f4.
This commit purported to use a variable hash seed for Partial
HashAggregate, but actually did the opposite - it made us use a
variable seed for any HashAggregate that is NOT partial.  Woops.
2017-01-06 09:34:26 -05:00
Robert Haas 175ff6598e Fix possible crash reading pg_stat_activity.
With the old code, a backend that read pg_stat_activity without ever
having executed a parallel query might see a backend in the midst of
executing one waiting on a DSA LWLock, resulting in a crash.  The
solution is for backends to register the tranche at startup time, not
the first time a parallel query is executed.

Report by Andreas Seltenreich.  Patch by me, reviewed by Thomas Munro.
2017-01-05 12:27:09 -05:00
Robert Haas 18fc5192a6 Remove unnecessary arguments from partitioning functions.
RelationGetPartitionQual() and generate_partition_qual() are always
called with recurse = true, so we don't need an argument for that.

Extracted by me from a larger patch by Amit Langote.
2017-01-04 14:56:37 -05:00
Robert Haas f1b4c771ea Fix reporting of constraint violations for table partitioning.
After a tuple is routed to a partition, it has been converted from the
root table's row type to the partition's row type.  ExecConstraints
needs to report the failure using the original tuple and the parent's
tuple descriptor rather than the ones for the selected partition.

Amit Langote
2017-01-04 14:36:34 -05:00