Commit Graph

2576 Commits

Author SHA1 Message Date
Tom Lane 57cd2b6e6d Fix create_scan_plan's handling of sortgrouprefs for physical tlists.
We should only run apply_pathtarget_labeling_to_tlist if CP_LABEL_TLIST
was specified, because only in that case has use_physical_tlist checked
that the labeling will succeed; otherwise we may get an "ORDER/GROUP BY
expression not found in targetlist" error.  (This subsumes the previous
test about gating_clauses, because we reset "flags" to zero earlier
if there are gating clauses to apply.)

The only known case in which a failure can occur is with a ProjectSet
path directly atop a table scan path, although it seems likely that there
are other cases or will be such in future.  This means that the failure
is currently only visible in the v10 branch: 9.6 didn't have ProjectSet,
while in v11 and HEAD, apply_scanjoin_target_to_paths for some weird
reason is using create_projection_path not apply_projection_to_path,
masking the problem because there's a ProjectionPath in between.

Nonetheless this code is clearly wrong on its own terms, so back-patch
to 9.6 where this logic was introduced.

Per report from Regina Obe.

Discussion: https://postgr.es/m/001501d40f88$75186950$5f493bf0$@pcorp.us
2018-07-11 15:25:28 -04:00
Tom Lane ff4f889164 Fix bugs with degenerate window ORDER BY clauses in GROUPS/RANGE mode.
nodeWindowAgg.c failed to cope with the possibility that no ordering
columns are defined in the window frame for GROUPS mode or RANGE OFFSET
mode, leading to assertion failures or odd errors, as reported by Masahiko
Sawada and Lukas Eder.  In RANGE OFFSET mode, an ordering column is really
required, so add an Assert about that.  In GROUPS mode, the code would
work, except that the node initialization code wasn't in sync with the
execution code about when to set up tuplestore read pointers and spare
slots.  Fix the latter for consistency's sake (even though I think the
changes described below make the out-of-sync cases unreachable for now).

Per SQL spec, a single ordering column is required for RANGE OFFSET mode,
and at least one ordering column is required for GROUPS mode.  The parser
enforced the former but not the latter; add a check for that.

We were able to reach the no-ordering-column cases even with fully spec
compliant queries, though, because the planner would drop partitioning
and ordering columns from the generated plan if they were redundant with
earlier columns according to the redundant-pathkey logic, for instance
"PARTITION BY x ORDER BY y" in the presence of a "WHERE x=y" qual.
While in principle that's an optimization that could save some pointless
comparisons at runtime, it seems unlikely to be meaningful in the real
world.  I think this behavior was not so much an intentional optimization
as a side-effect of an ancient decision to construct the plan node's
ordering-column info by reverse-engineering the PathKeys of the input
path.  If we give up redundant-column removal then it takes very little
code to generate the plan node info directly from the WindowClause,
ensuring that we have the expected number of ordering columns in all
cases.  (If anyone does complain about this, the planner could perhaps
be taught to remove redundant columns only when it's safe to do so,
ie *not* in RANGE OFFSET mode.  But I doubt anyone ever will.)

With these changes, the WindowAggPath.winpathkeys field is not used for
anything anymore, so remove it.

The test cases added here are not actually very interesting given the
removal of the redundant-column-removal logic, but they would represent
important corner cases if anyone ever tries to put that back.

Tom Lane and Masahiko Sawada.  Back-patch to v11 where RANGE OFFSET
and GROUPS modes were added.

Discussion: https://postgr.es/m/CAD21AoDrWqycq-w_+Bx1cjc+YUhZ11XTj9rfxNiNDojjBx8Fjw@mail.gmail.com
Discussion: https://postgr.es/m/153086788677.17476.8002640580496698831@wrigleys.postgresql.org
2018-07-11 12:07:20 -04:00
Alvaro Herrera f2c587067a Rethink how to get float.h in old Windows API for isnan/isinf
We include <float.h> in every place that needs isnan(), because MSVC
used to require it.  However, since MSVC 2013 that's no longer necessary
(cf. commit cec8394b5c), so we can retire the inclusion to a
version-specific stanza in win32_port.h, where it doesn't need to
pollute random .c files.  The header is of course still needed in a few
places for other reasons.

I (Álvaro) removed float.h from a few more files than in Emre's original
patch.  This doesn't break the build in my system, but we'll see what
the buildfarm has to say about it all.

Author: Emre Hasegeli
Discussion: https://postgr.es/m/CAE2gYzyc0+5uG+Cd9-BSL7NKC8LSHLNg1Aq2=8ubjnUwut4_iw@mail.gmail.com
2018-07-11 09:11:48 -04:00
Peter Eisentraut bcbd940806 Remove dynamic_shared_memory_type=none
PostgreSQL nowadays offers some kind of dynamic shared memory feature on
all supported platforms.  Having the choice of "none" prevents us from
relying on DSM in core features.  So this patch removes the choice of
"none".

Author: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
2018-07-10 18:35:24 +02:00
Michael Paquier fc057b2b8f Remove dead code for temporary relations in partition planning
Since recent commit 1c7c317c, temporary relations cannot be mixed with
permanent relations within the same partition tree, and the same counts
for temporary relations created by other sessions, which the planner
simply discarded.  Instead be paranoid and issue an error, as those
should be blocked at definition time, at least for now.

At the same time, a test case is added to stress what has been moved
when expand_partitioned_rtentry gets called recursively but bumps on a
partitioned relation with no partitions which should be handled the same
way as the non-inheritance case.  This code may be reworked in a close
future, and covering this code path will limit surprises.

Reported-by: David Rowley
Author: David Rowley
Reviewed-by: Amit Langote, Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/CAKJS1f_HyV1txn_4XSdH5EOhBMYaCwsXyAj6bHXk9gOu4JKsbw@mail.gmail.com
2018-07-04 10:37:40 +09:00
Alvaro Herrera 7d872c91a3 Allow direct lookups of AppendRelInfo by child relid
find_appinfos_by_relids had quite a large overhead when the number of
items in the append_rel_list was high, as it had to trawl through the
append_rel_list looking for AppendRelInfos belonging to the given
childrelids.  Since there can only be a single AppendRelInfo for each
child rel, it seems much better to store an array in PlannerInfo which
indexes these by child relid, making the function O(1) rather than O(N).
This function was only called once inside the planner, so just replace
that call with a lookup to the new array.  find_childrel_appendrelinfo
is now unused and thus removed.

This fixes a planner performance regression new to v11 reported by
Thomas Reiss.

Author: David Rowley
Reported-by: Thomas Reiss
Reviewed-by: Ashutosh Bapat
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/94dd7a4b-5e50-0712-911d-2278e055c622@dalibo.com
2018-06-26 10:35:26 -04:00
Robert Haas c6f28af5d7 Avoid generating bogus paths with partitionwise aggregate.
Previously, if some or all partitions had no partially aggregated path,
we would still try to generate a partially aggregated path for the
parent, leading to assertion failures or wrong answers.

Report by Rajkumar Raghuwanshi.  Patch by Jeevan Chalke, reviewed
by Ashutosh Bapat.  A few changes by me.

Discussion: http://postgr.es/m/CAKcux6=q4+Mw8gOOX16ef6ZMFp9Cve7KWFstUsrDa4GiFaXGUQ@mail.gmail.com
2018-06-22 09:20:19 -04:00
Amit Kapila 98d476a965 Improve coding pattern in Parallel Append code.
The create_append_path code didn't consider that list_concat will
modify it's first argument leading to inconsistent traversal of
resulting list.  In practice, it won't lead to any user-visible bug
but changing it for making the code behave consistently.

Reported-by: Tom Lane
Author: Tom Lane
Reviewed-by: Amit Khandekar and Amit Kapila
Discussion: https://postgr.es/m/32365.1528994120@sss.pgh.pa.us
2018-06-22 08:43:36 +05:30
Tom Lane 07e5a21352 Fix mishandling of sortgroupref labels while splitting SRF targetlists.
split_pathtarget_at_srfs() neglected to worry about sortgroupref labels
in the intermediate PathTargets it constructs.  I think we'd supposed
that their labeling didn't matter, but it does at least for the case that
GroupAggregate/GatherMerge nodes appear immediately under the ProjectSet
step(s).  This results in "ERROR: ORDER/GROUP BY expression not found in
targetlist" during create_plan(), as reported by Rajkumar Raghuwanshi.

To fix, make this logic track the sortgroupref labeling of expressions,
not just their contents.  This also restores the pre-v10 behavior that
separate GROUP BY expressions will be kept distinct even if they are
textually equal().

Discussion: https://postgr.es/m/CAKcux6=1_Ye9kx8YLBPmJs_xE72PPc6vNi5q2AOHowMaCWjJ2w@mail.gmail.com
2018-06-21 10:58:42 -04:00
Amit Kapila 403318b71f Don't consider parallel append for parallel unsafe paths.
Commit ab72716778 allowed Parallel Append paths to be generated for a
relation that is not parallel safe.  Prevent that from happening.

Initial analysis by Tom Lane.

Reported-by: Rajkumar Raghuwanshi
Author: Amit Kapila and Rajkumar Raghuwanshi
Reviewed-by: Amit Khandekar and Robert Haas
Discussion:https://postgr.es/m/CAKcux6=tPJ6nJ08r__nU_pmLQiC0xY15Fn0HvG1Cprsjdd9s_Q@mail.gmail.com
2018-06-20 07:51:42 +05:30
Tom Lane 4e23236403 Improve commentary about run-time partition pruning data structures.
No code changes except for a couple of new Asserts.

David Rowley and Tom Lane

Discussion: https://postgr.es/m/CAKJS1f-6GODRNgEtdPxCnAPme2h2hTztB6LmtfdmcYAAOE0kQg@mail.gmail.com
2018-06-11 17:35:53 -04:00
Tom Lane 939449de0e Relocate partition pruning structs to a saner place.
These struct definitions were originally dropped into primnodes.h,
which is a poor choice since that's mainly intended for primitive
expression node types; these are not in that category.  What they
are is auxiliary info in Plan trees, so move them to plannodes.h.

For consistency, also relocate some related code that was apparently
placed with the aid of a dartboard.

There's no interesting code changes in this commit, just reshuffling.

David Rowley and Tom Lane

Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com
2018-06-10 16:30:14 -04:00
Magnus Hagander 848b1f3e35 Fix typo in README
Author: Daniel Gustafsson <daniel@yesql.se>
2018-06-07 14:40:38 +02:00
Heikki Linnakangas a0b37684ba Fix typo in comment. 2018-05-22 11:18:16 +03:00
Tom Lane a11b3bd37f Fix misprocessing of equivalence classes involving record_eq().
canonicalize_ec_expression() is supposed to agree with coerce_type() as to
whether a RelabelType should be inserted to make a subexpression be valid
input for the operators of a given opclass.  However, it did the wrong
thing with named-composite-type inputs to record_eq(): it put in a
RelabelType to RECORDOID, which the parser doesn't.  In some cases this was
harmless because all code paths involving a particular equivalence class
did the same thing, but in other cases this would result in failing to
recognize a composite-type expression as being a member of an equivalence
class that it actually is a member of.  The most obvious bad effect was to
fail to recognize that an index on a composite column could provide the
sort order needed for a mergejoin on that column, as reported by Teodor
Sigaev.  I think there might be other, subtler, cases that result in
misoptimization.  It also seems possible that an unwanted RelabelType
would sometimes get into an emitted plan --- but because record_eq and
friends don't examine the declared type of their input expressions, that
would not create any visible problems.

To fix, just treat RECORDOID as if it were a polymorphic type, which in
some sense it is.  We might want to consider formalizing that a bit more
someday, but for the moment this seems to be the only place where an
IsPolymorphicType() test ought to include RECORDOID as well.

This has been broken for a long time, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/a6b22369-e3bf-4d49-f59d-0c41d3551e81@sigaev.ru
2018-05-16 13:46:23 -04:00
Robert Haas 7fc7dac1a7 Pass the correct PlannerInfo to PlanForeignModify/PlanDirectModify.
Previously, we passed the toplevel PlannerInfo, but we actually want
to pass the relevant subroot.  One problem with passing the toplevel
PlannerInfo is that the FDW which wants to push down an UPDATE or
DELETE against a join won't find the relevant joinrel there.
As of commit 1bc0100d27, postgres_fdw
tries to do exactly this and can be made to fail an assertion as a
result.

It's possible that this should be regarded as a bug fix and
back-patched to earlier releases, but for lack of a test case that
fails in earlier releases, no back-patch for now.

Etsuro Fujita, reviewed by Amit Langote.

Discussion: http://postgr.es/m/5AF43E02.30000@lab.ntt.co.jp
2018-05-16 11:32:38 -04:00
Tom Lane bdf46af748 Post-feature-freeze pgindent run.
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
2018-04-26 14:47:16 -04:00
Robert Haas dc1057fcd8 Prevent generation of bogus subquery scan paths.
Commit 0927d2f46d didn't check that
consider_parallel was set for the target relation or account for
the possibility that required_outer might be non-empty.

To prevent future bugs of this ilk, add some assertions to
add_partial_path and do a bit of future-proofing of the code
recently added to recurse_set_operations.

Report by Andreas Seltenreich.  Patch by Jeevan Chalke.  Review
by Amit Kapila and by me.

Discussion: http://postgr.es/m/CAM2+6=U+9otsyF2fYB8x_2TBeHTR90itarqW=qAEjN-kHaC7kw@mail.gmail.com
2018-04-25 15:25:55 -04:00
Alvaro Herrera 055fb8d33d Add GUC enable_partition_pruning
This controls both plan-time and execution-time new-style partition
pruning.  While finer-grain control is possible (maybe using an enum GUC
instead of boolean), there doesn't seem to be much need for that.

This new parameter controls partition pruning for all queries:
trivially, SELECT queries that affect partitioned tables are naturally
under its control since they are using the new technology.  However,
while UPDATE/DELETE queries do not use the new code, we make the new GUC
control their behavior also (stealing control from
constraint_exclusion), because it is more natural, and it leads to a
more natural transition to the future in which those queries will also
use the new pruning code.

Constraint exclusion still controls pruning for regular inheritance
situations (those not involving partitioned tables).

Author: David Rowley
Review: Amit Langote, Ashutosh Bapat, Justin Pryzby, David G. Johnston
Discussion: https://postgr.es/m/CAKJS1f_0HwsxJG9m+nzU+CizxSdGtfe6iF_ykPYBiYft302DCw@mail.gmail.com
2018-04-23 17:57:43 -03:00
Tom Lane ec38dcd363 Tweak a couple of planner APIs to save recalculating join relids.
Discussion: https://postgr.es/m/f8128b11-c5bf-3539-48cd-234178b2314d@proxel.se
2018-04-20 16:00:47 -04:00
Tom Lane c792c7db41 Change more places to be less trusting of RestrictInfo.is_pushed_down.
On further reflection, commit e5d83995e didn't go far enough: pretty much
everywhere in the planner that examines a clause's is_pushed_down flag
ought to be changed to use the more complicated behavior where we also
check the clause's required_relids.  Otherwise we could make incorrect
decisions about whether, say, a clause is safe to use as a hash clause.

Some (many?) of these places are safe as-is, either because they are
never reached while considering a parameterized path, or because there
are additional checks that would reject a pushed-down clause anyway.
However, it seems smarter to just code them all the same way rather
than rely on easily-broken reasoning of that sort.

In support of that, invent a new macro RINFO_IS_PUSHED_DOWN that should
be used in place of direct tests on the is_pushed_down flag.

Like the previous patch, back-patch to all supported branches.

Discussion: https://postgr.es/m/f8128b11-c5bf-3539-48cd-234178b2314d@proxel.se
2018-04-20 15:19:16 -04:00
Tom Lane e5d83995e9 Fix incorrect handling of join clauses pushed into parameterized paths.
In some cases a clause attached to an outer join can be pushed down into
the outer join's RHS even though the clause is not degenerate --- this
can happen if we choose to make a parameterized path for the RHS.  If
the clause ends up attached to a lower outer join, we'd misclassify it
as being a "join filter" not a plain "filter" condition at that node,
leading to wrong query results.

To fix, teach extract_actual_join_clauses to examine each join clause's
required_relids, not just its is_pushed_down flag.  (The latter now
seems vestigial, or at least in need of rethinking, but we won't do
anything so invasive as redefining it in a bug-fix patch.)

This has been wrong since we introduced parameterized paths in 9.2,
though it's evidently hard to hit given the lack of previous reports.
The test case used here involves a lateral function call, and I think
that a lateral reference may be required to get the planner to select
a broken plan; though I wouldn't swear to that.  In any case, even if
LATERAL is needed to trigger the bug, it still affects all supported
branches, so back-patch to all.

Per report from Andreas Karlsson.  Thanks to Andrew Gierth for
preliminary investigation.

Discussion: https://postgr.es/m/f8128b11-c5bf-3539-48cd-234178b2314d@proxel.se
2018-04-19 15:49:30 -04:00
Alvaro Herrera da6f3e45dd Reorganize partitioning code
There's been a massive addition of partitioning code in PostgreSQL 11,
with little oversight on its placement, resulting in a
catalog/partition.c with poorly defined boundaries and responsibilities.
This commit tries to set a couple of distinct modules to separate things
a little bit.  There are no code changes here, only code movement.

There are three new files:
  src/backend/utils/cache/partcache.c
  src/include/partitioning/partdefs.h
  src/include/utils/partcache.h

The previous arrangement of #including catalog/partition.h almost
everywhere is no more.

Authors: Amit Langote and Álvaro Herrera
Discussion: https://postgr.es/m/98e8d509-790a-128c-be7f-e48a5b2d8d97@lab.ntt.co.jp
	https://postgr.es/m/11aa0c50-316b-18bb-722d-c23814f39059@lab.ntt.co.jp
	https://postgr.es/m/143ed9a4-6038-76d4-9a55-502035815e68@lab.ntt.co.jp
	https://postgr.es/m/20180413193503.nynq7bnmgh6vs5vm@alvherre.pgsql
2018-04-14 21:12:14 -03:00
Peter Eisentraut a8677e3ff6 Support named and default arguments in CALL
We need to call expand_function_arguments() to expand named and default
arguments.

In PL/pgSQL, we also need to deal with named and default INOUT arguments
when receiving the output values into variables.

Author: Pavel Stehule <pavel.stehule@gmail.com>
2018-04-14 09:13:53 -04:00
Teodor Sigaev c266ed31a8 Cleanup covering infrastructure
- Explicitly forbids opclass, collation and indoptions (like DESC/ASC etc) for
  including columns. Throw an error if user points that.
- Truncated storage arrays for such attributes to store only key atrributes,
  added assertion checks.
- Do not check opfamily and collation for including columns in
  CompareIndexInfo()

Discussion: https://www.postgresql.org/message-id/5ee72852-3c4e-ee35-e2ed-c1d053d45c08@sigaev.ru
2018-04-12 16:37:22 +03:00
Simon Riggs 08ea7a2291 Revert MERGE patch
This reverts commits d204ef6377,
83454e3c2b and a few more commits thereafter
(complete list at the end) related to MERGE feature.

While the feature was fully functional, with sufficient test coverage and
necessary documentation, it was felt that some parts of the executor and
parse-analyzer can use a different design and it wasn't possible to do that in
the available time. So it was decided to revert the patch for PG11 and retry
again in the future.

Thanks again to all reviewers and bug reporters.

List of commits reverted, in reverse chronological order:

 f1464c5380 Improve parse representation for MERGE
 ddb4158579 MERGE syntax diagram correction
 530e69e59b Allow cpluspluscheck to pass by renaming variable
 01b88b4df5 MERGE minor errata
 3af7b2b0d4 MERGE fix variable warning in non-assert builds
 a5d86181ec MERGE INSERT allows only one VALUES clause
 4b2d44031f MERGE post-commit review
 4923550c20 Tab completion for MERGE
 aa3faa3c7a WITH support in MERGE
 83454e3c2b New files for MERGE
 d204ef6377 MERGE SQL Command following SQL:2016

Author: Pavan Deolasee
Reviewed-by: Michael Paquier
2018-04-12 11:22:56 +01:00
Tom Lane cefa387153 Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers.
Traditionally, include/catalog/pg_foo.h contains extern declarations
for functions in backend/catalog/pg_foo.c, in addition to its function
as the authoritative definition of the pg_foo catalog's rowtype.
In some cases, we'd been forced to split out those extern declarations
into separate pg_foo_fn.h headers so that the catalog definitions
could be #include'd by frontend code.  That problem is gone as of
commit 9c0a0de4c, so let's undo the splits to make things less
confusing.

Discussion: https://postgr.es/m/23690.1523031777@sss.pgh.pa.us
2018-04-08 14:35:29 -04:00
Teodor Sigaev 02f3e558f2 match_clause_to_index should check only key columns
Alexander Korotkov per gripe from Tom Lane noticed on valgrind-enabled
buildfarm members
2018-04-08 19:58:15 +03:00
Alvaro Herrera 499be013de Support partition pruning at execution time
Existing partition pruning is only able to work at plan time, for query
quals that appear in the parsed query.  This is good but limiting, as
there can be parameters that appear later that can be usefully used to
further prune partitions.

This commit adds support for pruning subnodes of Append which cannot
possibly contain any matching tuples, during execution, by evaluating
Params to determine the minimum set of subnodes that can possibly match.
We support more than just simple Params in WHERE clauses. Support
additionally includes:

1. Parameterized Nested Loop Joins: The parameter from the outer side of the
   join can be used to determine the minimum set of inner side partitions to
   scan.

2. Initplans: Once an initplan has been executed we can then determine which
   partitions match the value from the initplan.

Partition pruning is performed in two ways.  When Params external to the plan
are found to match the partition key we attempt to prune away unneeded Append
subplans during the initialization of the executor.  This allows us to bypass
the initialization of non-matching subplans meaning they won't appear in the
EXPLAIN or EXPLAIN ANALYZE output.

For parameters whose value is only known during the actual execution
then the pruning of these subplans must wait.  Subplans which are
eliminated during this stage of pruning are still visible in the EXPLAIN
output.  In order to determine if pruning has actually taken place, the
EXPLAIN ANALYZE must be viewed.  If a certain Append subplan was never
executed due to the elimination of the partition then the execution
timing area will state "(never executed)".  Whereas, if, for example in
the case of parameterized nested loops, the number of loops stated in
the EXPLAIN ANALYZE output for certain subplans may appear lower than
others due to the subplan having been scanned fewer times.  This is due
to the list of matching subnodes having to be evaluated whenever a
parameter which was found to match the partition key changes.

This commit required some additional infrastructure that permits the
building of a data structure which is able to perform the translation of
the matching partition IDs, as returned by get_matching_partitions, into
the list index of a subpaths list, as exist in node types such as
Append, MergeAppend and ModifyTable.  This allows us to translate a list
of clauses into a Bitmapset of all the subpath indexes which must be
included to satisfy the clause list.

Author: David Rowley, based on an earlier effort by Beena Emerson
Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi,
Jesper Pedersen
Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
2018-04-07 17:54:39 -03:00
Teodor Sigaev 8224de4f42 Indexes with INCLUDE columns and their support in B-tree
This patch introduces INCLUDE clause to index definition.  This clause
specifies a list of columns which will be included as a non-key part in
the index.  The INCLUDE columns exist solely to allow more queries to
benefit from index-only scans.  Also, such columns don't need to have
appropriate operator classes.  Expressions are not supported as INCLUDE
columns since they cannot be used in index-only scans.

Index access methods supporting INCLUDE are indicated by amcaninclude flag
in IndexAmRoutine.  For now, only B-tree indexes support INCLUDE clause.

In B-tree indexes INCLUDE columns are truncated from pivot index tuples
(tuples located in non-leaf pages and high keys).  Therefore, B-tree indexes
now might have variable number of attributes.  This patch also provides
generic facility to support that: pivot tuples contain number of their
attributes in t_tid.ip_posid.  Free 13th bit of t_info is used for indicating
that.  This facility will simplify further support of index suffix truncation.
The changes of above are backward-compatible, pg_upgrade doesn't need special
handling of B-tree indexes for that.

Bump catalog version

Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me
Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes,
			 David Rowley, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru
2018-04-07 23:00:39 +03:00
Alvaro Herrera 9fdb675fc5 Faster partition pruning
Add a new module backend/partitioning/partprune.c, implementing a more
sophisticated algorithm for partition pruning.  The new module uses each
partition's "boundinfo" for pruning instead of constraint exclusion,
based on an idea proposed by Robert Haas of a "pruning program": a list
of steps generated from the query quals which are run iteratively to
obtain a list of partitions that must be scanned in order to satisfy
those quals.

At present, this targets planner-time partition pruning, but there exist
further patches to apply partition pruning at execution time as well.

This commit also moves some definitions from include/catalog/partition.h
to a new file include/partitioning/partbounds.h, in an attempt to
rationalize partitioning related code.

Authors: Amit Langote, David Rowley, Dilip Kumar
Reviewers: Robert Haas, Kyotaro Horiguchi, Ashutosh Bapat, Jesper Pedersen.
Discussion: https://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp
2018-04-06 16:44:05 -03:00
Simon Riggs 4b2d44031f MERGE post-commit review
Review comments from Andres Freund

* Consolidate code into AfterTriggerGetTransitionTable()
* Rename nodeMerge.c to execMerge.c
* Rename nodeMerge.h to execMerge.h
* Move MERGE handling in ExecInitModifyTable()
  into a execMerge.c ExecInitMerge()
* Move mt_merge_subcommands flags into execMerge.h
* Rename opt_and_condition to opt_merge_when_and_condition
* Wordsmith various comments

Author: Pavan Deolasee
Reviewer: Simon Riggs
2018-04-05 09:54:07 +01:00
Simon Riggs d204ef6377 MERGE SQL Command following SQL:2016
MERGE performs actions that modify rows in the target table
using a source table or query. MERGE provides a single SQL
statement that can conditionally INSERT/UPDATE/DELETE rows
a task that would other require multiple PL statements.
e.g.

MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
  UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
  DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
  INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
  DO NOTHING;

MERGE works with regular and partitioned tables, including
column and row security enforcement, as well as support for
row, statement and transition triggers.

MERGE is optimized for OLTP and is parameterizable, though
also useful for large scale ETL/ELT. MERGE is not intended
to be used in preference to existing single SQL commands
for INSERT, UPDATE or DELETE since there is some overhead.
MERGE can be used statically from PL/pgSQL.

MERGE does not yet support inheritance, write rules,
RETURNING clauses, updatable views or foreign tables.
MERGE follows SQL Standard per the most recent SQL:2016.

Includes full tests and documentation, including full
isolation tests to demonstrate the concurrent behavior.

This version written from scratch in 2017 by Simon Riggs,
using docs and tests originally written in 2009. Later work
from Pavan Deolasee has been both complex and deep, leaving
the lead author credit now in his hands.
Extensive discussion of concurrency from Peter Geoghegan,
with thanks for the time and effort contributed.

Various issues reported via sqlsmith by Andreas Seltenreich

Authors: Pavan Deolasee, Simon Riggs
Reviewer: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs

Discussion:
https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
2018-04-03 09:28:16 +01:00
Simon Riggs 7cf8a5c302 Revert "Modified files for MERGE"
This reverts commit 354f13855e.
2018-04-02 21:34:15 +01:00
Simon Riggs 354f13855e Modified files for MERGE 2018-04-02 21:12:47 +01:00
Robert Haas 7e0d64c7a5 postgres_fdw: Push down partition-wise aggregation.
Since commit 7012b132d0, postgres_fdw
has been able to push down the toplevel aggregation operation to the
remote server.  Commit e2f1eb0ee3 made
it possible to break down the toplevel aggregation into one
aggregate per partition.  This commit lets postgres_fdw push down
aggregation in that case just as it does at the top level.

In order to make this work, this commit adds an additional argument
to the GetForeignUpperPaths FDW API.  A matching argument is added
to the signature for create_upper_paths_hook.  Third-party code using
either of these will need to be updated.

Also adjust create_foreignscan_plan() so that it picks up the correct
set of relids in this case.

Jeevan Chalke, reviewed by Ashutosh Bapat and by me and with some
adjustments by me.  The larger patch series of which this patch is a
part was also reviewed and tested by Antonin Houska, Rajkumar
Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal
Legrand, and Rafia Sabih.

Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
Discussion: http://postgr.es/m/CAM2+6=XPWujjmj5zUaBTGDoB38CemwcPmjkRy0qOcsQj_V+2sQ@mail.gmail.com
2018-04-02 10:51:50 -04:00
Tom Lane 0b11a674fb Fix a boatload of typos in C comments.
Justin Pryzby

Discussion: https://postgr.es/m/20180331105640.GK28454@telsasoft.com
2018-04-01 15:01:28 -04:00
Robert Haas 96030f9a48 Don't call IS_DUMMY_REL() when cheapest_total_path might be junk.
Unlike the previous coding, this might result in a Gather per Append
subplan when the target list is parallel-restricted, but such a plan
is probably worth considering in that case, since a single Gather
on top of the entire Append is impossible.

Per Andres Freund and the buildfarm.

Discussion: http://postgr.es/m/20180330050351.bmxx4cdtz67czjda@alap3.anarazel.de
2018-03-30 11:40:41 -04:00
Robert Haas c1de1a3a8b Remove 'target' from GroupPathExtraData.
It's not needed.

Jeevan Chalke

Discussion: http://postgr.es/m/CAM2+6=XPWujjmj5zUaBTGDoB38CemwcPmjkRy0qOcsQj_V+2sQ@mail.gmail.com
2018-03-29 16:17:18 -04:00
Robert Haas 11cf92f6e2 Rewrite the code that applies scan/join targets to paths.
If the toplevel scan/join target list is parallel-safe, postpone
generating Gather (or Gather Merge) paths until after the toplevel has
been adjusted to return it.  This (correctly) makes queries with
expensive functions in the target list more likely to choose a
parallel plan, since the cost of the plan now reflects the fact that
the evaluation will happen in the workers rather than the leader.
The original complaint about this problem was from Jeff Janes.

If the toplevel scan/join relation is partitioned, recursively apply
the changes to all partitions.  This sometimes allows us to get rid of
Result nodes, because Append is not projection-capable but its
children may be.  It also cleans up what appears to be incorrect SRF
handling from commit e2f1eb0ee30d144628ab523432320f174a2c8966: the old
code had no knowledge of SRFs for child scan/join rels.

Because we now use create_projection_path() in some cases where we
formerly used apply_projection_to_path(), this changes the ordering
of columns in some queries generated by postgres_fdw.  Update
regression outputs accordingly.

Patch by me, reviewed by Amit Kapila and by Ashutosh Bapat.  Other
fixes for this problem (substantially different from this version)
were reviewed by Dilip Kumar, Amit Khandekar, and Marina Polyakova.

Discussion: http://postgr.es/m/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yHWu4c4US5JgVGxtZQ@mail.gmail.com
2018-03-29 15:49:31 -04:00
Robert Haas 3f90ec8597 Postpone generate_gather_paths for topmost scan/join rel.
Don't call generate_gather_paths for the topmost scan/join relation
when it is initially populated with paths.  Instead, do the work in
grouping_planner.  By itself, this gains nothing; in fact it loses
slightly because we end up calling set_cheapest() for the topmost
scan/join rel twice rather than once.  However, it paves the way for
a future commit which will postpone generate_gather_paths for the
topmost scan/join relation even further, allowing more accurate
costing of parallel paths.

Amit Kapila and Robert Haas.  Earlier versions of this patch (which
different substantially) were reviewed by Dilip Kumar, Amit
Khandekar, Marina Polyakova, and Ashutosh Bapat.
2018-03-29 15:40:40 -04:00
Robert Haas d7c19e62a8 Teach create_projection_plan to omit projection where possible.
We sometimes insert a ProjectionPath into a plan tree when projection
is not strictly required. The existing code already arranges to avoid
emitting a Result node when the ProjectionPath's subpath can perform
the projection itself, but previously it didn't consider the
possibility that the parent node might not actually require the
projection to be performed at all.

Skipping projection when it's not required can not only avoid Result
nodes that aren't needed, but also avoid losing the "physical tlist"
optimization unneccessarily.

Patch by me, reviewed by Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoakT5gmahbPWGqrR2nAdFOMAOnOXYoWHRdVfGWs34t6_A@mail.gmail.com
2018-03-29 15:37:48 -04:00
Andres Freund 9370462e9a Add inlining support to LLVM JIT provider.
This provides infrastructure to allow JITed code to inline code
implemented in C. This e.g. can be postgres internal functions or
extension code.

This already speeds up long running queries, by allowing the LLVM
optimizer to optimize across function boundaries. The optimization
potential currently doesn't reach its full potential because LLVM
cannot optimize the FunctionCallInfoData argument fully away, because
it's allocated on the heap rather than the stack. Fixing that is
beyond what's realistic for v11.

To be able to do that, use CLANG to convert C code to LLVM bitcode,
and have LLVM build a summary for it. That bitcode can then be used to
to inline functions at runtime. For that the bitcode needs to be
installed. Postgres bitcode goes into $pkglibdir/bitcode/postgres,
extensions go into equivalent directories.  PGXS has been modified so
that happens automatically if postgres has been compiled with LLVM
support.

Currently this isn't the fastest inline implementation, modules are
reloaded from disk during inlining. That's to work around an apparent
LLVM bug, triggering an apparently spurious error in LLVM assertion
enabled builds.  Once that is resolved we can remove the superfluous
read from disk.

Docs will follow in a later commit containing docs for the whole JIT
feature.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-28 13:19:08 -07:00
Andrew Dunstan 16828d5c02 Fast ALTER TABLE ADD COLUMN with a non-NULL default
Currently adding a column to a table with a non-NULL default results in
a rewrite of the table. For large tables this can be both expensive and
disruptive. This patch removes the need for the rewrite as long as the
default value is not volatile. The default expression is evaluated at
the time of the ALTER TABLE and the result stored in a new column
(attmissingval) in pg_attribute, and a new column (atthasmissing) is set
to true. Any existing row when fetched will be supplied with the
attmissingval. New rows will have the supplied value or the default and
so will never need the attmissingval.

Any time the table is rewritten all the atthasmissing and attmissingval
settings for the attributes are cleared, as they are no longer needed.

The most visible code change from this is in heap_attisnull, which
acquires a third TupleDesc argument, allowing it to detect a missing
value if there is one. In many cases where it is known that there will
not be any (e.g.  catalog relations) NULL can be passed for this
argument.

Andrew Dunstan, heavily modified from an original patch from Serge
Rielau.
Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley.

Discussion: https://postgr.es/m/31e2e921-7002-4c27-59f5-51f08404c858@2ndQuadrant.com
2018-03-28 10:43:52 +10:30
Andres Freund 32af96b2b1 JIT tuple deforming in LLVM JIT provider.
Performing JIT compilation for deforming gains performance benefits
over unJITed deforming from compile-time knowledge of the tuple
descriptor. Fixed column widths, NOT NULLness, etc can be taken
advantage of.

Right now the JITed deforming is only used when deforming tuples as
part of expression evaluation (and obviously only if the descriptor is
known). It's likely to be beneficial in other cases, too.

By default tuple deforming is JITed whenever an expression is JIT
compiled. There's a separate boolean GUC controlling it, but that's
expected to be primarily useful for development and benchmarking.

Docs will follow in a later commit containing docs for the whole JIT
feature.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-26 12:57:19 -07:00
Alvaro Herrera 1b89c2188b Fix typo 2018-03-26 09:56:41 -03:00
Andres Freund 2a0faed9d7 Add expression compilation support to LLVM JIT provider.
In addition to the interpretation of expressions (which back
evaluation of WHERE clauses, target list projection, aggregates
transition values etc) support compiling expressions to native code,
using the infrastructure added in earlier commits.

To avoid duplicating a lot of code, only support emitting code for
cases that are likely to be performance critical. For expression steps
that aren't deemed that, use the existing interpreter.

The generated code isn't great - some architectural changes are
required to address that. But this already yields a significant
speedup for some analytics queries, particularly with WHERE clauses
filtering a lot, or computing multiple aggregates.

Author: Andres Freund
Tested-By: Thomas Munro
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de

Disable JITing for VALUES() nodes.

VALUES() nodes are only ever executed once. This is primarily helpful
for debugging, when forcing JITing even for cheap queries.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-22 14:45:59 -07:00
Robert Haas 88ba0ae2aa Consider Parallel Append of partial paths for UNION [ALL].
Without this patch, we can implement a UNION or UNION ALL as an
Append where Gather appears beneath one or more of the Append
branches, but this lets us put the Gather node on top, with
a partial path for each relation underneath.

There is considerably more work that could be done to improve
planning in this area, but that will probably need to wait
for a future release.

Patch by me, reviewed and tested by Ashutosh Bapat and Rajkumar
Raghuwanshi.

Discussion: http://postgr.es/m/CA+TgmoaLRAOqHmMZx=ESM3VDEPceg+-XXZsRXQ8GtFJO_zbMSw@mail.gmail.com
2018-03-22 16:09:28 -04:00
Andres Freund cc415a56d0 Basic planner and executor integration for JIT.
This adds simple cost based plan time decision about whether JIT
should be performed. jit_above_cost, jit_optimize_above_cost are
compared with the total cost of a plan, and if the cost is above them
JIT is performed / optimization is performed respectively.

For that PlannedStmt and EState have a jitFlags (es_jit_flags) field
that stores information about what JIT operations should be performed.

EState now also has a new es_jit field, which can store a
JitContext. When there are no errors the context is released in
standard_ExecutorEnd().

It is likely that the default values for jit_[optimize_]above_cost
will need to be adapted further, but in my test these values seem to
work reasonably.

Author: Andres Freund, with feedback by Peter Eisentraut
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-22 11:51:58 -07:00
Robert Haas e2f1eb0ee3 Implement partition-wise grouping/aggregation.
If the partition keys of input relation are part of the GROUP BY
clause, all the rows belonging to a given group come from a single
partition.  This allows aggregation/grouping over a partitioned
relation to be broken down * into aggregation/grouping on each
partition.  This should be no worse, and often better, than the normal
approach.

If the GROUP BY clause does not contain all the partition keys, we can
still perform partial aggregation for each partition and then finalize
aggregation after appending the partial results.  This is less certain
to be a win, but it's still useful.

Jeevan Chalke, Ashutosh Bapat, Robert Haas.  The larger patch series
of which this patch is a part was also reviewed and tested by Antonin
Houska, Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin
Knizhnik, Pascal Legrand, and Rafia Sabih.

Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
2018-03-22 12:49:48 -04:00
Tom Lane 0f0deb7194 Improve predtest.c's handling of cases with NULL-constant inputs.
Currently, if operator_predicate_proof() is given an operator clause like
"something op NULL", it just throws up its hands and reports it can't prove
anything.  But we can often do better than that, if the operator is strict,
because then we know that the clause returns NULL overall.  Depending on
whether we're trying to prove or refute something, and whether we need
weak or strong semantics for NULL, this may be enough to prove the
implication, especially when we rely on the standard rule that "false
implies anything".  In particular, this lets us do something useful with
questions like "does X IN (1,3,5,NULL) imply X <= 5?"  The null entry
in the IN list can effectively be ignored for this purpose, but the
proof rules were not previously smart enough to deduce that.

This patch is by me, but it owes something to previous work by
Amit Langote to try to solve problems of the form mentioned.
Thanks also to Emre Hasegeli and Ashutosh Bapat for review.

Discussion: https://postgr.es/m/3bad48fc-f257-c445-feeb-8a2b2fb622ba@lab.ntt.co.jp
2018-03-21 18:30:46 -04:00
Andrew Gierth d2d79887ea Repair crash with unsortable grouping sets.
If there were multiple grouping sets, none of them empty, all of which
were unsortable, then an oversight in consider_groupingsets_paths led
to a null pointer dereference. Fix, and add a regression test for this
case.

Per report from Dang Minh Huong, though I didn't use their patch.

Backpatch to 10.x where hashed grouping sets were added.
2018-03-21 11:39:28 +00:00
Robert Haas 94150513ec Don't pass the grouping target around unnecessarily.
Since commit 4f15e5d09d made grouped_rel
set reltarget, a variety of other functions can just get it from
grouped_rel instead of having to pass it around explicitly.  Simplify
accordingly.

Patch by me, reviewed by Ashutosh Bapat.

Discussion: http://postgr.es/m/CA+TgmoZ+ZJTVad-=vEq393N99KTooxv9k7M+z73qnTAqkb49BQ@mail.gmail.com
2018-03-20 11:37:43 -04:00
Robert Haas b5996c2791 Determine grouping strategies in create_grouping_paths.
Partition-wise aggregate will call create_ordinary_grouping_paths
multiple times and we don't want to redo this work every time; have
the caller do it instead and pass the details down.

Patch by me, reviewed by Ashutosh Bapat.

Discussion: http://postgr.es/m/CA+TgmoY7VYYn9a7YHj1nJL6zj6BkHmt4K-un9LRmXkyqRZyynA@mail.gmail.com
2018-03-20 11:31:06 -04:00
Robert Haas 4f15e5d09d Defer creation of partially-grouped relation until it's needed.
This avoids unnecessarily creating a RelOptInfo for which we have no
actual need.  This idea is from Ashutosh Bapat, who wrote a very
different patch to accomplish a similar goal.  It will be more
important if and when we get partition-wise aggregate, since then
there could be many partially grouped relations all of which could
potentially be unnecessary.  In passing, this sets the grouping
relation's reltarget, which wasn't done previously but makes things
simpler for this refactoring.

Along the way, adjust things so that add_paths_to_partial_grouping_rel,
now renamed create_partial_grouping_paths, does not perform the Gather
or Gather Merge steps to generate non-partial paths from partial
paths; have the caller do it instead.  This is again for the
convenience of partition-wise aggregate, which wants to inject
additional partial paths are created and before we decide which ones
to Gather/Gather Merge.  This might seem like a separate change, but
it's actually pretty closely entangled; I couldn't really see much
value in separating it and having to change some things twice.

Patch by me, reviewed by Ashutosh Bapat.

Discussion: http://postgr.es/m/CA+TgmoZ+ZJTVad-=vEq393N99KTooxv9k7M+z73qnTAqkb49BQ@mail.gmail.com
2018-03-20 11:18:04 -04:00
Robert Haas c596fadbfe Generate a separate upper relation for each stage of setop planning.
Commit 3fc6e2d7f5 made setop planning
stages return paths rather than plans, but all such paths were loosely
associated with a single RelOptInfo, and only the final path was added
to the RelOptInfo.  Even at the time, it was foreseen that this should
be changed, because there is otherwise no good way for a single stage
of setop planning to return multiple paths.  With this patch, each
stage of set operation planning now creates a separate RelOptInfo;
these are distinguished by using appropriate relid sets.  Note that
this patch does nothing whatsoever about actually returning multiple
paths for the same set operation; it just makes it possible for a
future patch to do so.

Along the way, adjust things so that create_upper_paths_hook is called
for each of these new RelOptInfos rather than just once, since that
might be useful to extensions using that hook.  It might be a good
to provide an FDW API here as well, but I didn't try to do that for
now.

Patch by me, reviewed and tested by Ashutosh Bapat and Rajkumar
Raghuwanshi.

Discussion: http://postgr.es/m/CA+TgmoaLRAOqHmMZx=ESM3VDEPceg+-XXZsRXQ8GtFJO_zbMSw@mail.gmail.com
2018-03-19 11:55:38 -04:00
Robert Haas 49525c4630 Rewrite recurse_union_children to iterate, rather than recurse.
Also, rename it to plan_union_chidren, so the old name wasn't
very descriptive.  This results in a small net reduction in code,
seems at least to me to be easier to understand, and saves
space on the process stack.

Patch by me, reviewed and tested by Ashutosh Bapat and Rajkumar
Raghuwanshi.

Discussion: http://postgr.es/m/CA+TgmoaLRAOqHmMZx=ESM3VDEPceg+-XXZsRXQ8GtFJO_zbMSw@mail.gmail.com
2018-03-19 11:54:56 -04:00
Tom Lane 877cdf11ea Mop-up for letting VOID-returning SQL functions end with a SELECT.
Part of the intent in commit fd1a421fe was to allow SQL functions that are
declared to return VOID to contain anything, including an unrelated final
SELECT, the same as SQL-language procedures can.  However, the planner's
inlining logic didn't get that memo.  Fix it, and add some regression tests
covering this area, since evidently we had none.

In passing, clean up some typos in comments in create_function_3.sql,
and get rid of its none-too-safe assumption that DROP CASCADE notice
output is immutably ordered.

Per report from Prabhat Sahu.

Discussion: https://postgr.es/m/CANEvxPqxAj6nNHVcaXxpTeEFPmh24Whu+23emgjiuKrhJSct0A@mail.gmail.com
2018-03-16 12:48:13 -04:00
Robert Haas 1466bcfa4a Split create_grouping_paths into degenerate and non-degenerate cases.
There's no functional change here, or at least I hope there isn't,
just code rearrangement.  The rearrangement is motivated by
partition-wise aggregate, which doesn't need to consider the
degenerate case but wants to reuse the logic for the ordinary case.

Based loosely on a patch from Ashutosh Bapat and Jeevan Chalke, but I
whacked it around pretty heavily. The larger patch series of which
this patch is a part was also reviewed and tested by Antonin Houska,
Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik,
Pascal Legrand, Rafia Sabih, and me.

Discussion: http://postgr.es/m/CAFjFpRewpqCmVkwvq6qrRjmbMDpN0CZvRRzjd8UvncczA3Oz1Q@mail.gmail.com
2018-03-15 14:43:58 -04:00
Robert Haas 648a6c7bd8 Pass additional arguments to a couple of grouping-related functions.
get_number_of_groups() and make_partial_grouping_target() currently
fish information directly out of the PlannerInfo; in the former case,
the target list, and in the latter case, the HAVING qual.  This works
fine if there's only one grouping relation, but if the pending patch
for partition-wise aggregate gets committed, we'll have multiple
grouping relations and must therefore use appropriately translated
versions of these values for each one.  To make that simpler, pass the
values to be used as arguments.

Jeevan Chalke.  The larger patch series of which this patch is a part
was also reviewed and tested by Antonin Houska, Rajkumar Raghuwanshi,
David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal Legrand, Rafia
Sabih, and me.

Discussion: http://postgr.es/m/CAM2+6=UqFnFUypOvLdm5TgC+2M=-E0Q7_LOh0VDFFzmk2BBPzQ@mail.gmail.com
Discussion: http://postgr.es/m/CAM2+6=W+L=C4yBqMrgrfTfNtbtmr4T53-hZhwbA2kvbZ9VMrrw@mail.gmail.com
2018-03-15 11:33:52 -04:00
Stephen Frost 6b960aae90 Fix function-header comments in planner.c
In b5635948ab, a couple of function header comments weren't changed, or
weren't changed correctly, to reflect the arguments being passed into
the functions.  Specifically, get_number_of_groups() had the wrong
argument name in the commit and create_grouping_paths() wasn't
updated even though the arguments had been changed.

The issue with create_grouping_paths() was noticed by Ashutosh Bapat,
while I discovered the issue with get_number_of_groups() by looking to
see if there were any similar issues from that commit.

Discussion: https://postgr.es/m/CAFjFpRcbp4702jcp387PExt3fNCt62QJN8++DQGwBhsW6wRHWA@mail.gmail.com
2018-03-14 13:51:15 -04:00
Stephen Frost 1f7b8967ef Fix typo in add_paths_to_append_rel()
The comment should have been referring to the number of workers, not the
number of paths.

Author: Ashutosh Bapat
Discussion: https://postgr.es/m/CAFjFpRcbp4702jcp387PExt3fNCt62QJN8++DQGwBhsW6wRHWA@mail.gmail.com
2018-03-14 13:51:14 -04:00
Robert Haas 0927d2f46d Let Parallel Append over simple UNION ALL have partial subpaths.
A simple UNION ALL gets flattened into an appendrel of subquery
RTEs, but up until now it's been impossible for the appendrel to use
the partial paths for the subqueries, so we can implement the
appendrel as a Parallel Append but only one with non-partial paths
as children.

There are three separate obstacles to removing that limitation.
First, when planning a subquery, propagate any partial paths to the
final_rel so that they are potentially visible to outer query levels
(but not if they have initPlans attached, because that wouldn't be
safe).  Second, after planning a subquery, propagate any partial paths
for the final_rel to the subquery RTE in the outer query level in the
same way we do for non-partial paths.  Third, teach finalize_plan() to
account for the possibility that the fake parameter we use for rescan
signalling when the plan contains a Gather (Merge) node may be
propagated from an outer query level.

Patch by me, reviewed and tested by Amit Khandekar, Rajkumar
Raghuwanshi, and Ashutosh Bapat.  Test cases based on examples by
Rajkumar Raghuwanshi.

Discussion: http://postgr.es/m/CA+Tgmoa6L9A1nNCk3aTDVZLZ4KkHDn1+tm7mFyFvP+uQPS7bAg@mail.gmail.com
2018-03-13 16:34:08 -04:00
Tom Lane 4a4e2442a7 Fix improper uses of canonicalize_qual().
One of the things canonicalize_qual() does is to remove constant-NULL
subexpressions of top-level AND/OR clauses.  It does that on the assumption
that what it's given is a top-level WHERE clause, so that NULL can be
treated like FALSE.  Although this is documented down inside a subroutine
of canonicalize_qual(), it wasn't mentioned in the documentation of that
function itself, and some callers hadn't gotten that memo.

Notably, commit d007a9505 caused get_relation_constraints() to apply
canonicalize_qual() to CHECK constraints.  That allowed constraint
exclusion to misoptimize situations in which a CHECK constraint had a
provably-NULL subclause, as seen in the regression test case added here,
in which a child table that should be scanned is not.  (Although this
thinko is ancient, the test case doesn't fail before 9.2, for reasons
I've not bothered to track down in detail.  There may be related cases
that do fail before that.)

More recently, commit f0e44751d added an independent bug by applying
canonicalize_qual() to index expressions, which is even sillier since
those might not even be boolean.  If they are, though, I think this
could lead to making incorrect index entries for affected index
expressions in v10.  I haven't attempted to prove that though.

To fix, add an "is_check" parameter to canonicalize_qual() to specify
whether it should assume WHERE or CHECK semantics, and make it perform
NULL-elimination accordingly.  Adjust the callers to apply the right
semantics, or remove the call entirely in cases where it's not known
that the expression has one or the other semantics.  I also removed
the call in some cases involving partition expressions, where it should
be a no-op because such expressions should be canonical already ...
and was a no-op, independently of whether it could in principle have
done something, because it was being handed the qual in implicit-AND
format which isn't what it expects.  In HEAD, add an Assert to catch
that type of mistake in future.

This represents an API break for external callers of canonicalize_qual().
While that's intentional in HEAD to make such callers think about which
case applies to them, it seems like something we probably wouldn't be
thanked for in released branches.  Hence, in released branches, the
extra parameter is added to a new function canonicalize_qual_ext(),
and canonicalize_qual() is a wrapper that retains its old behavior.

Patch by me with suggestions from Dean Rasheed.  Back-patch to all
supported branches.

Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us
2018-03-11 18:10:42 -04:00
Tom Lane 5748f3a0aa Improve predtest.c's internal docs, and enhance its functionality a bit.
Commit b08df9cab left things rather poorly documented as far as the
exact semantics of "clause_is_check" mode went.  Also, that mode did
not really work correctly for predicate_refuted_by; although given the
lack of specification as to what it should do, as well as the lack
of any actual use-case, that's perhaps not surprising.

Rename "clause_is_check" to "weak" proof mode, and provide specifications
for what it should do.  I defined weak refutation as meaning "truth of A
implies non-truth of B", which makes it possible to use the mode in the
part of relation_excluded_by_constraints that checks for mutually
contradictory WHERE clauses.  Fix up several places that did things wrong
for that definition.  (As far as I can see, these errors would only lead
to failure-to-prove, not incorrect claims of proof, making them not
serious bugs even aside from the fact that v10 contains no use of this
mode.  So there seems no need for back-patching.)

In addition, teach predicate_refuted_by_recurse that it can use
predicate_implied_by_recurse after all when processing a strong NOT-clause,
so long as it asks for the correct proof strength.  This is an optimization
that could have been included in commit b08df9cab, but wasn't.

Also, simplify and generalize the logic that checks for whether nullness of
the argument of IS [NOT] NULL would force overall nullness of the predicate
or clause.  (This results in a change in the partition_prune test's output,
as it is now able to prune an all-nulls partition that it did not recognize
before.)

In passing, in PartConstraintImpliedByRelConstraint, remove bogus
conversion of the constraint list to explicit-AND form and then right back
again; that accomplished nothing except forcing a useless extra level of
recursion inside predicate_implied_by.

Discussion: https://postgr.es/m/5983.1520487191@sss.pgh.pa.us
2018-03-09 16:58:26 -05:00
Robert Haas 960df2a971 Correctly assess parallel-safety of tlists when SRFs are used.
Since commit 69f4b9c85f, the existing
code was no longer assessing the parallel-safety of the real tlist
for each upper rel, but rather the first of possibly several tlists
created by split_pathtarget_at_srfs().  Repair.

Even though this is clearly wrong, it's not clear that it has any
user-visible consequences at the moment, so no back-patch for now.  If
we discover later that it does have user-visible consequences, we
might need to back-patch this to v10.

Patch by me, per a report from Rajkumar Raghuwanshi.

Discussion: http://postgr.es/m/CA+Tgmoaob_Strkg4Dcx=VyxnyXtrmkV=ofj=pX7gH9hSre-g0Q@mail.gmail.com
2018-03-08 14:25:31 -05:00
Peter Eisentraut 5b804cc168 Fix costing of parallel hash joins.
Commit 1804284042 established that single-batch
parallel-aware hash joins could create one large shared hash table using the
combined work_mem budget of all participants.  The costing accidentally
assumed that parallel-oblivious hash joins could also do that.  The
documentation for initial_cost_hashjoin() also failed to mention the new
argument.  Repair.

Author: Thomas Munro
Reported-By: Antonin Houska
Reviewed-By: Antonin Houska
Discussion: https://postgr.es/m/12441.1513935950%40localhost
2018-03-06 21:54:37 -05:00
Peter Eisentraut fd1a421fe6 Add prokind column, replacing proisagg and proiswindow
The new column distinguishes normal functions, procedures, aggregates,
and window functions.  This replaces the existing columns proisagg and
proiswindow, and replaces the convention that procedures are indicated
by prorettype == 0.  Also change prorettype to be VOIDOID for procedures.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-03-02 13:48:33 -05:00
Tom Lane b5febc1d12 Fix IOS planning when only some index columns can return an attribute.
Since 9.5, it's possible that some but not all columns of an index
support returning the indexed value for index-only scans.  If the
same indexed column appears in index columns that behave both ways,
check_index_only() supposed that it'd be OK to do an index-only scan
testing that column; but that fails if we have to recheck the indexed
condition on one of the columns that doesn't support this.

In principle we could make this work by remapping the recheck expressions
to pull the value from a column that does support returning the indexed
value.  But such cases are so weird and rare that, at least for now,
it doesn't seem worth the trouble.  Instead, just teach check_index_only
that a value is returnable only if all the index columns containing it
are returnable, rather than any of them.

Per report from David Pereiro Lagares.  Back-patch to 9.5 where the
possibility of this situation appeared.

Kyotaro Horiguchi

Discussion: https://postgr.es/m/1516210494.1798.16.camel@nlpgo.com
2018-03-01 15:35:03 -05:00
Robert Haas 2af28e6033 For partitionwise join, match on partcollation, not parttypcoll.
The previous code considered two tables to have the partition scheme
if the underlying columns had the same collation, but what we
actually need to compare is not the collations associated with the
column but the collation used for partitioning.  Fix that.

Robert Haas and Amit Langote

Discussion: http://postgr.es/m/0f95f924-0efa-4cf5-eb5f-9a3d1bc3c33d@lab.ntt.co.jp
2018-02-28 12:16:09 -05:00
Robert Haas 5e6a63c0d1 Minor cleanup of code related to partially_grouped_rel.
Jeevan Chalke

Discussion: http://postgr.es/m/CAM2+6=X9kxQoL2ZqZ00E6asBt9z+rfyWbOmhXJ0+8fPAyMZ9Jg@mail.gmail.com
2018-02-27 13:23:50 -05:00
Robert Haas 3bfe957761 Fix logic error in add_paths_to_partial_grouping_rel.
Commit 3bf05e096b sometimes uses the
cheapest_partial_path variable in this function to mean the cheapest
one from the input rel and at other times the cheapest one from the
partially grouped rel, but it never resets it, so we can end up with
bad plans, leading to "ERROR: Aggref found in non-Agg plan node".

Jeevan Chalke, per a report from Andreas Joseph Krogh and a separate
off-list report from Rajkumar Raghuwanshi

Discussion: http://postgr.es/m/CAM2+6=X9kxQoL2ZqZ00E6asBt9z+rfyWbOmhXJ0+8fPAyMZ9Jg@mail.gmail.com
2018-02-27 13:23:50 -05:00
Robert Haas 3bf05e096b Add a new upper planner relation for partially-aggregated results.
Up until now, we've abused grouped_rel->partial_pathlist as a place to
store partial paths that have been partially aggregate, but that's
really not correct, because a partial path for a relation is supposed
to be one which produces the correct results with the addition of only
a Gather or Gather Merge node, and these paths also require a Finalize
Aggregate step.  Instead, add a new partially_group_rel which can hold
either partial paths (which need to be gathered and then have
aggregation finalized) or non-partial paths (which only need to have
aggregation finalized).  This allows us to reuse generate_gather_paths
for partially_grouped_rel instead of writing new code, so that this
patch actually basically no net new code while making things cleaner,
simplifying things for pending patches for partition-wise aggregate.

Robert Haas and Jeevan Chalke.  The larger patch series of which this
patch is a part was also reviewed and tested by Antonin Houska,
Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik,
Pascal Legrand, Rafia Sabih, and me.

Discussion: http://postgr.es/m/CA+TgmobrzFYS3+U8a_BCy3-hOvh5UyJbC18rEcYehxhpw5=ETA@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZyQEjdBNuoG9-wC5GQ5GrO4544Myo13dVptvx+uLg9uQ@mail.gmail.com
2018-02-26 09:32:32 -05:00
Tom Lane 9afd513df0 Fix planner failures with overlapping mergejoin clauses in an outer join.
Given overlapping or partially redundant join clauses, for example
	t1 JOIN t2 ON t1.a = t2.x AND t1.b = t2.x
the planner's EquivalenceClass machinery will ordinarily refactor the
clauses as "t1.a = t1.b AND t1.a = t2.x", so that join processing doesn't
see multiple references to the same EquivalenceClass in a list of join
equality clauses.  However, if the join is outer, it's incorrect to derive
a restriction clause on the outer side from the join conditions, so the
clause refactoring does not happen and we end up with overlapping join
conditions.  The code that attempted to deal with such cases had several
subtle bugs, which could result in "left and right pathkeys do not match in
mergejoin" or "outer pathkeys do not match mergeclauses" planner errors,
if the selected join plan type was a mergejoin.  (It does not appear that
any actually incorrect plan could have been emitted.)

The core of the problem really was failure to recognize that the outer and
inner relations' pathkeys have different relationships to the mergeclause
list.  A join's mergeclause list is constructed by reference to the outer
pathkeys, so it will always be ordered the same as the outer pathkeys, but
this cannot be presumed true for the inner pathkeys.  If the inner sides of
the mergeclauses contain multiple references to the same EquivalenceClass
({t2.x} in the above example) then a simplistic rendering of the required
inner sort order is like "ORDER BY t2.x, t2.x", but the pathkey machinery
recognizes that the second sort column is redundant and throws it away.
The mergejoin planning code failed to account for that behavior properly.
One error was to try to generate cut-down versions of the mergeclause list
from cut-down versions of the inner pathkeys in the same way as the initial
construction of the mergeclause list from the outer pathkeys was done; this
could lead to choosing a mergeclause list that fails to match the outer
pathkeys.  The other problem was that the pathkey cross-checking code in
create_mergejoin_plan treated the inner and outer pathkey lists
identically, whereas actually the expectations for them must be different.
That led to false "pathkeys do not match" failures in some cases, and in
principle could have led to failure to detect bogus plans in other cases,
though there is no indication that such bogus plans could be generated.

Reported by Alexander Kuzmenkov, who also reviewed this patch.  This has
been broken for years (back to around 8.3 according to my testing), so
back-patch to all supported branches.

Discussion: https://postgr.es/m/5dad9160-4632-0e47-e120-8e2082000c01@postgrespro.ru
2018-02-23 13:47:33 -05:00
Robert Haas 7d8ac9814b Charge cpu_tuple_cost * 0.5 for Append and MergeAppend nodes.
Previously, Append didn't charge anything at all, and MergeAppend
charged only cpu_operator_cost, about half the value used here.  This
change might make MergeAppend plans slightly more likely to be chosen
than before, since this commit increases the assumed cost for Append
-- with default values -- by 0.005 per tuple but MergeAppend by only
0.0025 per tuple.  Since the comparisons required by MergeAppend are
costed separately, it's not clear why MergeAppend needs to be
otherwise more expensive than Append, so hopefully this is OK.

Prior to partition-wise join, it didn't really matter whether or not
an Append node had any cost of its own, because every plan had to use
the same number of Append or MergeAppend nodes and in the same places.
Only the relative cost of Append vs. MergeAppend made a difference.
Now, however, it is possible to avoid some of the Append nodes using a
partition-wise join, so it's worth making an effort.  Pending patches
for partition-wise aggregate care too, because an Append of Aggregate
nodes will incur the Append overhead fewer times than an Aggregate
over an Append.  Although in most cases this change will favor the use
of partition-wise techniques, it does the opposite when the join
cardinality is greater than the sum of the input cardinalities.  Since
this situation arises in an existing regression test, I [rhaas]
adjusted it to keep the overall plan shape approximately the same.

Jeevan Chalke, per a suggestion from David Rowley.  Reviewed by
Ashutosh Bapat.  Some changes by me.  The larger patch series of which
this patch is a part was also reviewed and tested by Antonin Houska,
Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik,
Pascal Legrand, Rafia Sabih, and me.

Discussion: http://postgr.es/m/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B=ZRh-rxy9qxfPA5Gw@mail.gmail.com
2018-02-21 23:09:27 -05:00
Peter Eisentraut 2fb1abaeb0 Rename enable_partition_wise_join to enable_partitionwise_join
Discussion: https://www.postgresql.org/message-id/flat/ad24e4f4-6481-066e-e3fb-6ef4a3121882%402ndquadrant.com
2018-02-16 10:33:59 -05:00
Robert Haas 88ef48c1cc Fix parallel index builds for dynamic_shared_memory_type=none.
The previous code failed to realize that this setting effectively
disables parallelism, and would crash if it decided to attempt
parallelism anyway.  Instead, treat it as a disabling condition.

Kyotaro Horiguchi, who also reported the issue. Reviewed by Michael
Paquier and Peter Geoghegan.

Discussion: http://postgr.es/m/20180209.170635.256350357.horiguchi.kyotaro@lab.ntt.co.jp
2018-02-12 12:55:12 -05:00
Tom Lane 0a459cec96 Support all SQL:2011 options for window frame clauses.
This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING"
frame boundaries in window functions.  We'd punted on that back in the
original patch to add window functions, because it was not clear how to
do it in a reasonably data-type-extensible fashion.  That problem is
resolved here by adding the ability for btree operator classes to provide
an "in_range" support function that defines how to add or subtract the
RANGE offset value.  Factoring it this way also allows the operator class
to avoid overflow problems near the ends of the datatype's range, if it
wishes to expend effort on that.  (In the committed patch, the integer
opclasses handle that issue, but it did not seem worth the trouble to
avoid overflow failures for datetime types.)

The patch includes in_range support for the integer_ops opfamily
(int2/int4/int8) as well as the standard datetime types.  Support for
other numeric types has been requested, but that seems like suitable
material for a follow-on patch.

In addition, the patch adds GROUPS mode which counts the offset in
ORDER-BY peer groups rather than rows, and it adds the frame_exclusion
options specified by SQL:2011.  As far as I can see, we are now fully
up to spec on window framing options.

Existing behaviors remain unchanged, except that I changed the errcode
for a couple of existing error reports to meet the SQL spec's expectation
that negative "offset" values should be reported as SQLSTATE 22013.

Internally and in relevant parts of the documentation, we now consistently
use the terminology "offset PRECEDING/FOLLOWING" rather than "value
PRECEDING/FOLLOWING", since the term "value" is confusingly vague.

Oliver Ford, reviewed and whacked around some by me

Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com
2018-02-07 00:06:56 -05:00
Robert Haas f069c91a57 Fix possible crash in partition-wise join.
The previous code assumed that we'd always succeed in creating
child-joins for a joinrel for which partition-wise join was considered,
but that's not guaranteed, at least in the case where dummy rels
are involved.

Ashutosh Bapat, with some wordsmithing by me.

Discussion: http://postgr.es/m/CAFjFpRf8=uyMYYfeTBjWDMs1tR5t--FgOe2vKZPULxxdYQ4RNw@mail.gmail.com
2018-02-05 17:31:57 -05:00
Robert Haas 9da0cc3528 Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.

The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 13:32:44 -05:00
Tom Lane 35a528062c Add stack-overflow guards in set-operation planning.
create_plan_recurse lacked any stack depth check.  This is not per
our normal coding rules, but I'd supposed it was safe because earlier
planner processing is more complex and presumably should eat more
stack.  But bug #15033 from Andrew Grossman shows this isn't true,
at least not for queries having the form of a many-thousand-way
INTERSECT stack.

Further testing showed that recurse_set_operations is also capable
of being crashed in this way, since it likewise will recurse to the
bottom of a parsetree before calling any support functions that
might themselves contain any stack checks.  However, its stack
consumption is only perhaps a third of create_plan_recurse's.

It's possible that this particular problem with create_plan_recurse can
only manifest in 9.6 and later, since before that we didn't build a Path
tree for set operations.  But having seen this example, I now have no
faith in the proposition that create_plan_recurse doesn't need a stack
check, so back-patch to all supported branches.

Discussion: https://postgr.es/m/20180127050845.28812.58244@wrigleys.postgresql.org
2018-01-28 13:39:07 -05:00
Robert Haas 9fd8b7d632 Factor some code out of create_grouping_paths.
This is preparatory refactoring to prepare the way for partition-wise
aggregate, which will reuse the new subroutines for child grouping
rels.  It also does not seem like a bad idea on general principle,
as the function was getting pretty long.

Jeevan Chalke.  The larger patch series of which this patch is a part
was reviewed and tested by Antonin Houska, Rajkumar Raghuwanshi,
Ashutosh Bapat, David Rowley, Dilip Kumar, Konstantin Knizhnik,
Pascal Legrand, and me.  Some cosmetic changes by me.

Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
2018-01-26 15:03:12 -05:00
Alvaro Herrera 05fb5d6619 Ignore partitioned indexes where appropriate
get_relation_info() was too optimistic about opening indexes in
partitioned tables, which would raise errors when any queries were
planned on such tables.  Fix by ignoring any indexes of the partitioned
kind.

CLUSTER (and ALTER TABLE CLUSTER ON) had a similar problem.  Fix by
disallowing these commands in partitioned tables.

Fallout from 8b08f7d482.
2018-01-25 16:12:15 -03:00
Tom Lane bb94ce4d26 Teach reparameterize_path() to handle AppendPaths.
If we're inside a lateral subquery, there may be no unparameterized paths
for a particular child relation of an appendrel, in which case we *must*
be able to create similarly-parameterized paths for each other child
relation, else the planner will fail with "could not devise a query plan
for the given query".  This means that there are situations where we'd
better be able to reparameterize at least one path for each child.

This calls into question the assumption in reparameterize_path() that
it can just punt if it feels like it.  However, the only case that is
known broken right now is where the child is itself an appendrel so that
all its paths are AppendPaths.  (I think possibly I disregarded that in
the original coding on the theory that nested appendrels would get folded
together --- but that only happens *after* reparameterize_path(), so it's
not excused from handling a child AppendPath.)  Given that this code's been
like this since 9.3 when LATERAL was introduced, it seems likely we'd have
heard of other cases by now if there were a larger problem.

Per report from Elvis Pranskevichus.  Back-patch to 9.3.

Discussion: https://postgr.es/m/5981018.zdth1YWmNy@hammer.magicstack.net
2018-01-23 16:50:34 -05:00
Robert Haas 2f17844104 Allow UPDATE to move rows between partitions.
When an UPDATE causes a row to no longer match the partition
constraint, try to move it to a different partition where it does
match the partition constraint.  In essence, the UPDATE is split into
a DELETE from the old partition and an INSERT into the new one.  This
can lead to surprising behavior in concurrency scenarios because
EvalPlanQual rechecks won't work as they normally did; the known
problems are documented.  (There is a pending patch to improve the
situation further, but it needs more review.)

Amit Khandekar, reviewed and tested by Amit Langote, David Rowley,
Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro
Herrera, Amit Kapila, and me.  A few final revisions by me.

Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com
2018-01-19 15:33:06 -05:00
Bruce Momjian f033462d8f Reorder C includes
Reorder header files in joinrels.c and pathnode.c in alphabetical order,
removing unnecessary ones.

Author: Etsuro Fujita
2018-01-17 18:10:05 -05:00
Tom Lane 680d540502 Avoid unnecessary failure in SELECT concurrent with ALTER NO INHERIT.
If a query against an inheritance tree runs concurrently with an ALTER
TABLE that's disinheriting one of the tree members, it's possible to get
a "could not find inherited attribute" error because after obtaining lock
on the removed member, make_inh_translation_list sees that its columns
have attinhcount=0 and decides they aren't the columns it's looking for.

An ideal fix, perhaps, would avoid including such a just-removed member
table in the query at all; but there seems no way to accomplish that
without adding expensive catalog rechecks or creating a likelihood of
deadlocks.  Instead, let's just drop the check on attinhcount.  In this
way, a query that's included a just-disinherited child will still
succeed, which is not a completely unreasonable behavior.

This problem has existed for a long time, so back-patch to all supported
branches.  Also add an isolation test verifying related behaviors.

Patch by me; the new isolation test is based on Kyotaro Horiguchi's work.

Discussion: https://postgr.es/m/20170626.174612.23936762.horiguchi.kyotaro@lab.ntt.co.jp
2018-01-12 15:46:37 -05:00
Tom Lane 90947674fc Fix incorrect handling of subquery pullup in the presence of grouping sets.
If we flatten a subquery whose target list contains constants or
expressions, when those output columns are used in GROUPING SET columns,
the planner was capable of doing the wrong thing by merging a pulled-up
expression into the surrounding expression during const-simplification.
Then the late processing that attempts to match subexpressions to grouping
sets would fail to match those subexpressions to grouping sets, with the
effect that they'd not go to null when expected.

To fix, wrap such subquery outputs in PlaceHolderVars, ensuring that
they preserve their separate identity throughout the planner's expression
processing.  This is a bit of a band-aid, because the wrapper defeats
const-simplification even in places where it would be safe to allow.
But a nicer fix would likely be too invasive to back-patch, and the
consequences of the missed optimizations probably aren't large in most
cases.

Back-patch to 9.5 where grouping sets were introduced.

Heikki Linnakangas, with small mods and better test cases by me;
additional review by Andrew Gierth

Discussion: https://postgr.es/m/7dbdcf5c-b5a6-ef89-4958-da212fe10176@iki.fi
2018-01-12 12:24:50 -05:00
Bruce Momjian bdb70c12b3 C comment: fix "the the" mentions in C comments
Reported-by: Christoph Dreis

Discussion: https://postgr.es/m/007e01d3519e$2734ca10$759e5e30$@freenet.de

Author: Christoph Dreis
2018-01-11 21:50:21 -05:00
Peter Eisentraut 9e945f8626 Fix Latin spelling
"c.f." should be "cf.".
2018-01-11 08:32:01 -05:00
Robert Haas 2fd58096f0 Add missing "return" statement to accumulate_append_subpath.
Without this, Parallel Append can end up with extra children.

Report by Rajkumar Raghuwanshi.  Fix by Amit Khandekar.  Brown
paper bag bug by me.

Discussion: http://postgr.es/m/CAKcux6mBF-NiddyEe9LwymoUC5+wh8bQJ=uk2gGkOE+L8cv=LA@mail.gmail.com
2018-01-10 11:21:20 -05:00
Tom Lane 624e440a47 Improve the heuristic for ordering child paths of a parallel append.
Commit ab7271677 introduced code that attempts to order the child
scans of a Parallel Append node in a way that will minimize execution
time, based on total cost and startup cost.  However, it failed to
think hard about what to do when estimated costs are exactly equal;
a case that's particularly likely to occur when comparing on startup
cost.  In such a case the ordering of the child paths would be left
to the whims of qsort, an algorithm that isn't even stable.

We can improve matters by applying the rule used elsewhere in the
planner: if total costs are equal, sort on startup cost, and
vice versa.  When both cost estimates are exactly equal, rather
than letting qsort do something unpredictable, sort based on the
child paths' relids, which should typically result in sorting in
inheritance order.  (The latter provision requires inventing a
qsort-style comparator for bitmapsets, but maybe we'll have use
for that for other reasons in future.)

This results in a few plan changes in the select_parallel test,
but those all look more reasonable than before, when the actual
underlying cost numbers are taken into account.

Discussion: https://postgr.es/m/4944.1515446989@sss.pgh.pa.us
2018-01-09 13:07:52 -05:00
Robert Haas 63008b19ee Fix comment.
RELATION_IS_OTHER_TEMP is tested in the caller, not here.

Discussion: http://postgr.es/m/5A5438E4.3090709@lab.ntt.co.jp
2018-01-09 09:40:31 -05:00
Robert Haas c759395617 Code review for Parallel Append.
- Remove unnecessary #include mistakenly added in execnodes.h.
- Fix mistake in comment in choose_next_subplan_for_leader.
- Adjust row estimates in cost_append for a possibly-different
  parallel divisor.
- Clamp row estimates in cost_append after operations that may
  not produce integers.

Amit Kapila, with cosmetic adjustments by me.

Discussion: http://postgr.es/m/CAA4eK1+qcbeai3coPpRW=GFCzFeLUsuY4T-AKHqMjxpEGZBPQg@mail.gmail.com
2018-01-04 07:56:09 -05:00
Tom Lane 3decd150a2 Teach eval_const_expressions() to handle some more cases.
Add some infrastructure (mostly macros) to make it easier to write
typical cases for constant-expression simplification.  Add simplification
processing for ArrayRef, RowExpr, and ScalarArrayOpExpr node types,
which formerly went unsimplified even if all their inputs were constants.
Also teach it to simplify FieldSelect from a composite constant.
Make use of the new infrastructure to reduce the amount of code needed
for the existing ArrayExpr and ArrayCoerceExpr cases.

One existing test case changes output as a result of the fact that
RowExpr can now be folded to a constant.  All the new code is exercised
by existing test cases according to gcov, so I feel no need to add
additional tests.

Tom Lane, reviewed by Dmitry Dolgov

Discussion: https://postgr.es/m/3be3b82c-e29c-b674-2163-bf47d98817b1@iki.fi
2018-01-03 12:35:09 -05:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Tom Lane c4c2885cbb Fix UNION/INTERSECT/EXCEPT over no columns.
Since 9.4, we've allowed the syntax "select union select" and variants
of that.  However, the planner wasn't expecting a no-column set operation
and ended up treating the set operation as if it were UNION ALL.

Turns out it's trivial to fix in v10 and later; we just need to be careful
about not generating a Sort node with no sort keys.  However, since a weird
corner case like this is never going to be exercised by developers, we'd
better have thorough regression tests if we want to consider it supported.

Per report from Victor Yegorov.

Discussion: https://postgr.es/m/CAGnEbojGJrRSOgJwNGM7JSJZpVAf8xXcVPbVrGdhbVEHZ-BUMw@mail.gmail.com
2017-12-22 12:08:06 -05:00
Tom Lane 6719b238e8 Rearrange execution of PARAM_EXTERN Params for plpgsql's benefit.
This patch does three interrelated things:

* Create a new expression execution step type EEOP_PARAM_CALLBACK
and add the infrastructure needed for add-on modules to generate that.
As discussed, the best control mechanism for that seems to be to add
another hook function to ParamListInfo, which will be called by
ExecInitExpr if it's supplied and a PARAM_EXTERN Param is found.
For stand-alone expressions, we add a new entry point to allow the
ParamListInfo to be specified directly, since it can't be retrieved
from the parent plan node's EState.

* Redesign the API for the ParamListInfo paramFetch hook so that the
ParamExternData array can be entirely virtual.  This also lets us get rid
of ParamListInfo.paramMask, instead leaving it to the paramFetch hook to
decide which param IDs should be accessible or not.  plpgsql_param_fetch
was already doing the identical masking check, so having callers do it too
seemed redundant.  While I was at it, I added a "speculative" flag to
paramFetch that the planner can specify as TRUE to avoid unwanted failures.
This solves an ancient problem for plpgsql that it couldn't provide values
of non-DTYPE_VAR variables to the planner for fear of triggering premature
"record not assigned yet" or "field not found" errors during planning.

* Rework plpgsql to get rid of the need for "unshared" parameter lists,
by dint of turning the single ParamListInfo per estate into a nearly
read-only data structure that doesn't instantiate any per-variable data.
Instead, the paramFetch hook controls access to per-variable data and can
make the right decisions on the fly, replacing the cases that we used to
need multiple ParamListInfos for.  This might perhaps have been a
performance loss on its own, but by using a paramCompile hook we can
bypass plpgsql_param_fetch entirely during normal query execution.
(It's now only called when, eg, we copy the ParamListInfo into a cursor
portal.  copyParamList() or SerializeParamList() effectively instantiate
the virtual parameter array as a simple physical array without a
paramFetch hook, which is what we want in those cases.)  This allows
reverting most of commit 6c82d8d1f, though I kept the cosmetic
code-consolidation aspects of that (eg the assign_simple_var function).

Performance testing shows this to be at worst a break-even change,
and it can provide wins ranging up to 20% in test cases involving
accesses to fields of "record" variables.  The fact that values of
such variables can now be exposed to the planner might produce wins
in some situations, too, but I've not pursued that angle.

In passing, remove the "parent" pointer from the arguments to
ExecInitExprRec and related functions, instead storing that pointer in a
transient field in ExprState.  The ParamListInfo pointer for a stand-alone
expression is handled the same way; we'd otherwise have had to add
yet another recursively-passed-down argument in expression compilation.

Discussion: https://postgr.es/m/32589.1513706441@sss.pgh.pa.us
2017-12-21 12:57:45 -05:00
Andres Freund 1804284042 Add parallel-aware hash joins.
Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel
Hash Join with Parallel Hash.  While hash joins could already appear in
parallel queries, they were previously always parallel-oblivious and had a
partial subplan only on the outer side, meaning that the work of the inner
subplan was duplicated in every worker.

After this commit, the planner will consider using a partial subplan on the
inner side too, using the Parallel Hash node to divide the work over the
available CPU cores and combine its results in shared memory.  If the join
needs to be split into multiple batches in order to respect work_mem, then
workers process different batches as much as possible and then work together
on the remaining batches.

The advantages of a parallel-aware hash join over a parallel-oblivious hash
join used in a parallel query are that it:

 * avoids wasting memory on duplicated hash tables
 * avoids wasting disk space on duplicated batch files
 * divides the work of building the hash table over the CPUs

One disadvantage is that there is some communication between the participating
CPUs which might outweigh the benefits of parallelism in the case of small
hash tables.  This is avoided by the planner's existing reluctance to supply
partial plans for small scans, but it may be necessary to estimate
synchronization costs in future if that situation changes.  Another is that
outer batch 0 must be written to disk if multiple batches are required.

A potential future advantage of parallel-aware hash joins is that right and
full outer joins could be supported, since there is a single set of matched
bits for each hashtable, but that is not yet implemented.

A new GUC enable_parallel_hash is defined to control the feature, defaulting
to on.

Author: Thomas Munro
Reviewed-By: Andres Freund, Robert Haas
Tested-By: Rafia Sabih, Prabhat Sahu
Discussion:
    https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
    https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com
2017-12-21 00:43:41 -08:00
Robert Haas 38fc54703e Re-fix wrong costing of Sort under Gather Merge.
Commit dc02c7bca4 changed this call
to create_sort_path() to take -1 rather than limit_tuples because,
at that time, there was no way for a Sort beneath a Gather Merge
to become a top-N sort.

Later, commit 3452dc5240 provided
a way for a Sort beneath a Gather Merge to become a top-N sort,
but failed to revert the previous commit in the process.  Do that.

Report and analysis by Jeff Janes; patch by Thomas Munro; review by
Amit Kapila and by me.

Discussion: http://postgr.es/m/CAEepm=1BWtC34vUroA0Uqjw02MaqdUrW+d6WD85_k8SLyPiKHQ@mail.gmail.com
2017-12-19 10:42:17 -05:00
Robert Haas d329dc2ea4 Remove bug from OPTIMIZER_DEBUG code for partition-wise join.
Etsuro Fujita, reviewed by Ashutosh Bapat

Discussion: http://postgr.es/m/5A2A60E6.6000008@lab.ntt.co.jp
2017-12-12 10:52:15 -05:00
Robert Haas ab72716778 Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention.  We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.

Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.

Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 17:28:39 -05:00
Robert Haas 1cbc17aaca Try to exclude partitioned tables in toto.
Ashutosh Bapat, reviewed by Jeevan Chalke.  Comment by me.

Discussion: http://postgr.es/m/CAFjFpRcuRaydz88CY_aQekmuvmN2A9ax5z0k=ppT+s8KS8xMRA@mail.gmail.com
2017-12-01 10:59:09 -05:00
Peter Eisentraut e4128ee767 SQL procedures
This adds a new object type "procedure" that is similar to a function
but does not have a return type and is invoked by the new CALL statement
instead of SELECT or similar.  This implementation is aligned with the
SQL standard and compatible with or similar to other SQL implementations.

This commit adds new commands CALL, CREATE/ALTER/DROP PROCEDURE, as well
as ALTER/DROP ROUTINE that can refer to either a function or a
procedure (or an aggregate function, as an extension to SQL).  There is
also support for procedures in various utility commands such as COMMENT
and GRANT, as well as support in pg_dump and psql.  Support for defining
procedures is available in all the languages supplied by the core
distribution.

While this commit is mainly syntax sugar around existing functionality,
future features will rely on having procedures as a separate object
type.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
2017-11-30 11:03:20 -05:00
Robert Haas 1761653bbb Make create_unique_path manage memory like mark_dummy_rel.
Put the unique path in the same context as the owning RelOptInfo, rather
than the toplevel planner context.  This is how this function worked
originally, but commit f41803bb39
changed it without explanation.  mark_dummy_rel adopted the older (or
newer?) technique in commit eca75a12a2,
which also featured a much better explanation of why it is correct.
So, switch back to that technique here, with the same explanation
given there.

Although this fixes a possible memory leak when GEQO is in use, the
leak is minor and probably nobody cares, so no back-patch.

Ashutosh Bapat, reviewed by Tom Lane and by me

Discussion: http://postgr.es/m/CAFjFpRcXkHHrXyD9BCvkgGJV4TnHG2SWJ0PhJfrDu3NAcQvh7g@mail.gmail.com
2017-11-30 09:50:10 -05:00
Robert Haas eaedf0df71 Update typedefs.list and re-run pgindent
Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
2017-11-29 09:24:24 -05:00
Tom Lane 801386af62 Clarify old comment about qual_is_pushdown_safe's handling of subplans.
This comment glossed over the difference between initplans and subplans,
but they are indeed different for our purposes here.
2017-11-28 23:32:17 -05:00
Tom Lane 9a785ad573 Fix creation of resjunk tlist entries for inherited mixed UPDATE/DELETE.
rewriteTargetListUD's processing is dependent on the relkind of the query's
target table.  That was fine at the time it was made to act that way, even
for queries on inheritance trees, because all tables in an inheritance tree
would necessarily be plain tables.  However, the 9.5 feature addition
allowing some members of an inheritance tree to be foreign tables broke the
assumption that rewriteTargetListUD's output tlist could be applied to all
child tables with nothing more than column-number mapping.  This led to
visible failures if foreign child tables had row-level triggers, and would
also break in cases where child tables belonged to FDWs that used methods
other than CTID for row identification.

To fix, delay running rewriteTargetListUD until after the planner has
expanded inheritance, so that it is applied separately to the (already
mapped) tlist for each child table.  We can conveniently call it from
preprocess_targetlist.  Refactor associated code slightly to avoid the
need to heap_open the target relation multiple times during
preprocess_targetlist.  (The APIs remain a bit ugly, particularly around
the point of which steps scribble on parse->targetList and which don't.
But avoiding such scribbling would require a change in FDW callback APIs,
which is more pain than it's worth.)

Also fix ExecModifyTable to ensure that "tupleid" is reset to NULL when
we transition from rows providing a CTID to rows that don't.  (That's
really an independent bug, but it manifests in much the same cases.)

Add a regression test checking one manifestation of this problem, which
was that row-level triggers on a foreign child table did not work right.

Back-patch to 9.5 where the problem was introduced.

Etsuro Fujita, reviewed by Ildus Kurbangaliev and Ashutosh Bapat

Discussion: https://postgr.es/m/20170514150525.0346ba72@postgrespro.ru
2017-11-27 17:54:07 -05:00
Tom Lane df3a66e282 Improve planner's handling of set-returning functions in grouping columns.
Improve query_is_distinct_for() to accept SRFs in the targetlist when
we can prove distinctness from a DISTINCT clause.  In that case the
de-duplication will surely happen after SRF expansion, so the proof
still works.  Continue to punt in the case where we'd try to prove
distinctness from GROUP BY (or, in the future, source relations).
To do that, we'd have to determine whether the SRFs were in the
grouping columns or elsewhere in the tlist, and it still doesn't
seem worth the trouble.  But this trivial change allows us to
recognize that "SELECT DISTINCT unnest(foo) FROM ..." produces
unique-ified output, which seems worth having.

Also, fix estimate_num_groups() to consider the possibility of SRFs in
the grouping columns.  Its failure to do so was masked before v10 because
grouping_planner() scaled up plan rowcount estimates by the estimated SRF
multiplier after performing grouping.  That doesn't happen anymore, which
is more correct, but it means we need an adjustment in the estimate for
the number of groups.  Failure to do this leads to an underestimate for
the number of output rows of subqueries like "SELECT DISTINCT unnest(foo)"
compared to what 9.6 and earlier estimated, thus breaking plan choices
in some cases.

Per report from Dmitry Shalashov.  Back-patch to v10 to avoid degraded
plan choices compared to previous releases.

Discussion: https://postgr.es/m/CAKPeCUGAeHgoh5O=SvcQxREVkoX7UdeJUMj1F5=aBNvoTa+O8w@mail.gmail.com
2017-11-25 11:48:09 -05:00
Robert Haas e89a71fb44 Pass InitPlan values to workers via Gather (Merge).
If a PARAM_EXEC parameter is used below a Gather (Merge) but the InitPlan
that computes it is attached to or above the Gather (Merge), force the
value to be computed before starting parallelism and pass it down to all
workers.  This allows us to use parallelism in cases where it previously
would have had to be rejected as unsafe.  We do - in this case - lose the
optimization that the value is only computed if it's actually used.  An
alternative strategy would be to have the first worker that needs the value
compute it, but one downside of that approach is that we'd then need to
select a parallel-safe path to compute the parameter value; it couldn't for
example contain a Gather (Merge) node.  At some point in the future, we
might want to consider both approaches.

Independent of that consideration, there is a great deal more work that
could be done to make more kinds of PARAM_EXEC parameters parallel-safe.
This infrastructure could be used to allow a Gather (Merge) on the inner
side of a nested loop (although that's not a very appealing plan) and
cases where the InitPlan is attached below the Gather (Merge) could be
addressed as well using various techniques.  But this is a good start.

Amit Kapila, reviewed and revised by me.  Reviewing and testing from
Kuntal Ghosh, Haribabu Kommi, and Tushar Ahuja.

Discussion: http://postgr.es/m/CAA4eK1LV0Y1AUV4cUCdC+sYOx0Z0-8NAJ2Pd9=UKsbQ5Sr7+JQ@mail.gmail.com
2017-11-16 12:06:14 -05:00
Robert Haas e5253fdc4f Add parallel_leader_participation GUC.
Sometimes, for testing, it's useful to have the leader do nothing but
read tuples from workers; and it's possible that could work out better
even in production.

Thomas Munro, reviewed by Amit Kapila and by me.  A few final tweaks
by me.

Discussion: http://postgr.es/m/CAEepm=2U++Lp3bNTv2Bv_kkr5NE2pOyHhxU=G0YTa4ZhSYhHiw@mail.gmail.com
2017-11-15 08:23:18 -05:00
Robert Haas 44ae64c388 Push target list evaluation through Gather Merge.
We already do this for Gather, but it got overlooked for Gather Merge.

Amit Kapila, with review and minor revisions by Rushabh Lathia
and by me.

Discussion: http://postgr.es/m/CAA4eK1KUC5Uyu7qaifxrjpHxbSeoQh3yzwN3bThnJsmJcZ-qtA@mail.gmail.com
2017-11-13 16:37:42 -05:00
Robert Haas e64861c79b Track in the plan the types associated with PARAM_EXEC parameters.
Up until now, we only tracked the number of parameters, which was
sufficient to allocate an array of Datums of the appropriate size,
but not sufficient to, for example, know how to serialize a Datum
stored in one of those slots.  An upcoming patch wants to do that,
so add this tracking to make it possible.

Patch by me, reviewed by Tom Lane and Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoYqpxDKn8koHdW8BEKk8FMUL0=e8m2Qe=M+r0UBjr3tuQ@mail.gmail.com
2017-11-13 15:24:12 -05:00
Robert Haas 5edc63bda6 Account for the effect of lossy pages when costing bitmap scans.
Dilip Kumar, reviewed by Alexander Kumenkov, Amul Sul, and me.
Some final adjustments by me.

Discussion: http://postgr.es/m/CAFiTN-sYtqUOXQ4SpuhTv0Z9gD0si3YxZGv_PQAAMX8qbOotcg@mail.gmail.com
2017-11-10 16:50:50 -05:00
Robert Haas b9941d3468 Fix incorrect comment.
Etsuro Fujita

Discussion: http://postgr.es/m/5A05728E.4050009@lab.ntt.co.jp
2017-11-10 10:55:09 -05:00
Robert Haas 1aba8e651a Add hash partitioning.
Hash partitioning is useful when you want to partition a growing data
set evenly.  This can be useful to keep table sizes reasonable, which
makes maintenance operations such as VACUUM faster, or to enable
partition-wise join.

At present, we still depend on constraint exclusion for partitioning
pruning, and the shape of the partition constraints for hash
partitioning is such that that doesn't work.  Work is underway to fix
that, which should both improve performance and make partitioning
pruning work with hash partitioning.

Amul Sul, reviewed and tested by Dilip Kumar, Ashutosh Bapat, Yugo
Nagata, Rajkumar Raghuwanshi, Jesper Pedersen, and by me.  A few
final tweaks also by me.

Discussion: http://postgr.es/m/CAAJ_b96fhpJAP=ALbETmeLk1Uni_GFZD938zgenhF49qgDTjaQ@mail.gmail.com
2017-11-09 18:07:44 -05:00
Peter Eisentraut 2eb4a831e5 Change TRUE/FALSE to true/false
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources.  The upper case spellings are only used
in some files/modules.  So standardize on the standard spellings.

The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.

In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-11-08 11:37:28 -05:00
Tom Lane 7b6c075471 Teach planner to account for HAVING quals in aggregation plan nodes.
For some reason, we have never accounted for either the evaluation cost
or the selectivity of filter conditions attached to Agg and Group nodes
(which, in practice, are always conditions from a HAVING clause).

Applying our regular selectivity logic to post-grouping conditions is a
bit bogus, but it's surely better than taking the selectivity as 1.0.
Perhaps someday the extended-statistics mechanism can be taught to provide
statistics that would help us in getting non-default estimates here.

Per a gripe from Benjamin Coutu.  This is surely a bug fix, but I'm
hesitant to back-patch because of the prospect of destabilizing existing
plan choices.  Given that it took us this long to notice the bug, it's
probably not hurting too many people in the field.

Discussion: https://postgr.es/m/20968.1509486337@sss.pgh.pa.us
2017-11-02 11:24:12 -04:00
Tom Lane 7c70996ebf Allow bitmap scans to operate as index-only scans when possible.
If we don't have to return any columns from heap tuples, and there's
no need to recheck qual conditions, and the heap page is all-visible,
then we can skip fetching the heap page altogether.

Skip prefetching pages too, when possible, on the assumption that the
recheck flag will remain the same from one page to the next.  While that
assumption is hardly bulletproof, it seems like a good bet most of the
time, and better than prefetching pages we don't need.

This commit installs the executor infrastructure, but doesn't change
any planner cost estimates, thus possibly causing bitmap scans to
not be chosen in cases where this change renders them the best choice.
I (tgl) am not entirely convinced that we need to account for this
behavior in the planner, because I think typically the bitmap scan would
get chosen anyway if it's the best bet.  In any case the submitted patch
took way too many shortcuts, resulting in too many clearly-bad choices,
to be committable.

Alexander Kuzmenkov, reviewed by Alexey Chernyshov, and whacked around
rather heavily by me.

Discussion: https://postgr.es/m/239a8955-c0fc-f506-026d-c837e86c827b@postgrespro.ru
2017-11-01 17:38:20 -04:00
Robert Haas cf7ab13bfb Fix code related to partitioning schemes for dropped columns.
The entry in appinfo->translated_vars can be NULL; if so, we must avoid
dereferencing it.

Ashutosh Bapat

Discussion: http://postgr.es/m/CAFjFpReL7+1ien=-21rhjpO3bV7aAm1rQ8XgLVk2csFagSzpZQ@mail.gmail.com
2017-10-31 14:43:05 +05:30
Robert Haas 24fd674a1a Fix grammar.
Etsuro Fujita

Discussion: http://postgr.es/m/cc7767b6-6a1b-74a2-8b3c-48b8e64c12ed@lab.ntt.co.jp
2017-10-28 11:14:23 +02:00
Robert Haas 682ce911f8 Allow parallel query for prepared statements with generic plans.
This was always intended to work, but due to an oversight in
max_parallel_hazard_walker, it didn't.  In testing, we missed the
fact that it was only working for custom plans, where the parameter
value has been substituted for the parameter itself early enough
that everything worked.  In a generic plan, the Param node survives
and must be treated as parallel-safe.  SerializeParamList provides
for the transmission of parameter values to workers.

Amit Kapila with help from Kuntal Ghosh.  Some changes by me.

Discussion: http://postgr.es/m/CAA4eK1+_BuZrmVCeua5Eqnm4Co9DAXdM5HPAOE2J19ePbR912Q@mail.gmail.com
2017-10-27 22:22:39 +02:00
Tom Lane 37a795a60b Support domains over composite types.
This is the last major omission in our domains feature: you can now
make a domain over anything that's not a pseudotype.

The major complication from an implementation standpoint is that places
that might be creating tuples of a domain type now need to be prepared
to apply domain_check().  It seems better that unprepared code fail
with an error like "<type> is not composite" than that it silently fail
to apply domain constraints.  Therefore, relevant infrastructure like
get_func_result_type() and lookup_rowtype_tupdesc() has been adjusted
to treat domain-over-composite as a distinct case that unprepared code
won't recognize, rather than just transparently treating it the same
as plain composite.  This isn't a 100% solution to the possibility of
overlooked domain checks, but it catches most places.

In passing, improve typcache.c's support for domains (it can now cache
the identity of a domain's base type), and rewrite the argument handling
logic in jsonfuncs.c's populate_record[set]_worker to reduce duplicative
per-call lookups.

I believe this is code-complete so far as the core and contrib code go.
The PLs need varying amounts of work, which will be tackled in followup
patches.

Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us
2017-10-26 13:47:45 -04:00
Tom Lane 08f1e1f0a4 Make setrefs.c match by ressortgroupref even for plain Vars.
Previously, we skipped using search_indexed_tlist_for_sortgroupref()
if the tlist expression being sought in the child plan node was merely
a Var.  This is purely an optimization, based on the theory that
search_indexed_tlist_for_var() is faster, and one copy of a Var should
be as good as another.  However, the GROUPING SETS patch broke the
latter assumption: grouping columns containing the "same" Var can
sometimes have different outputs, as shown in the test case added here.
So do it the hard way whenever a ressortgroupref marking exists.

(If this seems like a bottleneck, we could imagine building a tlist index
data structure for ressortgroupref values, as we do for Vars.  But I'll
let that idea go until there's some evidence it's worthwhile.)

Back-patch to 9.6.  The problem also exists in 9.5 where GROUPING SETS
came in, but this patch is insufficient to resolve the problem in 9.5:
there is some obscure dependency on the upper-planner-pathification
work that happened in 9.6.  Given that this is such a weird corner case,
and no end users have complained about it, it doesn't seem worth the work
to develop a fix for 9.5.

Patch by me, per a report from Heikki Linnakangas.  (This does not fix
Heikki's original complaint, just the follow-on one.)

Discussion: https://postgr.es/m/aefc657e-edb2-64d5-6df1-a0828f6e9104@iki.fi
2017-10-26 12:17:40 -04:00
Tom Lane 896eb5efbd In the planner, delete joinaliasvars lists after we're done with them.
Although joinaliasvars lists coming out of the parser are quite simple,
those lists can contain arbitrarily complex expressions after subquery
pullup.  We do not perform expression preprocessing on them, meaning that
expressions in those lists will not meet the expectations of later phases
of the planner (for example, that they do not contain SubLinks).  This had
been thought pretty harmless, since we don't intentionally touch those
lists in later phases --- but Andreas Seltenreich found a case in which
adjust_appendrel_attrs() could recurse into a joinaliasvars list and then
die on its assertion that it never sees a SubLink.  We considered a couple
of localized fixes to prevent that specific case from looking at the
joinaliasvars lists, but really this seems like a generic hazard for all
expression processing in the planner.  Therefore, probably the best answer
is to delete the joinaliasvars lists from the parsetree at the end of
expression preprocessing, so that there are no reachable expressions that
haven't been through preprocessing.

The case Andreas found seems to be harmless in non-Assert builds, and so
far there are no field reports suggesting that there are user-visible
effects in other cases.  I considered back-patching this anyway, but
it turns out that Andreas' test doesn't fail at all in 9.4-9.6, because
in those versions adjust_appendrel_attrs contains code (added in commit
842faa714 and removed again in commit 215b43cdc) to process SubLinks
rather than complain about them.  Barring discovery of another path by
which unprocessed joinaliasvars lists can cause trouble, the most
prudent compromise seems to be to patch this into v10 but not further.

Patch by me, with thanks to Amit Langote for initial investigation
and review.

Discussion: https://postgr.es/m/87r2tvt9f1.fsf@ansel.ydns.eu
2017-10-24 18:42:47 -04:00
Magnus Hagander 752871b6de Fix typos
David Rowley
2017-10-19 13:58:30 +02:00
Robert Haas 6393613b6a Fix possible crash with Parallel Bitmap Heap Scan.
If a Parallel Bitmap Heap scan's chain of leftmost descendents
includes a BitmapOr whose first child is a BitmapAnd, the prior coding
would mistakenly create a non-shared TIDBitmap and then try to perform
shared iteration.

Report by Tomas Vondra.  Patch by Dilip Kumar.

Discussion: http://postgr.es/m/50e89684-8ad9-dead-8767-c9545bafd3b6@2ndquadrant.com
2017-10-13 15:02:45 -04:00
Tom Lane 8ec5429e2f Reduce "X = X" to "X IS NOT NULL", if it's easy to do so.
If the operator is a strict btree equality operator, and X isn't volatile,
then the clause must yield true for any non-null value of X, or null if X
is null.  At top level of a WHERE clause, we can ignore the distinction
between false and null results, so it's valid to simplify the clause to
"X IS NOT NULL".  This is a useful improvement mainly because we'll get
a far better selectivity estimate in most cases.

Because such cases seldom arise in well-written queries, it is unappetizing
to expend a lot of planner cycles looking for them ... but it turns out
that there's a place we can shoehorn this in practically for free, because
equivclass.c already has to detect and reject candidate equivalences of the
form X = X.  That doesn't catch every place that it would be valid to
simplify to X IS NOT NULL, but it catches the typical case.  Working harder
doesn't seem justified.

Patch by me, reviewed by Petr Jelinek

Discussion: https://postgr.es/m/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw@mail.gmail.com
2017-10-08 12:23:32 -04:00
Robert Haas 45866c7550 Copy information from the relcache instead of pointing to it.
We have the relations continuously locked, but not open, so relcache
pointers are not guaranteed to be stable.  Per buildfarm member
prion.

Ashutosh Bapat.  I fixed a typo.

Discussion: http://postgr.es/m/CAFjFpRcRBqoKLZSNmRsjKr81uEP=ennvqSQaXVCCBTXvJ2rW+Q@mail.gmail.com
2017-10-06 15:28:07 -04:00
Robert Haas f49842d1ee Basic partition-wise join functionality.
Instead of joining two partitioned tables in their entirety we can, if
it is an equi-join on the partition keys, join the matching partitions
individually.  This involves teaching the planner about "other join"
rels, which are related to regular join rels in the same way that
other member rels are related to baserels.  This can use significantly
more CPU time and memory than regular join planning, because there may
now be a set of "other" rels not only for every base relation but also
for every join relation.  In most practical cases, this probably
shouldn't be a problem, because (1) it's probably unusual to join many
tables each with many partitions using the partition keys for all
joins and (2) if you do that scenario then you probably have a big
enough machine to handle the increased memory cost of planning and (3)
the resulting plan is highly likely to be better, so what you spend in
planning you'll make up on the execution side.  All the same, for now,
turn this feature off by default.

Currently, we can only perform joins between two tables whose
partitioning schemes are absolutely identical.  It would be nice to
cope with other scenarios, such as extra partitions on one side or the
other with no match on the other side, but that will have to wait for
a future patch.

Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
Khandekar, and by me.  A few final adjustments by me.

Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
2017-10-06 11:11:10 -04:00
Robert Haas e9baa5e9fa Allow DML commands that create tables to use parallel query.
Haribabu Kommi, reviewed by Dilip Kumar and Rafia Sabih.  Various
cosmetic changes by me to explain why this appears to be safe but
allowing inserts in parallel mode in general wouldn't be.  Also, I
removed the REFRESH MATERIALIZED VIEW case from Haribabu's patch,
since I'm not convinced that case is OK, and hacked on the
documentation somewhat.

Discussion: http://postgr.es/m/CAJrrPGdo5bak6qnPWe8Kpi8g_jfQEs-G4SYmG9y+OFaw2-dPvA@mail.gmail.com
2017-10-05 11:40:48 -04:00
Tom Lane c12d570fa1 Support arrays over domains.
Allowing arrays with a domain type as their element type was left un-done
in the original domain patch, but not for any very good reason.  This
omission leads to such surprising results as array_agg() not working on
a domain column, because the parser can't identify a suitable output type
for the polymorphic aggregate.

In order to fix this, first clean up the APIs of coerce_to_domain() and
some internal functions in parse_coerce.c so that we consistently pass
around a CoercionContext along with CoercionForm.  Previously, we sometimes
passed an "isExplicit" boolean flag instead, which is strictly less
information; and coerce_to_domain() didn't even get that, but instead had
to reverse-engineer isExplicit from CoercionForm.  That's contrary to the
documentation in primnodes.h that says that CoercionForm only affects
display and not semantics.  I don't think this change fixes any live bugs,
but it makes things more consistent.  The main reason for doing it though
is that now build_coercion_expression() receives ccontext, which it needs
in order to be able to recursively invoke coerce_to_target_type().

Next, reimplement ArrayCoerceExpr so that the node does not directly know
any details of what has to be done to the individual array elements while
performing the array coercion.  Instead, the per-element processing is
represented by a sub-expression whose input is a source array element and
whose output is a target array element.  This simplifies life in
parse_coerce.c, because it can build that sub-expression by a recursive
invocation of coerce_to_target_type().  The executor now handles the
per-element processing as a compiled expression instead of hard-wired code.
The main advantage of this is that we can use a single ArrayCoerceExpr to
handle as many as three successive steps per element: base type conversion,
typmod coercion, and domain constraint checking.  The old code used two
stacked ArrayCoerceExprs to handle type + typmod coercion, which was pretty
inefficient, and adding yet another array deconstruction to do domain
constraint checking seemed very unappetizing.

In the case where we just need a single, very simple coercion function,
doing this straightforwardly leads to a noticeable increase in the
per-array-element runtime cost.  Hence, add an additional shortcut evalfunc
in execExprInterp.c that skips unnecessary overhead for that specific form
of expression.  The runtime speed of simple cases is within 1% or so of
where it was before, while cases that previously required two levels of
array processing are significantly faster.

Finally, create an implicit array type for every domain type, as we do for
base types, enums, etc.  Everything except the array-coercion case seems
to just work without further effort.

Tom Lane, reviewed by Andrew Dunstan

Discussion: https://postgr.es/m/9852.1499791473@sss.pgh.pa.us
2017-09-30 13:40:56 -04:00
Andrew Dunstan 28ae524bbf Quieten warnings about unused variables
These variables are only ever written to in assertion-enabled builds,
and the latest Microsoft compilers complain about such variables in
non-assertion-enabled builds.

Apparently they don't worry so much about variables that are written to
but not read from, so most of our PG_USED_FOR_ASSERTS_ONLY variables
don't cause the problem.

Discussion: https://postgr.es/m/7800.1505950322@sss.pgh.pa.us
2017-09-21 08:41:14 -04:00
Robert Haas 9140cf8269 Associate partitioning information with each RelOptInfo.
This is not used for anything yet, but it is necessary infrastructure
for partition-wise join and for partition pruning without constraint
exclusion.

Ashutosh Bapat, reviewed by Amit Langote and with quite a few changes,
mostly cosmetic, by me.  Additional review and testing of this patch
series by Antonin Houska, Amit Khandekar, Rafia Sabih, Rajkumar
Raghuwanshi, Thomas Munro, and Dilip Kumar.

Discussion: http://postgr.es/m/CAFjFpRfneFG3H+F6BaiXemMrKF+FY-POpx3Ocy+RiH3yBmXSNw@mail.gmail.com
2017-09-20 23:39:13 -04:00
Robert Haas 57eebca03a Fix create_lateral_join_info to handle dead relations properly.
Commit 0a480502b0 broke it.

Report by Andreas Seltenreich.  Fix by Ashutosh Bapat.

Discussion: http://postgr.es/m/874ls2vrnx.fsf@ansel.ydns.eu
2017-09-20 10:20:10 -04:00
Robert Haas 7f3a3312ab Fix typo.
Thomas Munro

Discussion: http://postgr.es/m/CAEepm=2j-HAgnBUrAazwS0ry7Z_ihk+d7g+Ye3u99+6WbiGt_Q@mail.gmail.com
2017-09-20 10:07:53 -04:00
Tom Lane 6f44fe7f12 Allow rel_is_distinct_for() to look through RelabelType below OpExpr.
This lets it do the right thing for, eg, varchar columns.
Back-patch to 9.5 where this logic appeared.

David Rowley, per report from Kim Rose Carlsen

Discussion: https://postgr.es/m/VI1PR05MB17091F9A9876528055D6A827C76D0@VI1PR05MB1709.eurprd05.prod.outlook.com
2017-09-17 15:28:51 -04:00
Robert Haas 0a480502b0 Expand partitioned table RTEs level by level, without flattening.
Flattening the partitioning hierarchy at this stage makes various
desirable optimizations difficult.  The original use case for this
patch was partition-wise join, which wants to match up the partitions
in one partitioning hierarchy with those in another such hierarchy.
However, it now seems that it will also be useful in making partition
pruning work using the PartitionDesc rather than constraint exclusion,
because with a flattened expansion, we have no easy way to figure out
which PartitionDescs apply to which leaf tables in a multi-level
partition hierarchy.

As it turns out, we end up creating both rte->inh and !rte->inh RTEs
for each intermediate partitioned table, just as we previously did for
the root table.  This seems unnecessary since the partitioned tables
have no storage and are not scanned.  We might want to go back and
rejigger things so that no partitioned tables (including the parent)
need !rte->inh RTEs, but that seems to require some adjustments not
related to the core purpose of this patch.

Ashutosh Bapat, reviewed by me and by Amit Langote.  Some final
adjustments by me.

Discussion: http://postgr.es/m/CAFjFpRd=1venqLL7oGU=C1dEkuvk2DJgvF+7uKbnPHaum1mvHQ@mail.gmail.com
2017-09-14 15:41:08 -04:00
Robert Haas 77b6b5e9ce Make RelationGetPartitionDispatchInfo expand depth-first.
With this change, the order of leaf partitions as returned by
RelationGetPartitionDispatchInfo should now be the same as the
order used by expand_inherited_rtentry.  This will make it simpler
for future patches to match up the partition dispatch information
with the planner data structures.  The new code is also, in my
opinion anyway, simpler and easier to understand.

Amit Langote, reviewed by Amit Khandekar.  I also reviewed and
made a few cosmetic revisions.

Discussion: http://postgr.es/m/d98d4761-5071-1762-501e-0e15047c714b@lab.ntt.co.jp
2017-09-14 12:28:50 -04:00
Robert Haas 1555566d9e Set partitioned_rels appropriately when UNION ALL is used.
In most cases, this omission won't matter, because the appropriate
locks will have been acquired during parse/plan or by AcquireExecutorLocks.
But it's a bug all the same.

Report by Ashutosh Bapat.  Patch by me, reviewed by Amit Langote.

Discussion: http://postgr.es/m/CAFjFpRdHb_ZnoDTuBXqrudWXh3H1ibLkr6nHsCFT96fSK4DXtA@mail.gmail.com
2017-09-14 11:00:39 -04:00
Tom Lane 7d08ce286c Distinguish selectivity of < from <= and > from >=.
Historically, the selectivity functions have simply not distinguished
< from <=, or > from >=, arguing that the fraction of the population that
satisfies the "=" aspect can be considered to be vanishingly small, if the
comparison value isn't any of the most-common-values for the variable.
(If it is, the code path that executes the operator against each MCV will
take care of things properly.)  But that isn't really true unless we're
dealing with a continuum of variable values, and in practice we seldom are.
If "x = const" would estimate a nonzero number of rows for a given const
value, then it follows that we ought to estimate different numbers of rows
for "x < const" and "x <= const", even if the const is not one of the MCVs.
Handling this more honestly makes a significant difference in edge cases,
such as the estimate for a tight range (x BETWEEN y AND z where y and z
are close together).

Hence, split scalarltsel into scalarltsel/scalarlesel, and similarly
split scalargtsel into scalargtsel/scalargesel.  Adjust <= and >=
operator definitions to reference the new selectivity functions.
Improve the core ineq_histogram_selectivity() function to make a
correction for equality.  (Along the way, I learned quite a bit about
exactly why that function gives good answers, which I tried to memorialize
in improved comments.)

The corresponding join selectivity functions were, and remain, just stubs.
But I chose to split them similarly, to avoid confusion and to prevent the
need for doing this exercise again if someone ever makes them less stubby.

In passing, change ineq_histogram_selectivity's clamp for extreme
probability estimates so that it varies depending on the histogram
size, instead of being hardwired at 0.0001.  With the default histogram
size of 100 entries, you still get the old clamp value, but bigger
histograms should allow us to put more faith in edge values.

Tom Lane, reviewed by Aleksander Alekseev and Kuntal Ghosh

Discussion: https://postgr.es/m/12232.1499140410@sss.pgh.pa.us
2017-09-13 11:12:39 -04:00
Tom Lane 8689e38263 Clean up handling of dropped columns in NAMEDTUPLESTORE RTEs.
The NAMEDTUPLESTORE patch piggybacked on the infrastructure for
TABLEFUNC/VALUES/CTE RTEs, none of which can ever have dropped columns,
so the possibility was ignored most places.  Fix that, including adding a
specification to parsenodes.h about what it's supposed to look like.

In passing, clean up assorted comments that hadn't been maintained
properly by said patch.

Per bug #14799 from Philippe Beaudoin.  Back-patch to v10.

Discussion: https://postgr.es/m/20170906120005.25630.84360@wrigleys.postgresql.org
2017-09-06 10:41:05 -04:00
Tom Lane 6e427aa4e5 Use lfirst_node() and linitial_node() where appropriate in planner.c.
There's no particular reason to target this module for the first
wholesale application of these macros; but we gotta start somewhere.

Ashutosh Bapat and Jeevan Chalke

Discussion: https://postgr.es/m/CAFjFpRcNr3r=u0ni=7A4GD9NnHQVq+dkFafzqo2rS6zy=dt1eg@mail.gmail.com
2017-09-05 15:57:48 -04:00
Robert Haas 30833ba154 Expand partitioned tables in PartDesc order.
Previously, we expanded the inheritance hierarchy in the order in
which find_all_inheritors had locked the tables, but that turns out
to block quite a bit of useful optimization.  For example, a
partition-wise join can't count on two tables with matching bounds
to get expanded in the same order.

Where possible, this change results in expanding partitioned tables in
*bound* order.  Bound order isn't well-defined for a list-partitioned
table with a null-accepting partition or for a list-partitioned table
where the bounds for a single partition are interleaved with other
partitions.  However, when expansion in bound order is possible, it
opens up further opportunities for optimization, such as
strength-reducing MergeAppend to Append when the expansion order
matches the desired sort order.

Patch by me, with cosmetic revisions by Ashutosh Bapat.

Discussion: http://postgr.es/m/CA+TgmoZrKj7kEzcMSum3aXV4eyvvbh9WD=c6m=002WMheDyE3A@mail.gmail.com
2017-08-31 15:50:18 -04:00
Tom Lane 7df2c1f8da Force rescanning of parallel-aware scan nodes below a Gather[Merge].
The ExecReScan machinery contains various optimizations for postponing
or skipping rescans of plan subtrees; for example a HashAgg node may
conclude that it can re-use the table it built before, instead of
re-reading its input subtree.  But that is wrong if the input contains
a parallel-aware table scan node, since the portion of the table scanned
by the leader process is likely to vary from one rescan to the next.
This explains the timing-dependent buildfarm failures we saw after
commit a2b70c89c.

The established mechanism for showing that a plan node's output is
potentially variable is to mark it as depending on some runtime Param.
Hence, to fix this, invent a dummy Param (one that has a PARAM_EXEC
parameter number, but carries no actual value) associated with each Gather
or GatherMerge node, mark parallel-aware nodes below that node as dependent
on that Param, and arrange for ExecReScanGather[Merge] to flag that Param
as changed whenever the Gather[Merge] node is rescanned.

This solution breaks an undocumented assumption made by the parallel
executor logic, namely that all rescans of nodes below a Gather[Merge]
will happen synchronously during the ReScan of the top node itself.
But that's fundamentally contrary to the design of the ExecReScan code,
and so was doomed to fail someday anyway (even if you want to argue
that the bug being fixed here wasn't a failure of that assumption).
A follow-on patch will address that issue.  In the meantime, the worst
that's expected to happen is that given very bad timing luck, the leader
might have to do all the work during a rescan, because workers think
they have nothing to do, if they are able to start up before the eventual
ReScan of the leader's parallel-aware table scan node has reset the
shared scan state.

Although this problem exists in 9.6, there does not seem to be any way
for it to manifest there.  Without GatherMerge, it seems that a plan tree
that has a rescan-short-circuiting node below Gather will always also
have one above it that will short-circuit in the same cases, preventing
the Gather from being rescanned.  Hence we won't take the risk of
back-patching this change into 9.6.  But v10 needs it.

Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com
2017-08-30 09:29:55 -04:00
Andres Freund 2cd7084524 Change tupledesc->attrs[n] to TupleDescAttr(tupledesc, n).
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc.  Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.

Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-20 11:19:07 -07:00
Robert Haas 1e56883a52 Attempt to clarify comments related to force_parallel_mode.
Per discussion with Tom Lane.

Discussion: http://postgr.es/m/28589.1502902172@sss.pgh.pa.us
2017-08-17 14:09:14 -04:00
Tom Lane 963af96920 Add missing "static" marker.
Per pademelon.
2017-08-17 11:17:39 -04:00
Tom Lane 4867d7f62f Avoid out-of-memory in a hash join with many duplicate inner keys.
The executor is capable of splitting buckets during a hash join if
too much memory is being used by a small number of buckets.  However,
this only helps if a bucket's population is actually divisible; if
all the hash keys are alike, the tuples still end up in the same
new bucket.  This can result in an OOM failure if there are enough
inner keys with identical hash values.  The planner's cost estimates
will bias it against choosing a hash join in such situations, but not
by so much that it will never do so.  To mitigate the OOM hazard,
explicitly estimate the hash bucket space needed by just the inner
side's most common value, and if that would exceed work_mem then
add disable_cost to the hash cost estimate.

This approach doesn't account for the possibility that two or more
common values would share the same hash value.  On the other hand,
work_mem is normally a fairly conservative bound, so that eating
two or more times that much space is probably not going to kill us.

If we have no stats about the inner side, ignore this consideration.
There was some discussion of making a conservative assumption, but that
would effectively result in disabling hash join whenever we lack stats,
which seems like an overreaction given how seldom the problem manifests
in the field.

Per a complaint from David Hinkle.  Although this could be viewed
as a bug fix, the lack of similar complaints weighs against back-
patching; indeed we waited for v11 because it seemed already rather
late in the v10 cycle to be making plan choice changes like this one.

Discussion: https://postgr.es/m/32013.1487271761@sss.pgh.pa.us
2017-08-15 14:05:53 -04:00
Robert Haas e139f1953f Assorted preparatory refactoring for partition-wise join.
Instead of duplicating the logic to search for a matching
ParamPathInfo in multiple places, factor it out into a separate
function.

Pass only the relevant bits of the PartitionKey to
partition_bounds_equal instead of the whole thing, because
partition-wise join will want to call this without having a
PartitionKey available.

Adjust allow_star_schema_join and calc_nestloop_required_outer
to take relevant Relids rather than the entire Path, because
partition-wise join will want to call it with the top-parent
relids to determine whether a child join is allowable.

Ashutosh Bapat.  Review and testing of the larger patch set of which
this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih,
Thomas Munro, Dilip Kumar, and me.

Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com
2017-08-15 12:30:38 -04:00
Tom Lane 00418c6124 Simplify plpgsql's check for simple expressions.
plpgsql wants to recognize expressions that it can execute directly
via ExecEvalExpr() instead of going through the full SPI machinery.
Originally the test for this consisted of recursively groveling through
the post-planning expression tree to see if it contained only nodes that
plpgsql recognized as safe.  That was a major maintenance headache, since
it required updating plpgsql every time we added any kind of expression
node.  It was also kind of expensive, so over time we added various
pre-planning checks to try to short-circuit having to do that.
Robert Haas pointed out that as of the SRF-processing changes in v10,
particularly the addition of Query.hasTargetSRFs, there really isn't
any reason to make the recursive scan at all: the initial checks cover
everything we really care about.  We do have to make sure that those
checks agree with what inline_function() considers, so that inlining
of a function that formerly wasn't inlined can't cause an expression
considered simple to become non-simple.

Hence, delete the recursive function exec_simple_check_node(), and tweak
those other tests to more exactly agree with inline_function().  Adjust
some comments and function naming to match.

Discussion: https://postgr.es/m/CA+TgmoZGZpwdEV2FQWaVxA_qZXsQE1DAS5Fu8fwxXDNvfndiUQ@mail.gmail.com
2017-08-15 12:28:39 -04:00
Robert Haas 480f1f4329 Teach adjust_appendrel_attrs(_multilevel) to do multiple translations.
Currently, child relations are always base relations, so when we
translate parent relids to child relids, we only need to translate
a singler relid.  However, the proposed partition-wise join feature
will create child joins, which will mean we need to translate a set
of parent relids to the corresponding child relids.  This is
preliminary refactoring to make that possible.

Ashutosh Bapat.  Review and testing of the larger patch set of which
this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih,
Thomas Munro, Dilip Kumar, and me.  Some adjustments, mostly
cosmetic, by me.

Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com
2017-08-15 10:49:06 -04:00
Robert Haas d57929afc7 Avoid unnecessary single-child Append nodes.
Before commit d3cc37f1d8, an inheritance parent
whose only children were temp tables of other sessions would end up
as a simple scan of the parent; but with that commit, we end up with
an Append node, per a report from Ashutosh Bapat.  Tweak the logic
so that we go back to the old way, and update the function header
comment for partitioning while we're at it.

Ashutosh Bapat, reviewed by Amit Langote and adjusted by me.

Discussion: http://postgr.es/m/CAFjFpReWJr1yTkHU=OqiMBmcYCMoSW3VPR39RBuQ_ovwDFBT5Q@mail.gmail.com
2017-08-15 09:16:33 -04:00
Tom Lane 21d304dfed Final pgindent + perltidy run for v10. 2017-08-14 17:29:33 -04:00
Robert Haas 7086be6e36 When WCOs are present, disable direct foreign table modification.
If the user modifies a view that has CHECK OPTIONs and this gets
translated into a modification to an underlying relation which happens
to be a foreign table, the check options should be enforced.  In the
normal code path, that was happening properly, but it was not working
properly for "direct" modification because the whole operation gets
pushed to the remote side in that case and we never have an option to
enforce the constraint against individual tuples.  Fix by disabling
direct modification when there is a need to enforce CHECK OPTIONs.

Etsuro Fujita, reviewed by Kyotaro Horiguchi and by me.

Discussion: http://postgr.es/m/f8a48f54-6f02-9c8a-5250-9791603171ee@lab.ntt.co.jp
2017-07-24 15:57:24 -04:00
Tom Lane 278cb43411 Be more consistent about errors for opfamily member lookup failures.
Add error checks in some places that were calling get_opfamily_member
or get_opfamily_proc and just assuming that the call could never fail.
Also, standardize the wording for such errors in some other places.

None of these errors are expected in normal use, hence they're just
elog not ereport.  But they may be handy for diagnosing omissions in
custom opclasses.

Rushabh Lathia found the oversight in RelationBuildPartitionKey();
I found the others by grepping for all callers of these functions.

Discussion: https://postgr.es/m/CAGPqQf2R9Nk8htpv0FFi+FP776EwMyGuORpc9zYkZKC8sFQE3g@mail.gmail.com
2017-07-24 11:23:27 -04:00
Tom Lane decb08ebdf Code review for NextValueExpr expression node type.
Add missing infrastructure for this node type, notably in ruleutils.c where
its lack could demonstrably cause EXPLAIN to fail.  Add outfuncs/readfuncs
support.  (outfuncs support is useful today for debugging purposes.  The
readfuncs support may never be needed, since at present it would only
matter for parallel query and NextValueExpr should never appear in a
parallelizable query; but it seems like a bad idea to have a primnode type
that isn't fully supported here.)  Teach planner infrastructure that
NextValueExpr is a volatile, parallel-unsafe, non-leaky expression node
with cost cpu_operator_cost.  Given its limited scope of usage, there
*might* be no live bug today from the lack of that knowledge, but it's
certainly going to bite us on the rear someday.  Teach pg_stat_statements
about the new node type, too.

While at it, also teach cost_qual_eval() that MinMaxExpr, SQLValueFunction,
XmlExpr, and CoerceToDomain should be charged as cpu_operator_cost.
Failing to do this for SQLValueFunction was an oversight in my commit
0bb51aa96.  The others are longer-standing oversights, but no time like the
present to fix them.  (In principle, CoerceToDomain could have cost much
higher than this, but it doesn't presently seem worth trying to examine the
domain's constraints here.)

Modify execExprInterp.c to execute NextValueExpr as an out-of-line
function; it seems quite unlikely to me that it's worth insisting that
it be inlined in all expression eval methods.  Besides, providing the
out-of-line function doesn't stop anyone from inlining if they want to.

Adjust some places where NextValueExpr support had been inserted with the
aid of a dartboard rather than keeping it in the same order as elsewhere.

Discussion: https://postgr.es/m/23862.1499981661@sss.pgh.pa.us
2017-07-14 15:25:43 -04:00
Robert Haas 6af9f1bd4b Document partitioned_rels in create_modifytable_path header comment.
Etsuro Fujita, slightly adjusted by me.

Discussion: http://postgr.es/m/e87c4a6d-23d7-5e7c-e8db-44ed418eb5d1@lab.ntt.co.jp
2017-06-22 13:52:50 -04:00
Robert Haas 1300276042 Update comment to account for table partitioning.
Ashutosh Bapat and Amit Langote

Discussion: http://postgr.es/m/CAFjFpRcG_NaAv6cDHD-9VfGdvB8maAtSfB=fTQr5+kxP2_sXzg@mail.gmail.com
2017-06-22 10:53:37 -04:00
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Tom Lane e3860ffa4d Initial pgindent run with pg_bsd_indent version 2.0.
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:

* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
  sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
  well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
  with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
  than the expected column 33.

On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list.  This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.

There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses.  I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:39:04 -04:00
Tom Lane 9ef2dbefc7 Final pgindent run with old pg_bsd_indent (version 1.3).
This is just to have a clean basis for comparison with the results of
the new version (which will indeed end up reverting some of these
changes...)

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:09:24 -04:00
Tom Lane d8e6b84bd2 Avoid regressions in foreign-key-based selectivity estimates.
David Rowley found that the "use the smallest per-column selectivity"
heuristic applied in some cases by get_foreign_key_join_selectivity()
was badly off if the FK columns are independent, producing estimates
much worse than we got before that code was added in 9.6.

One case where that heuristic was used was for LEFT and FULL outer joins
with the referenced rel on the outside of the join.  But we should not
really need to special-case those here.  eqjoinsel() never has had such a
special case; the correction is applied by calc_joinrel_size_estimate()
instead.  Let's just estimate such cases like inner joins and rely on that
later adjustment.  (I think there was something of a thinko here, in that
the comments seem to be thinking about the selectivity as defined for
semi/anti joins; but that shouldn't apply to left/full joins.)  Add a
regression test exercising such a case to show that this is sane in
at least some cases.

The other case where we used that heuristic was for SEMI/ANTI outer joins,
either if the referenced rel was on the outside, or if it was on the inside
but was part of a join within the RHS.  In either case, the FK doesn't give
us a lot of traction towards estimating the selectivity.  To ensure that
we don't have regressions from what happened before 9.6, let's punt by
ignoring the FK in such cases and applying the traditional selectivity
calculation.  (We might be able to improve on that later, but for now
I just want to be sure it's not worse than 9.5.)

Report and patch by David Rowley, simplified a bit by me.  Back-patch
to 9.6 where this code was added.

Discussion: https://postgr.es/m/CAKJS1f8NO8oCDcxrteohG6O72uU1saEVT9qX=R8pENr5QWerXw@mail.gmail.com
2017-06-19 15:33:41 -04:00
Robert Haas b08df9cab7 Teach predtest.c about CHECK clauses to fix partitioning bugs.
In a CHECK clause, a null result means true, whereas in a WHERE clause
it means false.  predtest.c provided different functions depending on
which set of semantics applied to the predicate being proved, but had
no option to control what a null meant in the clauses provided as
axioms.  Add one.

Use that in the partitioning code when figuring out whether the
validation scan on a new partition can be skipped.  Rip out the
old logic that attempted (not very successfully) to compensate
for the absence of the necessary support in predtest.c.

Ashutosh Bapat and Robert Haas, reviewed by Amit Langote and
incorporating feedback from Tom Lane.

Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com
2017-06-14 13:13:11 -04:00
Tom Lane 9db7d47f90 #ifdef out assorted unused GEQO code.
I'd always assumed that backend/optimizer/geqo/'s remarkably poor
showing on code coverage metrics was because we weren't exercising
it much in the regression tests.  But it turns out that a good chunk
of the problem is that there's a bunch of code that is physically
unreachable (because the calls to it are #ifdef'd out in geqo_main.c)
but is being built anyway.  Making the called code have #if guards
similar to the calling code saves a couple of kilobytes of executable
size and should make the coverage numbers more reflective of reality.

It's arguable that we should just delete all the unused recombination
mechanisms altogether, but I didn't feel a need to go that far today.
2017-06-04 13:34:05 -04:00
Tom Lane 23886581b5 Fix old corner-case logic error in final_cost_nestloop().
When costing a nestloop with stop-at-first-inner-match semantics, and a
non-indexscan inner path, final_cost_nestloop() wants to charge the full
scan cost of the inner rel at least once, with additional scans charged
at inner_rescan_run_cost which might be less.  However the logic for
doing this effectively assumed that outer_matched_rows is at least 1.
If it's zero, which is not unlikely for a small outer rel, we ended up
charging inner_run_cost plus N times inner_rescan_run_cost, as much as
double the correct charge for an outer rel with only one row that
we're betting won't be matched.  (Unless the inner rel is materialized,
in which case it has very small inner_rescan_run_cost and the cost
is not so far off what it should have been.)

The upshot of this was that the planner had a tendency to select plans
that failed to make effective use of the stop-at-first-inner-match
semantics, and that might have Materialize nodes in them even when the
predicted number of executions of the Materialize subplan was only 1.
This was not so obvious before commit 9c7f5229a, because the case only
arose in connection with semi/anti joins where there's not freedom to
reverse the join order.  But with the addition of unique-inner joins,
it could result in some fairly bad planning choices, as reported by
Teodor Sigaev.  Indeed, some of the test cases added by that commit
have plans that look dubious on closer inspection, and are changed
by this patch.

Fix the logic to ensure that we don't charge for too many inner scans.
I chose to adjust it so that the full-freight scan cost is associated
with an unmatched outer row if possible, not a matched one, since that
seems like a better model of what would happen at runtime.

This is a longstanding bug, but given the lesser impact in back branches,
and the lack of field complaints, I won't risk a back-patch.

Discussion: https://postgr.es/m/CAKJS1f-LzkUsFxdJ_-Luy38orQ+AdEXM5o+vANR+-pHAWPSecg@mail.gmail.com
2017-06-03 13:48:15 -04:00
Robert Haas b522759508 Copy partitioned_rels lists to avoid shared substructure.
Otherwise, set_plan_refs() can get applied to the same list
multiple times through different references, leading to chaos.

Amit Langote, Dilip Kumar, and Robert Haas, reviewed by Ashutosh
Bapat.  Original report by Sveinn Sveinsson.

Discussion: http://postgr.es/m/20170517141151.1435.79890@wrigleys.postgresql.org
2017-05-19 15:26:05 -04:00
Bruce Momjian a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Tom Lane f674743487 Remove no-longer-needed fields of Hash plan nodes.
skewColType/skewColTypmod are no longer used in the wake of commit
9aab83fc5, and seem unlikely to be wanted in future, so let's drop 'em.

Discussion: https://postgr.es/m/16364.1494520862@sss.pgh.pa.us
2017-05-14 11:07:40 -04:00
Tom Lane f04c9a6146 Standardize terminology for pg_statistic_ext entries.
Consistently refer to such an entry as a "statistics object", not just
"statistics" or "extended statistics".  Previously we had a mismash of
terms, accompanied by utter confusion as to whether the term was
singular or plural.  That's not only grating (at least to the ear of
a native English speaker) but could be outright misleading, eg in error
messages that seemed to be referring to multiple objects where only one
could be meant.

This commit fixes the code and a lot of comments (though I may have
missed a few).  I also renamed two new SQL functions,
pg_get_statisticsextdef -> pg_get_statisticsobjdef
pg_statistic_ext_is_visible -> pg_statistics_obj_is_visible
to conform better with this terminology.

I have not touched the SGML docs other than fixing those function
names; the docs certainly need work but it seems like a separable task.

Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-14 10:55:01 -04:00
Tom Lane 92a43e4857 Reduce semijoins with unique inner relations to plain inner joins.
If the inner relation can be proven unique, that is it can have no more
than one matching row for any row of the outer query, then we might as
well implement the semijoin as a plain inner join, allowing substantially
more freedom to the planner.  This is a form of outer join strength
reduction, but it can't be implemented in reduce_outer_joins() because
we don't have enough info about the individual relations at that stage.
Instead do it much like remove_useless_joins(): once we've built base
relations, we can make another pass over the SpecialJoinInfo list and
get rid of any entries representing reducible semijoins.

This is essentially a followon to the inner-unique patch (commit 9c7f5229a)
and makes use of the proof machinery that that patch created.  We need only
minor refactoring of innerrel_is_unique's API to support this usage.

Per performance complaint from Teodor Sigaev.

Discussion: https://postgr.es/m/f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru
2017-05-01 14:53:42 -04:00
Tom Lane 2057a58d16 Fix mis-optimization of semijoins with more than one LHS relation.
The inner-unique patch (commit 9c7f5229a) supposed that if we're
considering a JOIN_UNIQUE_INNER join path, we can always set inner_unique
for the join, because the inner path produced by create_unique_path should
be unique relative to the outer relation.  However, that's true only if
we're considering joining to the whole outer relation --- otherwise we may
be applying only some of the join quals, and so the inner path might be
non-unique from the perspective of this join.  Adjust the test to only
believe that we can set inner_unique if we have the whole semijoin LHS on
the outer side.

There is more that can be done in this area, but this commit is only
intended to provide the minimal fix needed to get correct plans.

Per report from Teodor Sigaev.  Thanks to David Rowley for preliminary
investigation.

Discussion: https://postgr.es/m/f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru
2017-05-01 14:39:11 -04:00
Robert Haas e180c8aa8c Fire per-statement triggers on partitioned tables.
Even though no actual tuples are ever inserted into a partitioned
table (the actual tuples are in the partitions, not the partitioned
table itself), we still need to have a ResultRelInfo for the
partitioned table, or per-statement triggers won't get fired.

Amit Langote, per a report from Rajkumar Raghuwanshi.  Reviewed by me.

Discussion: http://postgr.es/m/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com
2017-05-01 08:23:01 -04:00
Tom Lane 39151781c8 Fix testing of parallel-safety of SubPlans.
is_parallel_safe() supposed that the only relevant property of a SubPlan
was the parallel safety of the referenced subplan tree.  This is wrong:
the testexpr or args subtrees might contain parallel-unsafe stuff, as
demonstrated by the test case added here.  However, just recursing into the
subtrees fails in a different way: we'll typically find PARAM_EXEC Params
representing the subplan's output columns in the testexpr.  The previous
coding supposed that any Param must be treated as parallel-restricted, so
that a naive attempt at fixing this disabled parallel pushdown of SubPlans
altogether.  We must instead determine, for any visited Param, whether it
is one that would be computed by a surrounding SubPlan node; if so, it's
safe to push down along with the SubPlan node.

We might later be able to extend this logic to cope with Params used for
correlated subplans and other cases; but that's a task for v11 or beyond.

Tom Lane and Amit Kapila

Discussion: https://postgr.es/m/7064.1492022469@sss.pgh.pa.us
2017-04-18 15:43:56 -04:00
Alvaro Herrera ee6922112e Rename columns in new pg_statistic_ext catalog
The new catalog reused a column prefix "sta" from pg_statistic, but this
is undesirable, so change the catalog to use prefix "stx" instead.
Also, rename the column that lists enabled statistic kinds as "stxkind"
rather than "enabled".

Discussion: https://postgr.es/m/CAKJS1f_2t5jhSN7huYRFH3w3rrHfG2QU7hiUHsu-Vdjd1rYT3w@mail.gmail.com
2017-04-17 18:34:29 -03:00
Tom Lane 76799fc89d Always build a custom plan node's targetlist from the path's pathtarget.
We were applying the use_physical_tlist optimization to all relation
scan plans, even those implemented by custom scan providers.  However,
that's a bad idea for a couple of reasons.  The custom provider might
be unable to provide columns that it hadn't expected to be asked for
(for example, the custom scan might depend on an index-only scan).
Even more to the point, there's no good reason to suppose that this
"optimization" is a win for a custom scan; whatever the custom provider
is doing is likely not based on simply returning physical heap tuples.
(As a counterexample, if the custom scan is an interface to a column store,
demanding all columns would be a huge loss.)  If it is a win, the custom
provider could make that decision for itself and insert a suitable
pathtarget into the path, anyway.

Per discussion with Dmitry Ivanov.  Back-patch to 9.5 where custom scan
support was introduced.  The argument that the custom provider can adjust
the behavior by changing the pathtarget only applies to 9.6+, but on
balance it seems more likely that use_physical_tlist will hurt custom
scans than help them.

Discussion: https://postgr.es/m/e29ddd30-8ef9-4da5-a50b-2bb7b8c7198d@postgrespro.ru
2017-04-17 15:29:15 -04:00
Tom Lane 003d80f3df Mark finished Plan nodes with parallel_safe flags.
We'd managed to avoid doing this so far, but it seems pretty obvious
that it would be forced on us some day, and this is much the cleanest
way of approaching the open problem that parallel-unsafe subplans are
being transmitted to parallel workers.  Anyway there's no space cost
due to alignment considerations, and the time cost is pretty minimal
since we're just copying the flag from the corresponding Path node.
(At least in most cases ... some of the klugier spots in createplan.c
have to work a bit harder.)

In principle we could perhaps get rid of SubPlan.parallel_safe,
but I thought it better to keep that in case there are reasons to
consider a SubPlan unsafe even when its child plan is parallel-safe.

This patch doesn't actually do anything with the new flags, but
I thought I'd commit it separately anyway.

Note: although this touches outfuncs/readfuncs, there's no need for
a catversion bump because Plan trees aren't stored on disk.

Discussion: https://postgr.es/m/87tw5x4vcu.fsf@credativ.de
2017-04-12 15:13:34 -04:00
Tom Lane 8f0530f580 Improve castNode notation by introducing list-extraction-specific variants.
This extends the castNode() notation introduced by commit 5bcab1114 to
provide, in one step, extraction of a list cell's pointer and coercion to
a concrete node type.  For example, "lfirst_node(Foo, lc)" is the same
as "castNode(Foo, lfirst(lc))".  Almost half of the uses of castNode
that have appeared so far include a list extraction call, so this is
pretty widely useful, and it saves a few more keystrokes compared to the
old way.

As with the previous patch, back-patch the addition of these macros to
pg_list.h, so that the notation will be available when back-patching.

Patch by me, after an idea of Andrew Gierth's.

Discussion: https://postgr.es/m/14197.1491841216@sss.pgh.pa.us
2017-04-10 13:51:53 -04:00
Tom Lane eef8c0069e Clean up bugs in clause_selectivity() cleanup.
Commit ac2b09508 was not terribly carefully reviewed.  Band-aid it to
not fail on non-RestrictInfo input, per report from Andreas Seltenreich.
Also make it do something more reasonable with variable-free clauses,
and improve nearby comments.

Discussion: https://postgr.es/m/87inmf5rdx.fsf@credativ.de
2017-04-08 16:38:03 -04:00
Tom Lane 9c7f5229ad Optimize joins when the inner relation can be proven unique.
If there can certainly be no more than one matching inner row for a given
outer row, then the executor can move on to the next outer row as soon as
it's found one match; there's no need to continue scanning the inner
relation for this outer row.  This saves useless scanning in nestloop
and hash joins.  In merge joins, it offers the opportunity to skip
mark/restore processing, because we know we have not advanced past the
first possible match for the next outer row.

Of course, the devil is in the details: the proof of uniqueness must
depend only on joinquals (not otherquals), and if we want to skip
mergejoin mark/restore then it must depend only on merge clauses.
To avoid adding more planning overhead than absolutely necessary,
the present patch errs in the conservative direction: there are cases
where inner_unique or skip_mark_restore processing could be used, but
it will not do so because it's not sure that the uniqueness proof
depended only on "safe" clauses.  This could be improved later.

David Rowley, reviewed and rather heavily editorialized on by me

Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com
2017-04-07 22:20:13 -04:00
Tom Lane 89deca582a Fix planner error (or assert trap) with nested set operations.
As reported by Sean Johnston in bug #14614, since 9.6 the planner can fail
due to trying to look up the referent of a Var with varno 0.  This happens
because we generate such Vars in generate_append_tlist, for lack of any
better way to describe the output of a SetOp node.  In typical situations
nothing really cares about that, but given nested set-operation queries
we will call estimate_num_groups on the output of the subquery, and that
wants to know what a Var actually refers to.  That logic used to look at
subquery->targetList, but in commit 3fc6e2d7f I'd switched it to look at
subroot->processed_tlist, ie the actual output of the subquery plan not the
parser's idea of the result.  It seemed like a good idea at the time :-(.
As a band-aid fix, change it back.

Really we ought to have an honest way of naming the outputs of SetOp steps,
which suggests that it'd be a good idea for the parser to emit an RTE
corresponding to each one.  But that's a task for another day, and it
certainly wouldn't yield a back-patchable fix.

Report: https://postgr.es/m/20170407115808.25934.51866@wrigleys.postgresql.org
2017-04-07 12:18:38 -04:00
Simon Riggs ac2b095088 Reset API of clause_selectivity()
Discussion: https://postgr.es/m/CAKJS1f9yurJQW9pdnzL+rmOtsp2vOytkpXKGnMFJEO-qz5O5eA@mail.gmail.com
2017-04-06 19:10:51 -04:00
Alvaro Herrera b1fc51a36e Comment fixes for extended statistics
Clean up some code comments in new extended statistics code, from
7b504eb282.
2017-04-06 12:28:50 -03:00
Simon Riggs 2686ee1b7c Collect and use multi-column dependency stats
Follow on patch in the multi-variate statistics patch series.

CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t;
ANALYZE;
will collect dependency stats on (a, b) and then use the measured
dependency in subsequent query planning.

Commit 7b504eb282 added
CREATE STATISTICS with n-distinct coefficients. These are now
specified using the mutually exclusive option WITH (ndistinct).

Author: Tomas Vondra, David Rowley
Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas
and many other comments and contributions
Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com
2017-04-05 18:00:42 -04:00
Robert Haas 7a39b5e4d1 Abstract logic to allow for multiple kinds of child rels.
Currently, the only type of child relation is an "other member rel",
which is the child of a baserel, but in the future joins and even
upper relations may have child rels.  To facilitate that, introduce
macros that test to test for particular RelOptKind values, and use
them in various places where they help to clarify the sense of a test.
(For example, a test may allow RELOPT_OTHER_MEMBER_REL either because
it intends to allow child rels, or because it intends to allow simple
rels.)

Also, remove find_childrel_top_parent, which will not work for a
child rel that is not a baserel.  Instead, add a new RelOptInfo
member top_parent_relids to track the same kind of information in a
more generic manner.

Ashutosh Bapat, slightly tweaked by me.  Review and testing of the
patch set from which this was taken by Rajkumar Raghuwanshi and Rafia
Sabih.

Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
2017-04-03 22:41:31 -04:00
Andrew Gierth f578093526 Try and silence spurious Coverity warning.
gset_data (aka gd) in planner.c is always non-null if and only if
parse->groupingSets is non-null, but Coverity doesn't know that and
complains.  Feed it an assertion to see if that keeps it happy.
2017-04-03 23:30:24 +01:00
Kevin Grittner 18ce3a4ab2 Add infrastructure to support EphemeralNamedRelation references.
A QueryEnvironment concept is added, which allows new types of
objects to be passed into queries from parsing on through
execution.  At this point, the only thing implemented is a
collection of EphemeralNamedRelation objects -- relations which
can be referenced by name in queries, but do not exist in the
catalogs.  The only type of ENR implemented is NamedTuplestore, but
provision is made to add more types fairly easily.

An ENR can carry its own TupleDesc or reference a relation in the
catalogs by relid.

Although these features can be used without SPI, convenience
functions are added to SPI so that ENRs can easily be used by code
run through SPI.

The initial use of all this is going to be transition tables in
AFTER triggers, but that will be added to each PL as a separate
commit.

An incidental effect of this patch is to produce a more informative
error message if an attempt is made to modify the contents of a CTE
from a referencing DML statement.  No tests previously covered that
possibility, so one is added.

Kevin Grittner and Thomas Munro
Reviewed by Heikki Linnakangas, David Fetter, and Thomas Munro
with valuable comments and suggestions from many others
2017-03-31 23:17:18 -05:00
Robert Haas 7d8f6986b8 Fix parallel query so it doesn't spoil row estimates above Gather.
Commit 45be99f8cd removed GatherPath's
num_workers field, but this is entirely bogus.  Normally, a path's
parallel_workers flag is supposed to indicate the number of workers
that it wants, and should be 0 for a non-partial path.  In that
commit, I mistakenly thought that GatherPath could also use that field
to indicate the number of workers that it would try to start, but
that's disastrous, because then it can propagate up to higher nodes in
the plan tree, which will then get incorrect rowcounts because the
parallel_workers flag is involved in computing those values.  Repair
by putting the separate field back.

Report by Tomas Vondra.  Patch by me, reviewed by Amit Kapila.

Discussion: http://postgr.es/m/f91b4a44-f739-04bd-c4b6-f135bd643669@2ndquadrant.com
2017-03-31 21:01:20 -04:00
Peter Eisentraut 4cb824699e Cast result of copyObject() to correct type
copyObject() is declared to return void *, which allows easily assigning
the result independent of the input, but it loses all type checking.

If the compiler supports typeof or something similar, cast the result to
the input type.  This creates a greater amount of type safety.  In some
cases, where the result is assigned to a generic type such as Node * or
Expr *, new casts are now necessary, but in general casts are now
unnecessary in the normal case and indicate that something unusual is
happening.

Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
2017-03-28 21:59:23 -04:00
Andrew Gierth b5635948ab Support hashed aggregation with grouping sets.
This extends the Aggregate node with two new features: HashAggregate
can now run multiple hashtables concurrently, and a new strategy
MixedAggregate populates hashtables while doing sorted grouping.

The planner will now attempt to save as many sorts as possible when
planning grouping sets queries, while not exceeding work_mem for the
estimated combined sizes of all hashtables used.  No SQL-level changes
are required.  There should be no user-visible impact other than the
new EXPLAIN output and possible changes to result ordering when ORDER
BY was not used (which affected a few regression tests).  The
enable_hashagg option is respected.

Author: Andrew Gierth
Reviewers: Mark Dilger, Andres Freund
Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk
2017-03-27 04:20:54 +01:00
Tom Lane 244dd95ce9 Update some obsolete comments.
Fix a few stray references to expression eval functions that don't
exist anymore or don't take the same input representation they used to.
2017-03-26 11:36:46 -04:00
Andres Freund b8d7f053c5 Faster expression evaluation and targetlist projection.
This replaces the old, recursive tree-walk based evaluation, with
non-recursive, opcode dispatch based, expression evaluation.
Projection is now implemented as part of expression evaluation.

This both leads to significant performance improvements, and makes
future just-in-time compilation of expressions easier.

The speed gains primarily come from:
- non-recursive implementation reduces stack usage / overhead
- simple sub-expressions are implemented with a single jump, without
  function calls
- sharing some state between different sub-expressions
- reduced amount of indirect/hard to predict memory accesses by laying
  out operation metadata sequentially; including the avoidance of
  nearly all of the previously used linked lists
- more code has been moved to expression initialization, avoiding
  constant re-checks at evaluation time

Future just-in-time compilation (JIT) has become easier, as
demonstrated by released patches intended to be merged in a later
release, for primarily two reasons: Firstly, due to a stricter split
between expression initialization and evaluation, less code has to be
handled by the JIT. Secondly, due to the non-recursive nature of the
generated "instructions", less performance-critical code-paths can
easily be shared between interpreted and compiled evaluation.

The new framework allows for significant future optimizations. E.g.:
- basic infrastructure for to later reduce the per executor-startup
  overhead of expression evaluation, by caching state in prepared
  statements.  That'd be helpful in OLTPish scenarios where
  initialization overhead is measurable.
- optimizing the generated "code". A number of proposals for potential
  work has already been made.
- optimizing the interpreter. Similarly a number of proposals have
  been made here too.

The move of logic into the expression initialization step leads to some
backward-incompatible changes:
- Function permission checks are now done during expression
  initialization, whereas previously they were done during
  execution. In edge cases this can lead to errors being raised that
  previously wouldn't have been, e.g. a NULL array being coerced to a
  different array type previously didn't perform checks.
- The set of domain constraints to be checked, is now evaluated once
  during expression initialization, previously it was re-built
  every time a domain check was evaluated. For normal queries this
  doesn't change much, but e.g. for plpgsql functions, which caches
  ExprStates, the old set could stick around longer.  The behavior
  around might still change.

Author: Andres Freund, with significant changes by Tom Lane,
	changes by Heikki Linnakangas
Reviewed-By: Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
2017-03-25 14:52:06 -07:00
Alvaro Herrera 7b504eb282 Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns.  Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too.  All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.

This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table.  This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve.  A new
special pseudo-type pg_ndistinct is used.

(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)

Author: Tomas Vondra.  Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
    Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
    https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 14:06:10 -03:00
Robert Haas dc02c7bca4 Fix wrong costing of Sort under Gather Merge.
There's no mechanism for such a sort to become a top-N sort, so we
should pass -1 rather than limit_tuples to cost_sort().

Rushabh Lathia, per a report from Mithun Cy

Discussion: http://postgr.es/m/CAGPqQf1akRcSgC9=6iwx=sEPap9UvPpHJLzg8_N+OuHdb6fL+g@mail.gmail.com
2017-03-22 14:45:14 -04:00
Robert Haas d3cc37f1d8 Don't scan partitioned tables.
Partitioned tables do not contain any data; only their unpartitioned
descendents need to be scanned.  However, the partitioned tables still
need to be locked, even though they're not scanned.  To make that
work, Append and MergeAppend relations now need to carry a list of
(unscanned) partitioned relations that must be locked, and InitPlan
must lock all partitioned result relations.

Aside from the obvious advantage of avoiding some work at execution
time, this has two other advantages.  First, it may improve the
planner's decision-making in some cases since the empty relation
might throw things off.  Second, it paves the way to getting rid of
the storage for partitioned tables altogether.

Amit Langote, reviewed by me.

Discussion: http://postgr.es/m/6837c359-45c4-8044-34d1-736756335a15@lab.ntt.co.jp
2017-03-21 09:48:04 -04:00
Robert Haas 1ea60ad602 Fix failure to use clamp_row_est() for parallel joins.
Commit 0c2070cefa neglected to use
clamp_row_est() where it should have done so.

Patch by me.  Report by Amit Kapila.

Discussion: http://postgr.es/m/CAA4eK1KPm8RYa1Kun3ZmQj9pb723b-EFN70j47Pid1vn3ByquA@mail.gmail.com
2017-03-15 12:28:54 -04:00
Robert Haas c44c47a773 Some preliminary refactoring towards partitionwise join.
Partitionwise join proposes add a concept of child join relations,
which will have the same relationship with join relations as "other
member" relations do with base relations.  These relations will need
some but not all of the handling that we currently have for join
relations, and some but not all of the handling that we currently have
for appendrels, since they are a mix of the two.  Refactor a little
bit so that the necessary bits of logic are exposed as separate
functions.

Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi and
by me.

Discussion: http://postgr.es/m/CAFjFpRfqotRR6cM3sooBHMHEVdkFfAZ6PyYg4GRZsoMuW08HjQ@mail.gmail.com
2017-03-14 19:25:47 -04:00
Robert Haas 2609e91fcf Fix regression in parallel planning against inheritance tables.
Commit 51ee6f3160 accidentally changed
the behavior around inheritance hierarchies; before, we always
considered parallel paths even for very small inheritance children,
because otherwise an inheritance hierarchy with even one small child
wouldn't be eligible for parallelism.  That exception was inadverently
removed; put it back.

In passing, also adjust the degree-of-parallelism comptuation for
index-only scans not to consider the number of heap pages fetched.
Otherwise, we'll avoid parallel index-only scans on tables that are
mostly all-visible, which isn't especially logical.

Robert Haas and Amit Kapila, per a report from Ashutosh Sharma.

Discussion: http://postgr.es/m/CAE9k0PmgSoOHRd60SHu09aRVTHRSs8s6pmyhJKWHxWw9C_x+XA@mail.gmail.com
2017-03-14 14:33:14 -04:00
Peter Eisentraut a47b38c9ee Spelling fixes
From: Josh Soref <jsoref@gmail.com>
2017-03-14 12:58:39 -04:00
Peter Eisentraut f97a028d8e Spelling fixes in code comments
From: Josh Soref <jsoref@gmail.com>
2017-03-14 12:58:39 -04:00
Robert Haas a82178020d Update overlooked comment for Gather Merge.
Commit 355d3993c5 probably should have
done this, but nobody noticed that it was needed.
2017-03-14 07:52:11 -04:00
Robert Haas bce352fb46 Remove some bogus logic from create_gather_merge_plan.
This logic was adapated from create_merge_append_plan, but the two
cases aren't really analogous, because create_merge_append_plan is not
projection-capable and must therefore have a tlist identical to that
of the underlying paths.  Overwriting the tlist of Gather Merge with
whatever the underlying plan happens to produce is no good at all.

Patch by me, reviewed by Rushabh Lathia, who also reported the issue
and made an initial attempt at a fix.

Discussion: http://postgr.es/m/CA+Tgmob_-oHEOBfT9S25bjqokdqv8e8xEmh9zOY+3MPr_LmuhA@mail.gmail.com
2017-03-14 07:43:45 -04:00
Alvaro Herrera a9c074ba7e Silence unused variable compiler warning
Fallout from fcec6caafa2: mark a variable in
set_tablefunc_size_estimates as used for asserts only.

Also, the planner_rte_fetch() call is pointless with assertions
disabled, so enclose it in a USE_ASSERT_CHECKING #ifdef; fix the same
problem in set_subquery_size_estimates().

First problem noted by David Rowley, whose compiler is noisier than mine
in this regard.
2017-03-13 19:02:38 -03:00
Robert Haas 0ee92e1c9b Fix a couple of planner bugs in Gather Merge.
Neha Sharma reported these to Rushabh Lathia just after I commit
355d3993c5 went in.  The patch is
Rushabh's, with input from me.
2017-03-09 12:06:49 -05:00
Robert Haas 355d3993c5 Add a Gather Merge executor node.
Like Gather, we spawn multiple workers and run the same plan in each
one; however, Gather Merge is used when each worker produces the same
output ordering and we want to preserve that output ordering while
merging together the streams of tuples from various workers.  (In a
way, Gather Merge is like a hybrid of Gather and MergeAppend.)

This works out to a win if it saves us from having to perform an
expensive Sort.  In cases where only a small amount of data would need
to be sorted, it may actually be faster to use a regular Gather node
and then sort the results afterward, because Gather Merge sometimes
needs to wait synchronously for tuples whereas a pure Gather generally
doesn't.  But if this avoids an expensive sort then it's a win.

Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro,
and Neha Sharma, and reviewed and revised by me.

Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com
2017-03-09 07:49:29 -05:00
Robert Haas f35742ccb7 Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.

Dilip Kumar, with some corrections and cosmetic changes by me.  The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.

Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 12:05:43 -05:00
Alvaro Herrera fcec6caafa Support XMLTABLE query expression
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.

This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE.  (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).

The implemented syntax deviates slightly from what the standard
requires.  First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only.  Second, we don't
currently support a default namespace to be specified.

This implementation relies on a new executor node based on a hardcoded
method table.  (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)

Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
2017-03-08 12:40:26 -03:00
Robert Haas 506f05423a Properly initialize variable.
Commit 3bc7dafa9b forgot to do this.

Noted while experimenting with valgrind.
2017-03-07 13:50:52 -05:00
Robert Haas 3bc7dafa9b Consider parallel merge joins.
Commit 45be99f8cd took the position
that performing a merge join in parallel was not likely to work out
well, but this conclusion was greeted with skepticism even at the
time.  Whether it was true then or not, it's clearly not true any
more now that we have parallel index scan.

Dilip Kumar, reviewed by Amit Kapila and by me.

Discussion: http://postgr.es/m/CAFiTN-v3=cM6nyFwFGp0fmvY4=kk79Hq9Fgu0u8CSJ-EEq1Tiw@mail.gmail.com
2017-03-07 11:54:51 -05:00
Robert Haas a71f10189d Preparatory refactoring for parallel merge join support.
Extract the logic used by hash_inner_and_outer into a separate
function, get_cheapest_parallel_safe_total_inner, so that it can
also be used to plan parallel merge joins.

Also, add a require_parallel_safe argument to the existing function
get_cheapest_path_for_pathkeys, because parallel merge join needs
to find the cheapest path for a given set of pathkeys that is
parallel-safe, not just the cheapest one overall.

Patch by me, reviewed by Dilip Kumar.

Discussion: http://postgr.es/m/CA+TgmoYOv+dFK0MWW6366dFj_xTnohQfoBDrHyB7d1oZhrgPjA@mail.gmail.com
2017-03-07 10:33:29 -05:00
Robert Haas 655393a022 Fix parallel hash join path search.
When the very cheapest path is not parallel-safe, we want to instead use
the cheapest unparameterized path that is.  The old code searched
innerrel->cheapest_parameterized_paths, but that isn't right, because
the path we want may not be in that list.  Search innerrel->pathlist
instead.

Spotted by Dilip Kumar.

Discussion: http://postgr.es/m/CAFiTN-szCEcZrQm0i_w4xqSaRUTOUFstNu32Zn4rxxDcoa8gnA@mail.gmail.com
2017-03-07 10:22:07 -05:00
Tom Lane c56ac2913a Suppress unused-variable warning.
Rearrange so we don't have an unused variable in disable-cassert case.

Discussion: https://postgr.es/m/CAMkU=1x63f2QyFTeas83xJqD+Hm1PBuok1LrzYzS-OngDzYOVA@mail.gmail.com
2017-02-21 17:58:24 -05:00
Peter Eisentraut 38d103763d Make more use of castNode() 2017-02-21 11:59:09 -05:00
Robert Haas 0414b26bac Add optimizer and executor support for parallel index-only scans.
Commit 5262f7a4fc added similar support
for parallel index scans; this extends that work to index-only scans.
As with parallel index scans, this requires support from the index AM,
so currently parallel index-only scans will only be possible for btree
indexes.

Rafia Sabih, reviewed and tested by Rahila Syed, Tushar Ahuja,
and Amit Kapila

Discussion: http://postgr.es/m/CAOGQiiPEAs4C=TBp0XShxBvnWXuzGL2u++Hm1=qnCpd6_Mf8Fw@mail.gmail.com
2017-02-19 15:57:55 +05:30
Robert Haas 5262f7a4fc Add optimizer and executor support for parallel index scans.
In combination with 569174f1be, which
taught the btree AM how to perform parallel index scans, this allows
parallel index scan plans on btree indexes.  This infrastructure
should be general enough to support parallel index scans for other
index AMs as well, if someone updates them to support parallel
scans.

Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar
Ahuja, and Haribabu Kommi, and me.
2017-02-15 13:53:24 -05:00
Robert Haas 51ee6f3160 Replace min_parallel_relation_size with two new GUCs.
When min_parallel_relation_size was added, the only supported type
of parallel scan was a parallel sequential scan, but there are
pending patches for parallel index scan, parallel index-only scan,
and parallel bitmap heap scan.  Those patches introduce two new
types of complications: first, what's relevant is not really the
total size of the relation but the portion of it that we will scan;
and second, index pages and heap pages shouldn't necessarily be
treated in exactly the same way.  Typically, the number of index
pages will be quite small, but that doesn't necessarily mean that
a parallel index scan can't pay off.

Therefore, we introduce min_parallel_table_scan_size, which works
out a degree of parallelism for scans based on the number of table
pages that will be scanned (and which is therefore equivalent to
min_parallel_relation_size for parallel sequential scans) and also
min_parallel_index_scan_size which can be used to work out a degree
of parallelism based on the number of index pages that will be
scanned.

Amit Kapila and Robert Haas

Discussion: http://postgr.es/m/CAA4eK1KowGSYYVpd2qPpaPPA5R90r++QwDFbrRECTE9H_HvpOg@mail.gmail.com
Discussion: http://postgr.es/m/CAA4eK1+TnM4pXQbvn7OXqam+k_HZqb0ROZUMxOiL6DWJYCyYow@mail.gmail.com
2017-02-15 13:37:24 -05:00
Robert Haas 5e6d8d2bbb Allow parallel workers to execute subplans.
This doesn't do anything to make Param nodes anything other than
parallel-restricted, so this only helps with uncorrelated subplans,
and it's not necessarily very cheap because each worker will run the
subplan separately (just as a Hash Join will build a separate copy of
the hash table in each participating process), but it's a first step
toward supporting cases that are more likely to help in practice, and
is occasionally useful on its own.

Amit Kapila, reviewed and tested by Rafia Sabih, Dilip Kumar, and
me.

Discussion: http://postgr.es/m/CAA4eK1+e8Z45D2n+rnDMDYsVEb5iW7jqaCH_tvPMYau=1Rru9w@mail.gmail.com
2017-02-14 18:16:03 -05:00
Tom Lane 8d396a0a70 Remove duplicate code in planner.c.
I noticed while hacking on join UNION transforms that planner.c's
function get_base_rel_indexes() just duplicates the functionality of
get_relids_in_jointree().  It doesn't even have the excuse of being
older code :-(.  Drop it and use the latter function instead.
2017-02-14 11:47:45 -05:00
Heikki Linnakangas 181bdb90ba Fix typos in comments.
Backpatch to all supported versions, where applicable, to make backpatching
of future fixes go more smoothly.

Josh Soref

Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
2017-02-06 11:33:58 +02:00
Tom Lane 555494d1bc Fix placement of initPlans when forcibly materializing a subplan.
If we forcibly place a Material node atop a finished subplan, we need
to move any initPlans attached to the subplan up to the Material node,
in order to keep SS_finalize_plan() happy.  I'd figured this out in
commit 7b67a0a49 for the case of materializing a cursor plan, but out of
an abundance of caution, I put the initPlan movement hack at the call
site for that case, rather than inside materialize_finished_plan().
That was the wrong thing, because it turns out to also be necessary for
the only other caller of materialize_finished_plan(), ie subselect.c.
We lacked any test cases that exposed the mistake, but bug#14524 from
Wei Congrui shows that it's possible to get an initPlan reference into
the top tlist in that case too, and then SS_finalize_plan() complains.
Hence, move the hack into materialize_finished_plan().

In HEAD, also relocate some recently-added tests in subselect.sql, which
I'd unthinkingly dropped into the middle of a sequence of related tests.

Report: https://postgr.es/m/20170202060020.1400.89021@wrigleys.postgresql.org
2017-02-02 19:11:32 -05:00
Tom Lane c82d4e658e Fix mishandling of tSRFs at different nesting levels.
Given a targetlist like "srf(x), f(srf(x))", split_pathtarget_at_srfs()
decided that it needed two levels of ProjectSet nodes, failing to notice
that the two SRF calls are textually equal().  Because of that, setrefs.c
would convert the upper ProjectSet's tlist to "Var1, f(Var1)" (where Var1
represents a reference to the srf(x) output of the lower ProjectSet).
This triggered an assertion in nodeProjectSet.c complaining that it found
no SRFs to evaluate, as reported by Erik Rijkers.

What we want in such a case is to evaluate srf(x) only once and use a plain
Result node to compute "Var1, f(Var1)"; that gives results similar to what
previous versions produced, whereas allowing srf(x) to be evaluated again
in an upper ProjectSet would square the number of rows emitted.

Furthermore, even if the SRF calls aren't textually identical, we want them
to be evaluated in lockstep, because that's what happened in the old
implementation.  But split_pathtarget_at_srfs() got this completely wrong,
using two levels of ProjectSet for a case like "srf(x), f(srf(y))".

Hence, rewrite split_pathtarget_at_srfs() from the ground up so that it
groups SRFs according to the depth of nesting of SRFs in their arguments.
This is pretty much how we envisioned that working originally, but I blew
it when it came to implementation.

In passing, optimize the case of target == input_target, which I noticed
is not only possible but quite common.

Discussion: https://postgr.es/m/dcbd2853c05d22088766553d60dc78c6@xs4all.nl
2017-02-02 16:38:18 -05:00
Robert Haas da08a65989 Refactor bitmap heap scan estimation of heap pages fetched.
Currently, we only need this logic in order to cost a Bitmap Heap
Scan.  But a pending patch for Parallel Bitmap Heap Scan also uses
it to help figure out how many workers to use for the scan, which
has to be determined prior to costing.  So, move the logic to
a separate function to make that easier.

Dilip Kumar.  The patch series of which this is a part has been
reviewed by Andres Freund, Amit Khendekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, and me; it is not clear from the email
discussion which of those people have looked specifically at this
part.

Discussion: http://postgr.es/m/CAFiTN-v3QYNJEZnnmKCeATuLbN-h9tMVfeEF0+BrouYDqjXgwg@mail.gmail.com
2017-01-27 16:28:47 -05:00
Tom Lane 3c821466ab Fix example plan in optimizer/README.
Joining three tables only takes two join nodes.  I think when I (tgl)
wrote this, I was envisioning possible additional joins; but since the
example doesn't show any fourth table, it's just confusing to write
a third join node.

Etsuro Fujita

Discussion: https://postgr.es/m/e6cfbaa3-af02-1abc-c25e-8fa5c6bc4e21@lab.ntt.co.jp
2017-01-23 09:38:36 -05:00
Tom Lane d479e37e3d Fix Assert failure induced by commit 215b43cdc.
I'd somehow talked myself into believing that set_append_rel_size
doesn't need to worry about getting back an AND clause when it applies
eval_const_expressions to the result of adjust_appendrel_attrs (that is,
transposing the appendrel parent's restriction clauses for one child).
But that is nonsense, and Andreas Seltenreich's fuzz tester soon
turned up a counterexample.  Put back the make_ands_implicit step
that was there before, and add a regression test covering the case.

Report: https://postgr.es/m/878tq6vja6.fsf@ansel.ydns.eu
2017-01-19 18:20:58 -05:00
Andres Freund ea15e18677 Remove obsoleted code relating to targetlist SRF evaluation.
Since 69f4b9c plain expression evaluation (and thus normal projection)
can't return sets of tuples anymore. Thus remove code dealing with
that possibility.

This will require adjustments in external code using
ExecEvalExpr()/ExecProject() - that should neither be hard nor very
common.

Author: Andres Freund and Tom Lane
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-19 14:40:41 -08:00
Andres Freund 69f4b9c85f Move targetlist SRF handling from expression evaluation to new executor node.
Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT
generate_series(1,5)) so far was done in the expression evaluation (i.e.
ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code.

This meant that most executor nodes performing projection, and most
expression evaluation functions, had to deal with the possibility that an
evaluated expression could return a set of return values.

That's bad because it leads to repeated code in a lot of places. It also,
and that's my (Andres's) motivation, made it a lot harder to implement a
more efficient way of doing expression evaluation.

To fix this, introduce a new executor node (ProjectSet) that can evaluate
targetlists containing one or more SRFs. To avoid the complexity of the old
way of handling nested expressions returning sets (e.g. having to pass up
ExprDoneCond, and dealing with arguments to functions returning sets etc.),
those SRFs can only be at the top level of the node's targetlist.  The
planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is
only necessary in ProjectSet nodes and that SRFs are only present at the
top level of the node's targetlist. If there are nested SRFs the planner
creates multiple stacked ProjectSet nodes.  The ProjectSet nodes always get
input from an underlying node.

We also discussed and prototyped evaluating targetlist SRFs using ROWS
FROM(), but that turned out to be more complicated than we'd hoped.

While moving SRF evaluation to ProjectSet would allow to retain the old
"least common multiple" behavior when multiple SRFs are present in one
targetlist (i.e.  continue returning rows until all SRFs are at the end of
their input at the same time), we decided to instead only return rows till
all SRFs are exhausted, returning NULL for already exhausted ones.  We
deemed the previous behavior to be too confusing, unexpected and actually
not particularly useful.

As a side effect, the previously prohibited case of multiple set returning
arguments to a function, is now allowed. Not because it's particularly
desirable, but because it ends up working and there seems to be no argument
for adding code to prohibit it.

Currently the behavior for COALESCE and CASE containing SRFs has changed,
returning multiple rows from the expression, even when the SRF containing
"arm" of the expression is not evaluated. That's because the SRFs are
evaluated in a separate ProjectSet node.  As that's quite confusing, we're
likely to instead prohibit SRFs in those places.  But that's still being
discussed, and the code would reside in places not touched here, so that's
a task for later.

There's a lot of, now superfluous, code dealing with set return expressions
around. But as the changes to get rid of those are verbose largely boring,
it seems better for readability to keep the cleanup as a separate commit.

Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-18 13:40:27 -08:00
Robert Haas 716c7d4b24 Factor out logic for computing number of parallel workers.
Forthcoming patches to allow other types of parallel scans will
need this logic, or something like it.

Dilip Kumar
2017-01-18 13:54:45 -05:00
Tom Lane 215b43cdc8 Improve RLS planning by marking individual quals with security levels.
In an RLS query, we must ensure that security filter quals are evaluated
before ordinary query quals, in case the latter contain "leaky" functions
that could expose the contents of sensitive rows.  The original
implementation of RLS planning ensured this by pushing the scan of a
secured table into a sub-query that it marked as a security-barrier view.
Unfortunately this results in very inefficient plans in many cases, because
the sub-query cannot be flattened and gets planned independently of the
rest of the query.

To fix, drop the use of sub-queries to enforce RLS qual order, and instead
mark each qual (RestrictInfo) with a security_level field establishing its
priority for evaluation.  Quals must be evaluated in security_level order,
except that "leakproof" quals can be allowed to go ahead of quals of lower
security_level, if it's helpful to do so.  This has to be enforced within
the ordering of any one list of quals to be evaluated at a table scan node,
and we also have to ensure that quals are not chosen for early evaluation
(i.e., use as an index qual or TID scan qual) if they're not allowed to go
ahead of other quals at the scan node.

This is sufficient to fix the problem for RLS quals, since we only support
RLS policies on simple tables and thus RLS quals will always exist at the
table scan level only.  Eventually these qual ordering rules should be
enforced for join quals as well, which would permit improving planning for
explicit security-barrier views; but that's a task for another patch.

Note that FDWs would need to be aware of these rules --- and not, for
example, send an insecure qual for remote execution --- but since we do
not yet allow RLS policies on foreign tables, the case doesn't arise.
This will need to be addressed before we can allow such policies.

Patch by me, reviewed by Stephen Frost and Dean Rasheed.

Discussion: https://postgr.es/m/8185.1477432701@sss.pgh.pa.us
2017-01-18 12:58:20 -05:00
Tom Lane 0777f7a2e8 Fix matching of boolean index columns to sort ordering.
Normally, if we have a WHERE clause like "indexcol = constant",
the planner will figure out that that index column can be ignored
when determining whether the index has a desired sort ordering.
But this failed to work for boolean index columns, because a
condition like "boolcol = true" is canonicalized to just "boolcol"
which does not give rise to an EquivalenceClass.  Add a check to
allow the same type of deduction to be made in this case too.

Per a complaint from Dima Pavlov.  Arguably this is a bug, but given the
limited impact and the small number of complaints so far, I won't risk
destabilizing plans in stable branches by back-patching.

Patch by me, reviewed by Michael Paquier

Discussion: https://postgr.es/m/1788.1481605684@sss.pgh.pa.us
2017-01-15 14:09:35 -05:00
Tom Lane ab1f0c8225 Change representation of statement lists, and add statement location info.
This patch makes several changes that improve the consistency of
representation of lists of statements.  It's always been the case
that the output of parse analysis is a list of Query nodes, whatever
the types of the individual statements in the list.  This patch brings
similar consistency to the outputs of raw parsing and planning steps:

* The output of raw parsing is now always a list of RawStmt nodes;
the statement-type-dependent nodes are one level down from that.

* The output of pg_plan_queries() is now always a list of PlannedStmt
nodes, even for utility statements.  In the case of a utility statement,
"planning" just consists of wrapping a CMD_UTILITY PlannedStmt around
the utility node.  This list representation is now used in Portal and
CachedPlan plan lists, replacing the former convention of intermixing
PlannedStmts with bare utility-statement nodes.

Now, every list of statements has a consistent head-node type depending
on how far along it is in processing.  This allows changing many places
that formerly used generic "Node *" pointers to use a more specific
pointer type, thus reducing the number of IsA() tests and casts needed,
as well as improving code clarity.

Also, the post-parse-analysis representation of DECLARE CURSOR is changed
so that it looks more like EXPLAIN, PREPARE, etc.  That is, the contained
SELECT remains a child of the DeclareCursorStmt rather than getting flipped
around to be the other way.  It's now true for both Query and PlannedStmt
that utilityStmt is non-null if and only if commandType is CMD_UTILITY.
That allows simplifying a lot of places that were testing both fields.
(I think some of those were just defensive programming, but in many places,
it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.)

Because PlannedStmt carries a canSetTag field, we're also able to get rid
of some ad-hoc rules about how to reconstruct canSetTag for a bare utility
statement; specifically, the assumption that a utility is canSetTag if and
only if it's the only one in its list.  While I see no near-term need for
relaxing that restriction, it's nice to get rid of the ad-hocery.

The API of ProcessUtility() is changed so that what it's passed is the
wrapper PlannedStmt not just the bare utility statement.  This will affect
all users of ProcessUtility_hook, but the changes are pretty trivial; see
the affected contrib modules for examples of the minimum change needed.
(Most compilers should give pointer-type-mismatch warnings for uncorrected
code.)

There's also a change in the API of ExplainOneQuery_hook, to pass through
cursorOptions instead of expecting hook functions to know what to pick.
This is needed because of the DECLARE CURSOR changes, but really should
have been done in 9.6; it's unlikely that any extant hook functions
know about using CURSOR_OPT_PARALLEL_OK.

Finally, teach gram.y to save statement boundary locations in RawStmt
nodes, and pass those through to Query and PlannedStmt nodes.  This allows
more intelligent handling of cases where a source query string contains
multiple statements.  This patch doesn't actually do anything with the
information, but a follow-on patch will.  (Passing this information through
cleanly is the true motivation for these changes; while I think this is all
good cleanup, it's unlikely we'd have bothered without this end goal.)

catversion bump because addition of location fields to struct Query
affects stored rules.

This patch is by me, but it owes a good deal to Fabien Coelho who did
a lot of preliminary work on the problem, and also reviewed the patch.

Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre
2017-01-14 16:02:35 -05:00
Robert Haas 0c2070cefa Fix cardinality estimates for parallel joins.
For a partial path, the cardinality estimate needs to reflect the
number of rows we think each worker will see, rather than the total
number of rows; otherwise, costing will go wrong.  The previous coding
got this completely wrong for parallel joins.

Unfortunately, this change may destabilize plans for users of 9.6 who
have enabled parallel query, but since 9.6 is still fairly new I'm
hoping expectations won't be too settled yet.  Also, this is really a
brown-paper-bag bug, so leaving it unfixed for the entire lifetime of
9.6 seems unwise.

Related reports (whose import I initially failed to recognize) by
Tomas Vondra and Tom Lane.

Discussion: http://postgr.es/m/CA+TgmoaDxZ5z5Kw_oCQoymNxNoVaTCXzPaODcOuao=CzK8dMZw@mail.gmail.com
2017-01-13 13:34:10 -05:00
Robert Haas 18fc5192a6 Remove unnecessary arguments from partitioning functions.
RelationGetPartitionQual() and generate_partition_qual() are always
called with recurse = true, so we don't need an argument for that.

Extracted by me from a larger patch by Amit Langote.
2017-01-04 14:56:37 -05:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Robert Haas 59649c3f1c Refactor merge path generation code.
This shouldn't change the set of paths that get generated in any
way, but it is preparatory work for further changes to allow a
partial path to be merge-joined witih a non-partial path to produce
a partial join path.

Dilip Kumar, with cosmetic adjustments by me.
2016-12-21 09:45:50 -05:00
Tom Lane 7fa93eec4e Fix FK-based join selectivity estimation for semi/antijoins.
This case wasn't thought through sufficiently in commit 100340e2d.
It's true that the FK proves that every outer row has a match in the
inner table, but we forgot that some of the inner rows might be filtered
away by WHERE conditions located within the semijoin's RHS.

If the RHS is just one table, we can reasonably take the semijoin
selectivity as equal to the fraction of the referenced table's rows
that are expected to survive its restriction clauses.

If the RHS is a join, it's not clear how much of the referenced table
might get through the join, so fall back to the same rule we were
already using for other outer-join cases: use the minimum of the
regular per-clause selectivity estimates.  This gives the same result
as if we hadn't considered the FK at all when there's a single FK
column, but it should still help for multi-column FKs, which is the
case that 100340e2d is really meant to help with.

Back-patch to 9.6 where the previous commit came in.

Discussion: https://postgr.es/m/16149.1481835103@sss.pgh.pa.us
2016-12-17 15:28:54 -05:00
Tom Lane 1f542a2eac Prevent planagg.c from failing on queries containing CTEs.
The existing tests in preprocess_minmax_aggregates() usually prevent it
from trying to do anything with queries containing CTEs, but there's an
exception: a CTE could be present as a member of an appendrel, if we
flattened a UNION ALL that contains CTE references.  If it did try to
generate an optimized path for a query using a CTE, it failed with
"could not find plan for CTE", as reported by Torsten Förtsch.

The proximate cause is an unwise decision in commit 3fc6e2d7f to clear
subroot->cte_plan_ids in build_minmax_path().  That left the subroot's
cte_plan_ids list out of step with its parse->cteList.

Removing the "subroot->cte_plan_ids = NIL;" assignment is enough to let
the case work again, but really it's pretty silly to be expending any
cycles at all in this module when there are CTEs: we always treat their
outputs as unordered so there's no way for the optimization to win.
Hence, also add an early-exit test so we don't waste time like that.

Back-patch to 9.6 where the misbehavior was introduced.

Report: https://postgr.es/m/CAKkG4_=gjY5QiHtqSZyWMwDuTd_CftKoTaCqxjJ7uUz1-Gw=qw@mail.gmail.com
2016-12-13 13:20:37 -05:00
Tom Lane 0b78106cd4 Fix reporting of column typmods for multi-row VALUES constructs.
expandRTE() and get_rte_attribute_type() reported the exprType() and
exprTypmod() values of the expressions in the first row of the VALUES as
being the column type/typmod returned by the VALUES RTE.  That's fine for
the data type, since we coerce all expressions in a column to have the same
common type.  But we don't coerce them to have a common typmod, so it was
possible for rows after the first one to return values that violate the
claimed column typmod.  This leads to the incorrect result seen in bug
#14448 from Hassan Mahmood, as well as some other corner-case misbehaviors.

The desired behavior is the same as we use in other type-unification
cases: report the common typmod if there is one, but otherwise return -1
indicating no particular constraint.  It's cheap for transformValuesClause
to determine the common typmod while transforming a multi-row VALUES, but
it'd be less cheap for expandRTE() and get_rte_attribute_type() to
re-determine that info every time they're asked --- possibly a lot less
cheap, if the VALUES has many rows.  Therefore, the best fix is to record
the common typmods explicitly in a list in the VALUES RTE, as we were
already doing for column collations.  This looks quite a bit like what
we're doing for CTE RTEs, so we can save a little bit of space and code by
unifying the representation for those two RTE types.  They both now share
coltypes/coltypmods/colcollations fields.  (At some point it might seem
desirable to populate those fields for all RTE types; but right now it
looks like constructing them for other RTE types would add more code and
cycles than it would save.)

The RTE change requires a catversion bump, so this fix is only usable
in HEAD.  If we fix this at all in the back branches, the patch will
need to look quite different.

Report: https://postgr.es/m/20161205143037.4377.60754@wrigleys.postgresql.org
Discussion: https://postgr.es/m/27429.1480968538@sss.pgh.pa.us
2016-12-08 11:40:02 -05:00
Robert Haas f0e44751d7 Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own.  The children are called
partitions and contain all of the actual data.  Each partition has an
implicit partitioning constraint.  Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed.  Partitions
can't have extra columns and may not allow nulls unless the parent
does.  Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.

Currently, tables can be range-partitioned or list-partitioned.  List
partitioning is limited to a single column, but range partitioning can
involve multiple columns.  A partitioning "column" can be an
expression.

Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations.  The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.

Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others.  Minor revisions by me.
2016-12-07 13:17:55 -05:00
Tom Lane 41e2b84ce1 Fix bogus handling of JOIN_UNIQUE_OUTER/INNER cases for parallel joins.
consider_parallel_nestloop passed the wrong jointype down to its
subroutines for JOIN_UNIQUE_INNER cases (it should pass JOIN_INNER), and it
thought that it could pass paths other than innerrel->cheapest_total_path
to create_unique_path, which create_unique_path is not on board with.
These bugs would lead to assertion failures or other errors, suggesting
that this code path hasn't been tested much.

hash_inner_and_outer's code for parallel join effectively treated both
JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER the same as JOIN_INNER (for
different reasons :-(), leading to incorrect plans that treated a semijoin
as if it were a plain join.

Michael Day submitted a test case demonstrating that hash_inner_and_outer
failed for JOIN_UNIQUE_OUTER, and I found the other cases through code
review.

Report: https://postgr.es/m/D0E8A029-D1AC-42E8-979A-5DE4A77E4413@rcmail.com
2016-11-29 19:32:35 -05:00
Tom Lane d6c8b34e95 Fix incorrect variable type in set_rel_consider_parallel().
func_parallel() returns char not Oid.  Harmless, but still wrong.

Amit Langote
2016-11-29 11:07:02 -05:00
Tom Lane 4e20511d5b Fix estimate_expression_value to constant-fold SQLValueFunction nodes.
Oversight in my commit 0bb51aa96.  Noted while poking at a recent
bug report --- HEAD's estimates for a query using CURRENT_DATE
were unexpectedly much worse than 9.6's.
2016-11-28 19:08:45 -05:00
Alvaro Herrera eb68141688 Fix get_relation_info name typo'ed in a comment
Plus add a missing comment about this in get_relation_info itself.

Author: Amit Langote
Discussion: https://postgr.es/m/e46c0569-0449-afa0-e2fe-f3776e4b3fd5@lab.ntt.co.jp
2016-11-28 15:56:00 -03:00
Tom Lane ab77a5a456 Mark a query's topmost Paths parallel-unsafe if they will have initPlans.
Andreas Seltenreich found another case where we were being too optimistic
about allowing a plan to be considered parallelizable despite it containing
initPlans.  It seems like the real issue here is that if we know we are
going to tack initPlans onto the topmost Plan node for a subquery, we
had better mark that subquery's result Paths as not-parallel-safe.  That
fixes this problem and allows reversion of a kluge (added in commit
7b67a0a49 and extended in f24cf960d) to not trust the parallel_safe flag
at top level.

Discussion: <874m2w4k5d.fsf@ex.ansel.ydns.eu>
2016-11-25 16:20:12 -05:00
Tom Lane 6fa391be4e Avoid masking a function parameter name with a local variable name.
No actual bug here, but it might confuse readers, so change the name
of the local variable.

Ashutosh Bapat
2016-11-23 16:26:40 -05:00
Tom Lane 4324ade9a6 Fix optimization for skipping searches for parallel-query hazards.
Fix thinko in commit da1c91631: even if the original query was free of
parallel hazards, we might introduce such a hazard by adding PARAM_EXEC
Param nodes.  Adjust is_parallel_safe() so that it will scan the given
expression whenever any such nodes have been created.  Per report from
Andreas Seltenreich.

Discussion: <878tse6yvf.fsf@credativ.de>
2016-11-21 13:19:23 -05:00
Tom Lane f24cf960d7 Fix test for subplans in force-parallel mode.
We mustn't force parallel mode if the query has any subplans, since
ExecSerializePlan doesn't transmit them to workers.  Testing
top_plan->initPlan is inadequate because (1) there might be initPlans
attached to lower plan nodes, and (2) non-initPlan subplans don't
work either.  There's certainly room for improvement in those
restrictions, but for the moment that's what we've got.

Amit Kapila, per report from Andreas Seltenreich

Discussion: <8737im6pmh.fsf@credativ.de>
2016-11-21 11:09:24 -05:00
Tom Lane 0832f2db68 Fix latent costing error in create_merge_append_path.
create_merge_append_path should use the path rowcount it just computed,
not rel->tuples, for costing purposes.  Those numbers should always be
the same at present, but if we ever support parameterized MergeAppend
paths (a case this function is otherwise prepared for), the former would
be right and the latter wrong.

No need for back-patch since the problem is only latent.

Ashutosh Bapat

Discussion: <CAFjFpRek+cLCnTo24youuGtsq4zRphEB8EUUPjDxZjnL4n4HYQ@mail.gmail.com>
2016-11-19 15:06:45 -05:00
Tom Lane 24aef33804 Cleanup of rewriter and planner handling of Query.hasRowSecurity flag.
Be sure to pull up the subquery's hasRowSecurity flag when flattening a
subquery in pull_up_simple_subquery().  This isn't a bug today because
we don't look at the hasRowSecurity flag during planning, but it could
easily be a bug tomorrow.

Likewise, make rewriteRuleAction() pull up the hasRowSecurity flag when
absorbing RTEs from a rule action.  This isn't a bug either, for the
opposite reason: the flag should never be set yet.  But again, it seems
like good future proofing.

Add a comment explaining why rewriteTargetView() should *not* set
hasRowSecurity when adding stuff to securityQuals.

Improve some nearby comments about securityQuals processing, and document
that field more completely in parsenodes.h.

Patch by me, analysis by Dean Rasheed.

Discussion: <CAEZATCXZ8tb2DV6f=bkhsMV6u_gRcZ0CZBw2J-qU84RxSukZog@mail.gmail.com>
2016-11-10 16:16:33 -05:00
Tom Lane e1b449bea9 Fix partial aggregation for the case of a degenerate GROUP BY clause.
The plan generated for sorted partial aggregation with "GROUP BY constant"
included a Sort node with no sort keys, which the executor does not like.

Per report from Steve Randall.  I'd add a regression test case if I could
think of a compact one, but it doesn't seem worth expending lots of cycles
on.

Report: <CABVd52UAdGXpg_rCk46egpNKYdXOzCjuJ1zG26E2xBe_8bj+Fg@mail.gmail.com>
2016-11-10 11:31:56 -05:00
Tom Lane 34ca090570 Adjust cost_merge_append() to reflect use of binaryheap_replace_first().
Commit 7a2fe9bd0 improved merge append so that replacement of a tuple
takes log(N) operations, not twice log(N).  Since cost_merge_append knew
about that explicitly, we should adjust it.  This probably makes little
difference in practice, but the obsolete comment is confusing.

Ideally this would have been put in in 9.3 with the underlying behavior
change; but I'm not going to back-patch it, since there's some small chance
of changing a plan choice that somebody's optimized for.

Thomas Munro

Discussion: <CAEepm=0WQBSvuYcMOUj4Ga4NXpu2J=ejZcE=e=eiTjTX-6_gDw@mail.gmail.com>
2016-11-05 13:48:11 -04:00
Tom Lane 770671062f Don't make FK-based selectivity estimates in inheritance situations.
The foreign-key-aware logic for estimation of join sizes (added in commit
100340e2d) blindly tried to apply the concept to rels that are actually
parents of inheritance trees.  This is just plain wrong so far as the
referenced relation is concerned, since the inheritance scan may well
produce lots of rows that are not participating in the constraint.  It's
wrong for the referencing relation too, for the same reason; although on
that end we could conceivably detect whether all members of the inheritance
tree have equivalent FK constraints pointing to the same referenced rel,
and then proceed more or less as we do now.  But pending somebody writing
code to do that, we must disable this, because it's producing completely
silly estimates when there's an FK linking the heads of inheritance trees.

Per bug #14404 from Clinton Adams.  Back-patch to 9.6 where the new
estimation logic came in.

Report: <20161028200412.15987.96482@wrigleys.postgresql.org>
2016-11-02 15:50:15 -04:00
Tom Lane da8f3ebf30 Don't convert Consts into Vars during setrefs.c processing.
While converting expressions in an upper-level plan node so that they
reference Vars and expressions provided by the input plan node(s),
don't convert plain Const items, even if there happens to be a matching
Const in the input.  It's silly to do so because a Var is more expensive to
execute than a Const.  Moreover, converting can fool ExecCheckPlanOutput's
check that an insert or update query inserts nulls into dropped columns,
leading to "query provides a value for a dropped column" errors during
INSERT or UPDATE on a table with a dropped column.  We could solve this
by making that check more complicated, but I don't see the point; this fix
should save a marginal number of cycles, and it also makes for less messy
EXPLAIN output, as shown by the ensuing regression test result changes.

Per report from Pavel Hanák.  I have not incorporated a test case based
on that example, as there doesn't seem to be a simple way of checking
this in isolation without making a bunch of assumptions about other
planner and SQL-function behavior.

Back-patch to 9.6.  This setrefs.c behavior exists much further back,
but there is not currently reason to think that it causes problems
before 9.6.

Discussion: <83shraampf.fsf@is-it.eu>
2016-11-02 14:32:13 -04:00
Tom Lane 9a00f03e47 Improve speed of aggregates that use array_append as transition function.
In the previous coding, if an aggregate's transition function returned an
expanded array, nodeAgg.c and nodeWindowAgg.c would always copy it and thus
force it into the flat representation.  This led to ping-ponging between
flat and expanded formats, which costs a lot.  For an aggregate using
array_append as transition function, I measured about a 15X slowdown
compared to the pre-9.5 code, when working on simple int[] arrays.
Of course, the old code was already O(N^2) in this usage due to copying
flat arrays all the time, but it wasn't quite this inefficient.

To fix, teach nodeAgg.c and nodeWindowAgg.c to allow expanded transition
values without copying, so long as the transition function takes care to
return the transition value already properly parented under the aggcontext.
That puts a bit of extra responsibility on the transition function, but
doing it this way allows us to not need any extra logic in the fast path
of advance_transition_function (ie, with a pass-by-value transition value,
or with a modified-in-place pass-by-reference value).  We already know
that that's a hot spot so I'm loath to add any cycles at all there.  Also,
while only array_append currently knows how to follow this convention,
this solution allows other transition functions to opt-in without needing
to have a whitelist in the core aggregation code.

(The reason we would need a whitelist is that currently, if you pass a
R/W expanded-object pointer to an arbitrary function, it's allowed to do
anything with it including deleting it; that breaks the core agg code's
assumption that it should free discarded values.  Returning a value under
aggcontext is the transition function's signal that it knows it is an
aggregate transition function and will play nice.  Possibly the API rules
for expanded objects should be refined, but that would not be a
back-patchable change.)

With this fix, an aggregate using array_append is no longer O(N^2), so it's
much faster than pre-9.5 code rather than much slower.  It's still a bit
slower than the bespoke infrastructure for array_agg, but the differential
seems to be only about 10%-20% rather than orders of magnitude.

Discussion: <6315.1477677885@sss.pgh.pa.us>
2016-10-30 12:27:41 -04:00
Robert Haas 7012b132d0 postgres_fdw: Push down aggregates to remote servers.
Now that the upper planner uses paths, and now that we have proper hooks
to inject paths into the upper planning process, it's possible for
foreign data wrappers to arrange to push aggregates to the remote side
instead of fetching all of the rows and aggregating them locally.  This
figures to be a massive win for performance, so teach postgres_fdw to
do it.

Jeevan Chalke and Ashutosh Bapat.  Reviewed by Ashutosh Bapat with
additional testing by Prabhat Sahu.  Various mostly cosmetic changes
by me.
2016-10-21 09:54:29 -04:00
Andres Freund 5dfc198146 Use more efficient hashtable for execGrouping.c to speed up hash aggregation.
The more efficient hashtable speeds up hash-aggregations with more than
a few hundred groups significantly. Improvements of over 120% have been
measured.

Due to the the different hash table queries that not fully
determined (e.g. GROUP BY without ORDER BY) may change their result
order.

The conversion is largely straight-forward, except that, due to the
static element types of simplehash.h type hashes, the additional data
some users store in elements (e.g. the per-group working data for hash
aggregaters) is now stored in TupleHashEntryData->additional.  The
meaning of BuildTupleHashTable's entrysize (renamed to additionalsize)
has been changed to only be about the additionally stored size.  That
size is only used for the initial sizing of the hash-table.

Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
2016-10-14 17:22:51 -07:00
Tom Lane 72daabc7a3 Disallow pushing volatile quals past set-returning functions.
Pushing an upper-level restriction clause into an unflattened
subquery-in-FROM is okay when the subquery contains no SRFs in its
targetlist, or when it does but the SRFs are unreferenced by the clause
*and the clause is not volatile*.  Otherwise, we're changing the number
of times the clause is evaluated, which is bad for volatile quals, and
possibly changing the result, since a volatile qual might succeed for some
SRF output rows and not others despite not referencing any of the changing
columns.  (Indeed, if the clause is something like "random() > 0.5", the
user is probably expecting exactly that behavior.)

We had most of these restrictions down, but not the one about the upper
clause not being volatile.  Fix that, and add a regression test to
illustrate the expected behavior.

Although this is definitely a bug, it doesn't seem like back-patch
material, since possibly some users don't realize that the broken
behavior is broken and are relying on what happens now.  Also, while
the added test is quite cheap in the wake of commit a4c35ea1c, it would
be much more expensive (or else messier) in older branches.

Per report from Tom van Tilburg.

Discussion: <CAP3PPDiucxYCNev52=YPVkrQAPVF1C5PFWnrQPT7iMzO1fiKFQ@mail.gmail.com>
2016-09-27 18:43:36 -04:00
Tom Lane a4c35ea1c2 Improve parser's and planner's handling of set-returning functions.
Teach the parser to reject misplaced set-returning functions during parse
analysis using p_expr_kind, in much the same way as we do for aggregates
and window functions (cf commit eaccfded9).  While this isn't complete
(it misses nesting-based restrictions), it's much better than the previous
error reporting for such cases, and it allows elimination of assorted
ad-hoc expression_returns_set() error checks.  We could add nesting checks
later if it seems important to catch all cases at parse time.

There is one case the parser will now throw error for although previous
versions allowed it, which is SRFs in the tlist of an UPDATE.  That never
behaved sensibly (since it's ill-defined which generated row should be
used to perform the update) and it's hard to see why it should not be
treated as an error.  It's a release-note-worthy change though.

Also, add a new Query field hasTargetSRFs reporting whether there are
any SRFs in the targetlist (including GROUP BY/ORDER BY expressions).
The parser can now set that basically for free during parse analysis,
and we can use it in a number of places to avoid expression_returns_set
searches.  (There will be more such checks soon.)  In some places, this
allows decontorting the logic since it's no longer expensive to check for
SRFs in the tlist --- so I made the checks parallel to the handling of
hasAggs/hasWindowFuncs wherever it seemed appropriate.

catversion bump because adding a Query field changes stored rules.

Andres Freund and Tom Lane

Discussion: <24639.1473782855@sss.pgh.pa.us>
2016-09-13 13:54:24 -04:00
Tom Lane ea268cdc9a Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters.  While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea.  Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.

While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.

In passing, change TopMemoryContext to use the default allocation
parameters.  Formerly it could only be extended 8K at a time.  That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there.  There seems no good reason
not to let it use growing blocks like most other contexts.

Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain.  The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.

Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
Tom Lane 2c00fad286 Fix improper repetition of previous results from a hashed aggregate.
ExecReScanAgg's check for whether it could re-use a previously calculated
hashtable neglected the possibility that the Agg node might reference
PARAM_EXEC Params that are not referenced by its input plan node.  That's
okay if the Params are in upper tlist or qual expressions; but if one
appears in aggregate input expressions, then the hashtable contents need
to be recomputed when the Param's value changes.

To avoid unnecessary performance degradation in the case of a Param that
isn't within an aggregate input, add logic to the planner to determine
which Params are within aggregate inputs.  This requires a new field in
struct Agg, but fortunately we never write plans to disk, so this isn't
an initdb-forcing change.

Per report from Jeevan Chalke.  This has been broken since forever,
so back-patch to all supported branches.

Andrew Gierth, with minor adjustments by me

Report: <CAM2+6=VY8ykfLT5Q8vb9B6EbeBk-NGuLbT6seaQ+Fq4zXvrDcA@mail.gmail.com>
2016-08-24 14:38:12 -04:00
Tom Lane 65a603e903 Guard against parallel-restricted functions in VALUES expressions.
Obvious brain fade in set_rel_consider_parallel().  Noticed it while
adjusting the adjacent RTE_FUNCTION case.

In 9.6, also make the code look more like what I just did in HEAD
by removing the unnecessary function_rte_parallel_ok subroutine
(it does nothing that expression_tree_walker wouldn't do).
2016-08-19 14:35:32 -04:00
Tom Lane da1c91631e Speed up planner's scanning for parallel-query hazards.
We need to scan the whole parse tree for parallel-unsafe functions.
If there are none, we'll later need to determine whether particular
subtrees contain any parallel-restricted functions.  The previous coding
retained no knowledge from the first scan, even though this is very
wasteful in the common case where the query contains only parallel-safe
functions.  We can bypass all of the later scans by remembering that fact.
This provides a small but measurable speed improvement when the case
applies, and shouldn't cost anything when it doesn't.

Patch by me, reviewed by Robert Haas

Discussion: <3740.1471538387@sss.pgh.pa.us>
2016-08-19 14:03:13 -04:00
Tom Lane 0bb51aa967 Improve parsetree representation of special functions such as CURRENT_DATE.
We implement a dozen or so parameterless functions that the SQL standard
defines special syntax for.  Up to now, that was done by converting them
into more or less ad-hoc constructs such as "'now'::text::date".  That's
messy for multiple reasons: it exposes what should be implementation
details to users, and performance is worse than it needs to be in several
cases.  To improve matters, invent a new expression node type
SQLValueFunction that can represent any of these parameterless functions.

Bump catversion because this changes stored parsetrees for rules.

Discussion: <30058.1463091294@sss.pgh.pa.us>
2016-08-16 20:33:01 -04:00
Tom Lane f0c7b789ab Fix two errors with nested CASE/WHEN constructs.
ExecEvalCase() tried to save a cycle or two by passing
&econtext->caseValue_isNull as the isNull argument to its sub-evaluation of
the CASE value expression.  If that subexpression itself contained a CASE,
then *isNull was an alias for econtext->caseValue_isNull within the
recursive call of ExecEvalCase(), leading to confusion about whether the
inner call's caseValue was null or not.  In the worst case this could lead
to a core dump due to dereferencing a null pointer.  Fix by not assigning
to the global variable until control comes back from the subexpression.
Also, avoid using the passed-in isNull pointer transiently for evaluation
of WHEN expressions.  (Either one of these changes would have been
sufficient to fix the known misbehavior, but it's clear now that each of
these choices was in itself dangerous coding practice and best avoided.
There do not seem to be any similar hazards elsewhere in execQual.c.)

Also, it was possible for inlining of a SQL function that implements the
equality operator used for a CASE comparison to result in one CASE
expression's CaseTestExpr node being inserted inside another CASE
expression.  This would certainly result in wrong answers since the
improperly nested CaseTestExpr would be caused to return the inner CASE's
comparison value not the outer's.  If the CASE values were of different
data types, a crash might result; moreover such situations could be abused
to allow disclosure of portions of server memory.  To fix, teach
inline_function to check for "bare" CaseTestExpr nodes in the arguments of
a function to be inlined, and avoid inlining if there are any.

Heikki Linnakangas, Michael Paquier, Tom Lane

Report: https://github.com/greenplum-db/gpdb/pull/327
Report: <4DDCEEB8.50602@enterprisedb.com>
Security: CVE-2016-5423
2016-08-08 10:33:46 -04:00
Tom Lane 9492cf86e4 Fix assorted fallout from IS [NOT] NULL patch.
Commits 4452000f3 et al established semantics for NullTest.argisrow that
are a bit different from its initial conception: rather than being merely
a cache of whether we've determined the input to have composite type,
the flag now has the further meaning that we should apply field-by-field
testing as per the standard's definition of IS [NOT] NULL.  If argisrow
is false and yet the input has composite type, the construct instead has
the semantics of IS [NOT] DISTINCT FROM NULL.  Update the comments in
primnodes.h to clarify this, and fix ruleutils.c and deparse.c to print
such cases correctly.  In the case of ruleutils.c, this merely results in
cosmetic changes in EXPLAIN output, since the case can't currently arise
in stored rules.  However, it represents a live bug for deparse.c, which
would formerly have sent a remote query that had semantics different
from the local behavior.  (From the user's standpoint, this means that
testing a remote nested-composite column for null-ness could have had
unexpected recursive behavior much like that fixed in 4452000f3.)

In a related but somewhat independent fix, make plancat.c set argisrow
to false in all NullTest expressions constructed to represent "attnotnull"
constructs.  Since attnotnull is actually enforced as a simple null-value
check, this is a more accurate representation of the semantics; we were
previously overpromising what it meant for composite columns, which might
possibly lead to incorrect planner optimizations.  (It seems that what the
SQL spec expects a NOT NULL constraint to mean is an IS NOT NULL test, so
arguably we are violating the spec and should fix attnotnull to do the
other thing.  If we ever do, this part should get reverted.)

Back-patch, same as the previous commit.

Discussion: <10682.1469566308@sss.pgh.pa.us>
2016-07-28 16:09:15 -04:00
Tom Lane 69995c3b3f Fix cost_rescan() to account for multi-batch hashing correctly.
cost_rescan assumed that we don't need to rebuild the hash table when
rescanning a hash join.  However, that's currently only true for
single-batch joins; for a multi-batch join we must charge full freight.

This probably has escaped notice because we'd be unlikely to put a hash
join on the inside of a nestloop anyway.  Nonetheless, it's wrong.
Fix in HEAD, but don't backpatch for fear of destabilizing plans in
stable releases.
2016-07-27 17:45:05 -04:00
Tom Lane 4452000f31 Fix constant-folding of ROW(...) IS [NOT] NULL with composite fields.
The SQL standard appears to specify that IS [NOT] NULL's tests of field
nullness are non-recursive, ie, we shouldn't consider that a composite
field with value ROW(NULL,NULL) is null for this purpose.
ExecEvalNullTest got this right, but eval_const_expressions did not,
leading to weird inconsistencies depending on whether the expression
was such that the planner could apply constant folding.

Also, adjust the docs to mention that IS [NOT] DISTINCT FROM NULL can be
used as a substitute test if a simple null check is wanted for a rowtype
argument.  That motivated reordering things so that IS [NOT] DISTINCT FROM
is described before IS [NOT] NULL.  In HEAD, I went a bit further and added
a table showing all the comparison-related predicates.

Per bug #14235.  Back-patch to all supported branches, since it's certainly
undesirable that constant-folding should change the semantics.

Report and patch by Andrew Gierth; assorted wordsmithing and revised
regression test cases by me.

Report: <20160708024746.1410.57282@wrigleys.postgresql.org>
2016-07-26 15:25:02 -04:00
Tom Lane 6d85bb1ba7 Correctly set up aggregate FILTER expression in partial-aggregation plans.
The aggfilter expression should be removed from the parent (combining)
Aggref, since it's not supposed to apply the filter, and indeed cannot
because any Vars used in the filter would not be available after the
lower-level aggregation step.  Per report from Jeff Janes.

(This has been broken since the introduction of partial aggregation,
I think.  The error became obvious after commit 59a3795c2, when setrefs.c
began processing the parent Aggref's fields normally and thus would detect
such Vars.  The special-case coding previously used in setrefs.c skipped
over the parent's aggfilter field without processing it.  That was broken
in its own way because no other setrefs.c processing got applied either;
though since the executor would not execute the filter expression, only
initialize it, that oversight might not have had any visible symptoms at
present.)

Report: <CAMkU=1xfuPf2edAe4ZGXTmJpU7jxuKukKyvNtEXwu35B7dvejg@mail.gmail.com>
2016-07-23 20:16:48 -04:00
Tom Lane 45639a0525 Avoid invalidating all foreign-join cached plans when user mappings change.
We must not push down a foreign join when the foreign tables involved
should be accessed under different user mappings.  Previously we tried
to enforce that rule literally during planning, but that meant that the
resulting plans were dependent on the current contents of the
pg_user_mapping catalog, and we had to blow away all cached plans
containing any remote join when anything at all changed in pg_user_mapping.
This could have been improved somewhat, but the fact that a syscache inval
callback has very limited info about what changed made it hard to do better
within that design.  Instead, let's change the planner to not consider user
mappings per se, but to allow a foreign join if both RTEs have the same
checkAsUser value.  If they do, then they necessarily will use the same
user mapping at runtime, and we don't need to know specifically which one
that is.  Post-plan-time changes in pg_user_mapping no longer require any
plan invalidation.

This rule does give up some optimization ability, to wit where two foreign
table references come from views with different owners or one's from a view
and one's directly in the query, but nonetheless the same user mapping
would have applied.  We'll sacrifice the first case, but to not regress
more than we have to in the second case, allow a foreign join involving
both zero and nonzero checkAsUser values if the nonzero one is the same as
the prevailing effective userID.  In that case, mark the plan as only
runnable by that userID.

The plancache code already had a notion of plans being userID-specific,
in order to support RLS.  It was a little confused though, in particular
lacking clarity of thought as to whether it was the rewritten query or just
the finished plan that's dependent on the userID.  Rearrange that code so
that it's clearer what depends on which, and so that the same logic applies
to both RLS-injected role dependency and foreign-join-injected role
dependency.

Note that this patch doesn't remove the other issue mentioned in the
original complaint, which is that while we'll reliably stop using a foreign
join if it's disallowed in a new context, we might fail to start using a
foreign join if it's now allowed, but we previously created a generic
cached plan that didn't use one.  It was agreed that the chance of winning
that way was not high enough to justify the much larger number of plan
invalidations that would have to occur if we tried to cause it to happen.

In passing, clean up randomly-varying spelling of EXPLAIN commands in
postgres_fdw.sql, and fix a COSTS ON example that had been allowed to
leak into the committed tests.

This reverts most of commits fbe5a3fb7 and 5d4171d1c, which were the
previous attempt at ensuring we wouldn't push down foreign joins that
span permissions contexts.

Etsuro Fujita and Tom Lane

Discussion: <d49c1e5b-f059-20f4-c132-e9752ee0113e@lab.ntt.co.jp>
2016-07-15 17:23:02 -04:00
Peter Eisentraut 63cfdb8dde Adjust spellings of forms of "cancel" 2016-07-14 22:48:26 -04:00
Tom Lane cec5501394 Add a regression test case to improve code coverage for tuplesort.
Test the external-sort code path in CLUSTER for two different scenarios:
multiple-pass external sorting, and the best case for replacement
selection, where only one run is produced, so that no merge is required.
This test would have caught the bug fixed in commit 1b0fc8507, at
least when run with valgrind enabled.

In passing, add a short-circuit test in plan_cluster_use_sort() to make
dead certain that it selects sorting when enable_indexscan is off.  As
things stand, that would happen anyway, but it seems like good future
proofing for this test.

Peter Geoghegan

Discussion: <CAM3SWZSgxehDkDMq1FdiW2A0Dxc79wH0hz1x-TnGy=1BXEL+nw@mail.gmail.com>
2016-07-13 15:23:56 -04:00
Tom Lane 29a2195de6 Typo fix. 2016-07-03 18:43:43 -04:00
Tom Lane 110a6dbdeb Allow RTE_SUBQUERY rels to be considered parallel-safe.
There isn't really any reason not to; the original comments here were
partly confused about subplans versus subquery-in-FROM, and partly
dependent on restrictions that no longer apply now that subqueries return
Paths not Plans.  Depending on what's inside the subquery, it might fail
to produce any parallel_safe Paths, but that's fine.

Tom Lane and Robert Haas
2016-07-03 18:24:49 -04:00
Tom Lane 4ea9948e58 Fix up parallel-safety marking for appendrels.
The previous coding assumed that the value derived by
set_rel_consider_parallel() for an appendrel parent would be accurate for
all the appendrel's children; but this is not so, for example because one
child might scan a temp table.  Instead, apply set_rel_consider_parallel()
to each child rel as well as the parent, and then take the AND of the
results as controlling parallel safety for the appendrel as a whole.

(We might someday be able to deal more intelligently than this with cases
in which some of the childrels are parallel-safe and others not, but that's
for later.)

Robert Haas and Tom Lane
2016-07-03 17:57:28 -04:00
Tom Lane 2c6e6471af Allow treating TABLESAMPLE scans as parallel-safe.
This was the intention all along, but an extraneous "return;" in
set_rel_consider_parallel() caused sampled rels to never be marked
consider_parallel.

Since we don't have any partial tablesample path/plan type yet, there's
no possibility of parallelizing the sample scan itself; but this fix
allows such a scan to appear below a parallel join, for example.
2016-07-03 16:55:27 -04:00
Tom Lane 0e495c5e2f Set correct cost data in Gather node added by force_parallel_mode.
We were just leaving the cost fields zeroes, which produces obviously bogus
output with force_parallel_mode = on.  With force_parallel_mode = regress,
the zeroes are hidden, but I wonder if they wouldn't still confuse add-on
code such as auto_explain.
2016-07-03 15:35:29 -04:00
Tom Lane c89d507649 Round rowcount estimate for a partial path to an integer.
I'd been wondering why I was sometimes seeing fractional rowcount
estimates in parallel-query situations, and this seems to be the
reason.  (You won't see the fractional parts in EXPLAIN, because it
prints rowcounts with %.0f, but they are apparent in the debugger.)
A fractional rowcount is not any saner for a partial path than any
other kind of path, and it's equally likely to break cost estimation
for higher paths, so apply clamp_row_est() like we do in other places.
2016-07-03 14:53:46 -04:00
Tom Lane 420c166163 Fix failure to mark all aggregates with appropriate transtype.
In commit 915b703e1 I gave get_agg_clause_costs() the responsibility of
marking Aggref nodes with the appropriate aggtranstype.  I failed to notice
that where it was being called from, it might see only a subset of the
Aggref nodes that were in the original targetlist.  Specifically, if there
are duplicate aggregate calls in the tlist, either make_sort_input_target
or make_window_input_target might put just a single instance into the
grouping_target, and then only that instance would get marked.  Fix by
moving the call back into grouping_planner(), before we start building
assorted PathTargets from the query tlist.  Per report from Stefan Huehner.

Report: <20160702131056.GD3165@huehner.biz>
2016-07-02 13:23:12 -04:00
Tom Lane 7b67a0a49c Fix some interrelated planner issues with initPlans and Param munging.
In commit 68fa28f77 I tried to teach SS_finalize_plan() to cope with
initPlans attached anywhere in the plan tree, by dint of moving its
handling of those into the recursion in finalize_plan().  It turns out that
that doesn't really work: if a lower-level plan node emits an initPlan
output parameter in its targetlist, it's legitimate for upper levels to
reference those Params --- and at the point where this code runs, those
references look just like the Param itself, so finalize_plan() quite
properly rejects them as being in the wrong place.  We could lobotomize
the checks enough to allow that, probably, but then it's not clear that
we'd have any meaningful check for misplaced Params at all.  What seems
better, at least in the near term, is to tweak standard_planner() a bit
so that initPlans are never placed anywhere but the topmost plan node
for a query level, restoring the behavior that occurred pre-9.6.  Possibly
we can do better if this code is ever merged into setrefs.c: then it would
be possible to check a Param's placement only when we'd failed to replace
it with a Var referencing a child plan node's targetlist.

BTW, I'm now suspicious that finalize_plan is doing the wrong thing by
returning the node's allParam rather than extParam to be incorporated
in the parent node's set of used parameters.  However, it makes no
difference given that initPlans only appear at top level, so I'll leave
that alone for now.

Another thing that emerged from this is that standard_planner() needs
to check for initPlans before deciding that it's safe to stick a Gather
node on top in force_parallel_mode mode.  We previously guarded against
that by deciding the plan wasn't wholePlanParallelSafe if any subplans
had been found, but after commit 5ce5e4a12 it's necessary to have this
substitute test, because path parallel_safe markings don't account for
initPlans.  (Normally, we'd have decided the paths weren't safe anyway
due to appearances of SubPlan nodes, Params, or CTE scans somewhere in
the tree --- but it's possible for those all to be optimized away while
initPlans still remain.)

Per fuzz testing by Andreas Seltenreich.

Report: <874m89rw7x.fsf@credativ.de>
2016-07-01 20:06:05 -04:00
Tom Lane 9e703987a8 Rethink the GetForeignUpperPaths API (again).
In the previous design, the GetForeignUpperPaths FDW callback hook was
called before we got around to labeling upper relations with the proper
consider_parallel flag; this meant that any upper paths created by an FDW
would be marked not-parallel-safe.  While that's probably just as well
right now, we aren't going to want it to be true forever.  Hence, abandon
the idea that FDWs should be allowed to inject upper paths before the core
code has gotten around to creating the relevant upper relation.  (Well,
actually they still can, but it's on their own heads how well it works.)
Instead, adopt the same API already designed for create_upper_paths_hook:
we call GetForeignUpperPaths after each upperrel has been created and
populated with the paths the core planner knows how to make.
2016-07-01 13:12:34 -04:00
Robert Haas 5ce5e4a12e Set consider_parallel correctly for upper planner rels.
Commit 3fc6e2d7f5 introduced new "upper"
RelOptInfo structures but didn't set consider_parallel for them
correctly, a point I completely missed when reviewing it.  Later,
commit e06a38965b made the situation
worse by doing it incorrectly for the grouping relation.  Try to
straighten all of that out.  Along the way, get rid of the annoying
wholePlanParallelSafe flag, which was only necessarily because of
the fact that upper planning stages didn't use paths at the time
that code was written.

The most important immediate impact of these changes is that
force_parallel_mode will provide useful test coverage in quite a few
more scenarios than it did previously, but it's also necessary
preparation for fixing some problems related to subqueries.

Patch by me, reviewed by Tom Lane.
2016-07-01 11:52:56 -04:00
Tom Lane 3154e16737 Dodge compiler bug in Visual Studio 2013.
VS2013 apparently has a problem with taking the address of a formal
parameter in some cases.  We do that elsewhere without trouble, but
in this case the address is being passed to a subroutine that will
probably get inlined, so maybe the combination of those things is
what tickles the bug.  Anyway, introducing an extra copy of the
parameter value is enough to work around it.  Per trouble report
from Umair Shahid.

Report: <CAM184AcjqKYZSdQqBHDrnENXHhW=mXbUC46QYPJ=nAh0gUHCGA@mail.gmail.com>
2016-06-29 19:07:19 -04:00
Tom Lane b32e63506c Fix match_foreign_keys_to_quals for FKs linking to unused rtable entries.
Since get_relation_foreign_keys doesn't try to determine whether RTEs
are actually part of the query semantics, it might make FK info records
linking to RTEs that won't have a RelOptInfo at all.  Cope with that.
Per bug #14219 from Andrew Gierth.

Report: <20160629183338.1397.43514@wrigleys.postgresql.org>
2016-06-29 16:02:08 -04:00
Tom Lane c12f02ffc9 Don't apply sortgroupref labels to a tlist that might not match.
If we need to use a gating Result node for pseudoconstant quals,
create_scan_plan() intentionally suppresses use_physical_tlist's checks
on whether there are matches for sortgroupref labels, on the grounds that
we don't need matches because we can label the Result's projection output
properly.  However, it then called apply_pathtarget_labeling_to_tlist
anyway.  This oversight was harmless when written, but in commit aeb9ae645
I made that function throw an error if there was no match.  Thus, the
combination of a table scan, pseudoconstant quals, and a non-simple-Var
sortgroupref column threw the dreaded "ORDER/GROUP BY expression not found
in targetlist" error.  To fix, just skip applying the labeling in this
case.  Per report from Rushabh Lathia.

Report: <CAGPqQf2iLB8t6t-XrL-zR233DFTXxEsfVZ4WSqaYfLupEoDxXA@mail.gmail.com>
2016-06-28 10:43:11 -04:00
Tom Lane f1993038a4 Avoid making a separate pass over the query to check for partializability.
It's rather silly to make a separate pass over the tlist + HAVING qual,
and a separate set of visits to the syscache, when get_agg_clause_costs
already has all the required information in hand.  This nets out as less
code as well as fewer cycles.
2016-06-26 15:55:01 -04:00
Tom Lane 19e972d558 Rethink node-level representation of partial-aggregation modes.
The original coding had three separate booleans representing partial
aggregation behavior, which was confusing, unreadable, and error-prone,
not least because the booleans weren't always listed in the same order.
It was also inadequate for the allegedly-desirable future extension to
support intermediate partial aggregation, because we'd need separate
markers for serialization and deserialization in such a case.

Merge these bools into an enum "AggSplit" to provide symbolic names for
the supported operating modes (and document what those are).  By assigning
the values of the enum constants carefully, we can treat AggSplit values
as options bitmasks so that tests of what to do aren't noticeably more
expensive than before.

While at it, get rid of Aggref.aggoutputtype.  That's not needed since
commit 59a3795c2 got rid of setrefs.c's special-purpose Aggref comparison
code, and it likewise seemed more confusing than helpful.

Assorted comment cleanup as well (there's still more that I want to do
in that line).

catversion bump for change in Aggref node contents.  Should be the last
one for partial-aggregation changes.

Discussion: <29309.1466699160@sss.pgh.pa.us>
2016-06-26 14:33:38 -04:00
Tom Lane 59a3795c25 Simplify planner's final setup of Aggrefs for partial aggregation.
Commit e06a38965's original coding for constructing the execution-time
expression tree for a combining aggregate was rather messy, involving
duplicating quite a lot of code in setrefs.c so that it could inject
a nonstandard matching rule for Aggrefs.  Get rid of that in favor of
explicitly constructing a combining Aggref with a partial Aggref as input,
then allowing setref's normal matching logic to match the partial Aggref
to the output of the lower plan node and hence replace it with a Var.

In passing, rename and redocument make_partialgroup_input_target to have
some connection to what it actually does.
2016-06-26 12:08:12 -04:00
Tom Lane f8ace5477e Fix type-safety problem with parallel aggregate serial/deserialization.
The original specification for this called for the deserialization function
to have signature "deserialize(serialtype) returns transtype", which is a
security violation if transtype is INTERNAL (which it always would be in
practice) and serialtype is not (which ditto).  The patch blithely overrode
the opr_sanity check for that, which was sloppy-enough work in itself,
but the indisputable reason this cannot be allowed to stand is that CREATE
FUNCTION will reject such a signature and thus it'd be impossible for
extensions to create parallelizable aggregates.

The minimum fix to make the signature type-safe is to add a second, dummy
argument of type INTERNAL.  But to lock it down a bit more and make misuse
of INTERNAL-accepting functions less likely, let's get rid of the ability
to specify a "serialtype" for an aggregate and just say that the only
useful serialtype is BYTEA --- which, in practice, is the only interesting
value anyway, due to the usefulness of the send/recv infrastructure for
this purpose.  That means we only have to allow "serialize(internal)
returns bytea" and "deserialize(bytea, internal) returns internal" as
the signatures for these support functions.

In passing fix bogus signature of int4_avg_combine, which I found thanks
to adding an opr_sanity check on combinefunc signatures.

catversion bump due to removing pg_aggregate.aggserialtype and adjusting
signatures of assorted built-in functions.

David Rowley and Tom Lane

Discussion: <27247.1466185504@sss.pgh.pa.us>
2016-06-22 16:52:41 -04:00
Tom Lane 8b9d323cb9 Refactor planning of projection steps that don't need a Result plan node.
The original upper-planner-pathification design (commit 3fc6e2d7f5)
assumed that we could always determine during Path formation whether or not
we would need a Result plan node to perform projection of a targetlist.
That turns out not to work very well, though, because createplan.c still
has some responsibilities for choosing the specific target list associated
with sorting/grouping nodes (in particular it might choose to add resjunk
columns for sorting).  We might not ever refactor that --- doing so would
push more work into Path formation, which isn't attractive --- and we
certainly won't do so for 9.6.  So, while create_projection_path and
apply_projection_to_path can tell for sure what will happen if the subpath
is projection-capable, they can't tell for sure when it isn't.  This is at
least a latent bug in apply_projection_to_path, which might think it can
apply a target to a non-projecting node when the node will end up computing
something different.

Also, I'd tied the creation of a ProjectionPath node to whether or not a
Result is needed, but it turns out that we sometimes need a ProjectionPath
node anyway to avoid modifying a possibly-shared subpath node.  Callers had
to use create_projection_path for such cases, and we added code to them
that knew about the potential omission of a Result node and attempted to
adjust the cost estimates for that.  That was uncertainly correct and
definitely ugly/unmaintainable.

To fix, have create_projection_path explicitly check whether a Result
is needed and adjust its cost estimate accordingly, though it creates
a ProjectionPath in either case.  apply_projection_to_path is now mostly
just an optimized version that can avoid creating an extra Path node when
the input is known to not be shared with any other live path.  (There
is one case that create_projection_path doesn't handle, which is pushing
parallel-safe expressions below a Gather node.  We could make it do that
by duplicating the GatherPath, but there seems no need as yet.)

create_projection_plan still has to recheck the tlist-match condition,
which means that if the matching situation does get changed by createplan.c
then we'll have made a slightly incorrect cost estimate.  But there seems
no help for that in the near term, and I doubt it occurs often enough,
let alone would change planning decisions often enough, to be worth
stressing about.

I added a "dummypp" field to ProjectionPath to track whether
create_projection_path thinks a Result is needed.  This is not really
necessary as-committed because create_projection_plan doesn't look at the
flag; but it seems like a good idea to remember what we thought when
forming the cost estimate, if only for debugging purposes.

In passing, get rid of the target_parallel parameter added to
apply_projection_to_path by commit 54f5c5150.  I don't think that's a good
idea because it involves callers in what should be an internal decision,
and opens us up to missing optimization opportunities if callers think they
don't need to provide a valid flag, as most don't.  For the moment, this
just costs us an extra has_parallel_hazard call when planning a Gather.
If that starts to look expensive, I think a better solution would be to
teach PathTarget to carry/cache knowledge of parallel-safety of its
contents.
2016-06-21 18:38:20 -04:00
Tom Lane 100340e2dc Restore foreign-key-aware estimation of join relation sizes.
This patch provides a new implementation of the logic added by commit
137805f89 and later removed by 77ba61080.  It differs from the original
primarily in expending much less effort per joinrel in large queries,
which it accomplishes by doing most of the matching work once per query not
once per joinrel.  Hopefully, it's also less buggy and better commented.
The never-documented enable_fkey_estimates GUC remains gone.

There remains work to be done to make the selectivity estimates account
for nulls in FK referencing columns; but that was true of the original
patch as well.  We may be able to address this point later in beta.
In the meantime, any error should be in the direction of overestimating
rather than underestimating joinrel sizes, which seems like the direction
we want to err in.

Tomas Vondra and Tom Lane

Discussion: <31041.1465069446@sss.pgh.pa.us>
2016-06-18 15:22:34 -04:00
Tom Lane 598aa194af Still another try at fixing scanjoin_target insertion into parallel plans.
The previous code neglected the fact that the scanjoin_target might
carry sortgroupref labelings that we need to absorb.  Instead, do
create_projection_path() unconditionally, and tweak the path's cost
estimate after the fact.  (I'm now convinced that we ought to refactor
the way we account for sometimes not needing a separate projection step,
but right now is not the time for that sort of cleanup.)

Problem identified by Amit Kapila, patch by me.
2016-06-18 00:28:51 -04:00
Tom Lane 915b703e16 Fix handling of argument and result datatypes for partial aggregation.
When doing partial aggregation, the args list of the upper (combining)
Aggref node is replaced by a Var representing the output of the partial
aggregation steps, which has either the aggregate's transition data type
or a serialized representation of that.  However, nodeAgg.c blindly
continued to use the args list as an indication of the user-level argument
types.  This broke resolution of polymorphic transition datatypes at
executor startup (though it accidentally failed to fail for the ANYARRAY
case, which is likely the only one anyone had tested).  Moreover, the
constructed FuncExpr passed to the finalfunc contained completely wrong
information, which would have led to bogus answers or crashes for any case
where the finalfunc examined that information (which is only likely to be
with polymorphic aggregates using a non-polymorphic transition type).

As an independent bug, apply_partialaggref_adjustment neglected to resolve
a polymorphic transition datatype before assigning it as the output type
of the lower-level Aggref node.  This again accidentally failed to fail
for ANYARRAY but would be unlikely to work in other cases.

To fix the first problem, record the user-level argument types in a
separate OID-list field of Aggref, and look to that rather than the args
list when asking what the argument types were.  (It turns out to be
convenient to include any "direct" arguments in this list too, although
those are not currently subject to being overwritten.)

Rather than adding yet another resolve_aggregate_transtype() call to fix
the second problem, add an aggtranstype field to Aggref, and store the
resolved transition type OID there when the planner first computes it.
(By doing this in the planner and not the parser, we can allow the
aggregate's transition type to change from time to time, although no DDL
support yet exists for that.)  This saves nothing of consequence for
simple non-polymorphic aggregates, but for polymorphic transition types
we save a catalog lookup during executor startup as well as several
planner lookups that are new in 9.6 due to parallel query planning.

In passing, fix an error that was introduced into count_agg_clauses_walker
some time ago: it was applying exprTypmod() to something that wasn't an
expression node at all, but a TargetEntry.  exprTypmod silently returned
-1 so that there was not an obvious failure, but this broke the intended
sensitivity of aggregate space consumption estimates to the typmod of
varchar and similar data types.  This part needs to be back-patched.

Catversion bump due to change of stored Aggref nodes.

Discussion: <8229.1466109074@sss.pgh.pa.us>
2016-06-17 21:44:37 -04:00
Robert Haas 54f5c5150f Try again to fix the way the scanjoin_target is used with partial paths.
Commit 04ae11f62e removed some broken
code to apply the scan/join target to partial paths, but its theory
that this processing step is totally unnecessary turns out to be wrong.
Put similar code back again, but this time, check for parallel-safety
and avoid in-place modifications to paths that may already have been
used as part of some other path.

(This is not an entirely elegant solution to this problem; it might
be better, for example, to postpone generate_gather_paths for the
topmost scan/join rel until after the scan/join target has been
applied.  But this is not the time for such redesign work.)

Amit Kapila and Robert Haas
2016-06-17 16:29:07 -04:00
Tom Lane 75be66464c Invent min_parallel_relation_size GUC to replace a hard-wired constant.
The main point of doing this is to allow the cutoff to be set very small,
even zero, to allow parallel-query behavior to be tested on relatively
small tables such as we typically use in the regression tests.  But it
might be of use to users too.  The number-of-workers scaling behavior in
create_plain_partial_paths() is pretty ad-hoc and subject to change, so
we won't expose anything about that, but the notion of not considering
parallel query at all for tables below size X seems reasonably stable.

Amit Kapila, per a suggestion from me

Discussion: <17170.1465830165@sss.pgh.pa.us>
2016-06-16 13:47:20 -04:00
Tom Lane 89d53515e5 In planner.c, avoid assuming that all PathTargets have sortgrouprefs.
The struct definition for PathTarget specifies that a NULL sortgrouprefs
pointer means no sortgroupref labels.  While it's likely that there
should always be at least one labeled column in the places that were
unconditionally fetching through the pointer, it seems wiser to adhere to
the data structure specification and test first.  Add a macro to make this
convenient.  Per experimentation with running the regression tests with a
very small parallelization threshold --- the crash I observed may well
represent a bug elsewhere, but still this coding was not very robust.

Report: <20756.1465834072@sss.pgh.pa.us>
2016-06-13 12:59:25 -04:00
Tom Lane 3303ea1a32 Remove reltarget_has_non_vars flag.
Commit b12fd41c6 added a "reltarget_has_non_vars" field to RelOptInfo,
but failed to maintain it accurately.  Since its only purpose was to skip
calls to has_parallel_hazard() in the simple case where a rel's targetlist
is all Vars, and that call is really pretty cheap in that case anyway, it
seems like this is just a case of premature optimization.  Let's drop the
flag and do the calls unconditionally until it's proven that we need more
smarts here.
2016-06-10 16:20:03 -04:00
Tom Lane 2f153ddfdd Refactor to reduce code duplication for function property checking.
As noted by Andres Freund, we'd accumulated quite a few similar functions
in clauses.c that examine all functions in an expression tree to see if
they satisfy some boolean test.  Reduce the duplication by inventing a
function check_functions_in_node() that applies a simple callback function
to each SQL function OID appearing in a given expression node.  This also
fixes some arguable oversights; for example, contain_mutable_functions()
did not check aggregate or window functions for mutability.  I doubt that
that represents a live bug at the moment, because we don't really consider
mutability for aggregates; but it might someday be one.

I chose to put check_functions_in_node() in nodeFuncs.c because it seemed
like other modules might wish to use it in future.  That in turn forced
moving set_opfuncid() et al into nodeFuncs.c, as the alternative was for
nodeFuncs.c to depend on optimizer/setrefs.c which didn't seem very clean.

In passing, teach contain_leaked_vars_walker() about a few more expression
node types it can safely look through, and improve the rather messy and
undercommented code in has_parallel_hazard_walker().

Discussion: <20160527185853.ziol2os2zskahl7v@alap3.anarazel.de>
2016-06-10 16:03:46 -04:00
Tom Lane cae1c788b9 Improve the situation for parallel query versus temp relations.
Transmit the leader's temp-namespace state to workers.  This is important
because without it, the workers do not really have the same search path
as the leader.  For example, there is no good reason (and no extant code
either) to prevent a worker from executing a temp function that the
leader created previously; but as things stood it would fail to find the
temp function, and then either fail or execute the wrong function entirely.

We still prohibit a worker from creating a temp namespace on its own.
In effect, a worker can only see the session's temp namespace if the leader
had created it before starting the worker, which seems like the right
semantics.

Also, transmit the leader's BackendId to workers, and arrange for workers
to use that when determining the physical file path of a temp relation
belonging to their session.  While the original intent was to prevent such
accesses entirely, there were a number of holes in that, notably in places
like dbsize.c which assume they can safely access temp rels of other
sessions anyway.  We might as well get this right, as a small down payment
on someday allowing workers to access the leader's temp tables.  (With
this change, directly using "MyBackendId" as a relation or buffer backend
ID is deprecated; you should use BackendIdForTempRelations() instead.
I left a couple of such uses alone though, as they're not going to be
reachable in parallel workers until we do something about localbuf.c.)

Move the thou-shalt-not-access-thy-leader's-temp-tables prohibition down
into localbuf.c, which is where it actually matters, instead of having it
in relation_open().  This amounts to recognizing that access to temp
tables' catalog entries is perfectly safe in a worker, it's only the data
in local buffers that is problematic.

Having done all that, we can get rid of the test in has_parallel_hazard()
that says that use of a temp table's rowtype is unsafe in parallel workers.
That test was unduly expensive, and if we really did need such a
prohibition, that was not even close to being a bulletproof guard for it.
(For example, any user-defined function executed in a parallel worker
might have attempted such access.)
2016-06-09 20:16:11 -04:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Robert Haas b12fd41c69 Don't generate parallel paths for rels with parallel-restricted outputs.
Such paths are unsafe.  To make it cheaper to detect when this case
applies, track whether a relation's default PathTarget contains any
non-Vars.  In most cases, the answer will be no, which enables us to
determine cheaply that the target list for a proposed path is
parallel-safe.  However, subquery pull-up can create cases that
require us to inspect the target list more carefully.

Amit Kapila, reviewed by me.
2016-06-09 12:43:36 -04:00
Tom Lane e4158319f3 Mop-up for parallel degree-ectomy.
Fix a couple of overlooked uses of "degree" terminology.  Make the parallel
worker count selection logic in create_plain_partial_paths more robust (in
particular, it failed with max_parallel_workers_per_gather set to zero).
2016-06-09 11:16:26 -04:00
Robert Haas c9ce4a1c61 Eliminate "parallel degree" terminology.
This terminology provoked widespread complaints.  So, instead, rename
the GUC max_parallel_degree to max_parallel_workers_per_gather
(leaving room for a possible future GUC max_parallel_workers that acts
as a system-wide limit), and rename the parallel_degree reloption to
parallel_workers.  Rename structure members to match.

These changes create a dump/restore hazard for users of PostgreSQL
9.6beta1 who have set the reloption (or applied the GUC using ALTER
USER or ALTER DATABASE).
2016-06-09 10:00:26 -04:00
Tom Lane 77ba610805 Revert "Use Foreign Key relationships to infer multi-column join selectivity".
This commit reverts 137805f89 as well as the associated commits 015e88942,
5306df283, and 68d704edb.  We found multiple bugs in this feature, and
there was concern about possible planner slowdown (though to be fair,
exhibiting a very large slowdown proved difficult).  The way forward
requires a considerable rewrite, which may or may not be possible to
accomplish in time for beta2.  In my judgment reviewing the rewrite will
be easier to accomplish starting from a clean slate, so let's temporarily
revert what's there now.  This also leaves us in a safe state if it turns
out to be necessary to postpone the rewrite to the next development cycle.

Discussion: <20160429102531.GA13701@huehner.biz>
2016-06-07 17:21:17 -04:00
Robert Haas 04ae11f62e Remove bogus code to apply PathTargets to partial paths.
The partial paths that get modified may already have been used as
part of a GatherPath which appears in the path list, so modifying
them is not a good idea at this stage - especially because this
code has no check that the PathTarget is in fact parallel-safe.

When partial aggregation is being performed, this is actually
harmless because we'll end up replacing the pathtargets here with
the correct ones within create_grouping_paths().  But if we've got
a query tree containing only scan/join operations then this can
result in incorrectly pushing down parallel-restricted target
list entries.  If those are, for example, references to subqueries,
that can crash the server; but it's wrong in any event.

Amit Kapila
2016-06-03 14:27:33 -04:00
Greg Stark e1623c3959 Fix various common mispellings.
Mostly these are just comments but there are a few in documentation
and a handful in code and tests. Hopefully this doesn't cause too much
unnecessary pain for backpatching. I relented from some of the most
common like "thru" for that reason. The rest don't seem numerous
enough to cause problems.

Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings
2016-06-03 16:08:45 +01:00
Tom Lane aeb9ae6457 Disable physical tlist if any Var would need multiple sortgroupref labels.
As part of upper planner pathification (commit 3fc6e2d7f5) I redid
createplan.c's approach to the physical-tlist optimization, in which scan
nodes are allowed to return exactly the underlying table's columns so as
to save doing a projection step at runtime.  The logic was intentionally
more aggressive than before about applying the optimization, which is
generally a good thing, but Andres Freund found a case in which it got
too aggressive.  Namely, if any column is referenced more than once in
the parent plan node's sorting or grouping column list, we can't optimize
because then that column would need to have more than one ressortgroupref
label, and we only have space for one.

Add logic to detect this situation in use_physical_tlist(), and also add
some error checking in apply_pathtarget_labeling_to_tlist(), which this
example proves was being overly cavalier about whether what it was doing
made any sense.

The added test case exposes the problem only because we do not eliminate
duplicate grouping keys.  That might be something to fix someday, but it
doesn't seem like appropriate post-beta work.

Report: <20160526021235.w4nq7k3gnheg7vit@alap3.anarazel.de>
2016-05-26 14:52:30 -04:00
Tom Lane 8a13d5e6d1 Fix infer_arbiter_indexes() to not barf on system columns.
While it could be argued that rejecting system column mentions in the
ON CONFLICT list is an unsupported feature, falling over altogether
just because the table has a unique index on OID is indubitably a bug.

As far as I can tell, fixing infer_arbiter_indexes() is sufficient to
make ON CONFLICT (oid) actually work, though making a regression test
for that case is problematic because of the impossibility of setting
the OID counter to a known value.

Minor cosmetic cleanups along with the bug fix.
2016-05-11 17:06:53 -04:00
Tom Lane 26e66184d6 Fix assorted missing infrastructure for ON CONFLICT.
subquery_planner() failed to apply expression preprocessing to the
arbiterElems and arbiterWhere fields of an OnConflictExpr.  No doubt the
theory was that this wasn't necessary because we don't actually try to
execute those expressions; but that's wrong, because it results in failure
to match to index expressions or index predicates that are changed at all
by preprocessing.  Per bug #14132 from Reynold Smith.

Also add pullup_replace_vars processing for onConflictWhere.  Perhaps
it's impossible to have a subquery reference there, but I'm not exactly
convinced; and even if true today it's a failure waiting to happen.

Also add some comments to other places where one or another field of
OnConflictExpr is intentionally ignored, with explanation as to why it's
okay to do so.

Also, catalog/dependency.c failed to record any dependency on the named
constraint in ON CONFLICT ON CONSTRAINT, allowing such a constraint to
be dropped while rules exist that depend on it, and allowing pg_dump to
dump such a rule before the constraint it refers to.  The normal execution
path managed to error out reasonably for a dangling constraint reference,
but ruleutils.c dumped core; so in addition to fixing the omission, add
a protective check in ruleutils.c, since we can't retroactively add a
dependency in existing databases.

Back-patch to 9.5 where this code was introduced.

Report: <20160510190350.2608.48667@wrigleys.postgresql.org>
2016-05-11 16:20:23 -04:00
Robert Haas 68d704edbf Minimal fix for crash bug in quals_match_foreign_key.
Discussion is still underway as to whether to revert the entire patch
that added this function, but that discussion may not conclude before
beta1.  So, in the meantime, let's do at least this much.

David Rowley
2016-05-06 15:00:55 -04:00
Tom Lane 2a2435e699 Small improvements to OPTIMIZER_DEBUG code.
Now that Paths have their own rows field, print that rather than
the parent relation's rowcount.

Show the relid sets associated with Paths using table names rather
than numbers; since this code is able to print simple Var references
using table names, it seems a bit silly that print_relids can't.

Print the cheapest_parameterized_paths list for a RelOptInfo, and
include information about a parameterized path's required_outer rels.

Noted while trying to use this feature to debug Alexander Kirkouski's
recent bug report.
2016-04-30 14:08:00 -04:00
Tom Lane c45bf5751b Fix planner crash from pfree'ing a partial path that a GatherPath uses.
We mustn't run generate_gather_paths() during add_paths_to_joinrel(),
because that function can be invoked multiple times for the same target
joinrel.  Not only is it wasteful to build GatherPaths repeatedly, but
a later add_partial_path() could delete the partial path that a previously
created GatherPath depends on.  Instead establish the convention that we
do generate_gather_paths() for a rel only just before set_cheapest().

The code was accidentally not broken for baserels, because as of today there
never is more than one partial path for a baserel.  But that assumption
obviously has a pretty short half-life, so move the generate_gather_paths()
calls for those cases as well.

Also add some generic comments explaining how and why this all works.

Per fuzz testing by Andreas Seltenreich.

Report: <871t5pgwdt.fsf@credativ.de>
2016-04-30 12:29:21 -04:00
Tom Lane 207d5a656e Fix mishandling of equivalence-class tests in parameterized plans.
Given a three-or-more-way equivalence class, such as X.Y = Y.Y = Z.Z,
it was possible for the planner to omit one of the quals needed to
enforce that all members of the equivalence class are actually equal.
This only happened in the case of a parameterized join node for two
of the relations, that is a plan tree like

	Nested Loop
	  ->  Scan X
	  ->  Nested Loop
	    ->  Scan Y
	    ->  Scan Z
	          Filter: Z.Z = X.X

The eclass machinery normally expects to apply X.X = Y.Y when those
two relations are joined, but in this shape of plan tree they aren't
joined until the top node --- and, if the lower nested loop is marked
as parameterized by X, the top node will assume that the relevant eclass
condition(s) got pushed down into the lower node.  On the other hand,
the scan of Z assumes that it's only responsible for constraining Z.Z
to match any one of the other eclass members.  So one or another of
the required quals sometimes fell between the cracks, depending on
whether consideration of the eclass in get_joinrel_parampathinfo()
for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z
as the appropriate constraint there.  If it generated the latter,
it'd erroneously suppose that the Z scan would take care of matters.
To fix, force X.X = Y.Y to be generated and applied at that join node
when this case occurs.

This is *extremely* hard to hit in practice, because various planner
behaviors conspire to mask the problem; starting with the fact that the
planner doesn't really like to generate a parameterized plan of the
above shape.  (It might have been impossible to hit it before we
tweaked things to allow this plan shape for star-schema cases.)  Many
thanks to Alexander Kirkouski for submitting a reproducible test case.

The bug can be demonstrated in all branches back to 9.2 where parameterized
paths were introduced, so back-patch that far.
2016-04-29 20:19:38 -04:00
Robert Haas 59eb551279 Fix EXPLAIN VERBOSE output for parallel aggregate.
The way that PartialAggregate and FinalizeAggregate plan nodes were
displaying output columns before was bogus.  Now, FinalizeAggregate
produces the same outputs as an Aggregate would have produced, while
PartialAggregate produces each of those outputs prefixed by the word
PARTIAL.

Discussion: 12585.1460737650@sss.pgh.pa.us

Patch by me, reviewed by David Rowley.
2016-04-27 07:37:40 -04:00
Robert Haas 77cd477c4b Enable parallel query by default.
Change max_parallel_degree default from 0 to 2.  It is possible that
this is not a good idea, or that we should go with 1 worker rather
than 2, but we won't find out without trying it.  Along the way,
reword the documentation for max_parallel_degree a little bit to
hopefully make it more clear.

Discussion: 20160420174631.3qjjhpwsvvx5bau5@alap3.anarazel.de
2016-04-26 08:35:58 -04:00
Tom Lane 80f66a9ad0 Fix planner failure with full join in RHS of left join.
Given a left join containing a full join in its righthand side, with
the left join's joinclause referencing only one side of the full join
(in a non-strict fashion, so that the full join doesn't get simplified),
the planner could fail with "failed to build any N-way joins" or related
errors.  This happened because the full join was seen as overlapping the
left join's RHS, and then recent changes within join_is_legal() caused
that function to conclude that the full join couldn't validly be formed.
Rather than try to rejigger join_is_legal() yet more to allow this,
I think it's better to fix initsplan.c so that the required join order
is explicit in the SpecialJoinInfo data structure.  The previous coding
there essentially ignored full joins, relying on the fact that we don't
flatten them in the joinlist data structure to preserve their ordering.
That's sufficient to prevent a wrong plan from being formed, but as this
example shows, it's not sufficient to ensure that the right plan will
be formed.  We need to work a bit harder to ensure that the right plan
looks sane according to the SpecialJoinInfos.

Per bug #14105 from Vojtech Rylko.  This was apparently induced by
commit 8703059c6 (though now that I've seen it, I wonder whether there
are related cases that could have failed before that); so back-patch
to all active branches.  Unfortunately, that patch also went into 9.0,
so this bug is a regression that won't be fixed in that branch.
2016-04-21 20:05:58 -04:00
Robert Haas 36f69faeff Comment improvements for ForeignPath.
It's not necessarily just scanning a base relation any more.

Amit Langote and Etsuro Fujita
2016-04-21 13:30:48 -04:00
Robert Haas 9c75e1a36b Forbid parallel Hash Right Join or Hash Full Join.
That won't work.  You'll get bogus null-extended rows.

Mithun Cy
2016-04-20 17:48:55 -04:00
Magnus Hagander ba8fe38f58 Fix typo in comment 2016-04-15 13:32:54 +02:00
Robert Haas deb71fa971 Fix costing for parallel aggregation.
The original patch kind of ignored the fact that we were doing something
different from a costing point of view, but nobody noticed.  This patch
fixes that oversight.

David Rowley
2016-04-12 16:25:55 -04:00
Tom Lane f1f01de145 Redefine create_upper_paths_hook as being invoked once per upper relation.
Per discussion, this gives potential users of the hook more flexibility,
because they can build custom Paths that implement only one stage of
upper processing atop core-provided Paths for earlier stages.
2016-04-12 15:23:14 -04:00
Tom Lane 5306df2831 Clean up foreign-key caching code in planner.
Coverity complained that the code added by 015e88942a lacked an
error check for SearchSysCache1 failures, which it should have.  But
the code was pretty duff in other ways too, including failure to think
about whether it could really cope with arrays of different lengths.
2016-04-10 23:47:30 -04:00
Teodor Sigaev 8b99edefca Revert CREATE INDEX ... INCLUDING ...
It's not ready yet, revert two commits
690c543550 - unstable test output
386e3d7609 - patch itself
2016-04-08 21:52:13 +03:00
Teodor Sigaev 386e3d7609 CREATE INDEX ... INCLUDING (column[, ...])
Now indexes (but only B-tree for now) can contain "extra" column(s) which
doesn't participate in index structure, they are just stored in leaf
tuples. It allows to use index only scan by using single index instead
of two or more indexes.

Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
2016-04-08 19:45:59 +03:00
Robert Haas 25fe8b5f1a Add a 'parallel_degree' reloption.
The code that estimates what parallel degree should be uesd for the
scan of a relation is currently rather stupid, so add a parallel_degree
reloption that can be used to override the planner's rather limited
judgement.

Julien Rouhaud, reviewed by David Rowley, James Sewell, Amit Kapila,
and me.  Some further hacking by me.
2016-04-08 11:14:56 -04:00
Robert Haas 0711803775 Use quicksort, not replacement selection, for external sorting.
We still use replacement selection for the first run of the sort only
and only when the number of tuples is relatively small.  Otherwise,
the first run, and subsequent runs in all cases, are produced using
quicksort.  This tends to be faster except perhaps for very small
amounts of working memory.

Peter Geoghegan, reviewed by Tomas Vondra, Jeff Janes, Mithun Cy,
Greg Stark, and me.
2016-04-08 02:36:26 -04:00
Simon Riggs 137805f89a Use Foreign Key relationships to infer multi-column join selectivity
In cases where joins use multiple columns we currently assess each join
separately causing gross mis-estimates for join cardinality.

This patch adds use of FK information for the first time into the
planner. When FKs are present and we have multi-column join information,
plan estimates will be drastically improved. Cases with multiple FKs
are handled, though partial matches are ignored currently.

Net effect is substantial performance improvements for joins in many
common cases. Additional planning time is isolated to cases that are
currently performing poorly, measured at 0.08 - 0.15 ms.

Please watch for planner performance regressions; circumstances seem
unlikely but the law of unintended consequences may apply somewhen.
Additional complex tests welcome to prove this before release.

Tests can be performed using SET enable_fkey_estimates = on | off
using scripts provided during Hackers discussions, message id:
552335D9.3090707@2ndquadrant.com

Authors: Tomas Vondra and David Rowley
Reviewed and tested by Simon Riggs, adding comments only
2016-04-08 02:51:09 +01:00
Tom Lane f338dd7585 Refactor join_is_removable() to separate out distinctness-proving logic.
Extracted from pending unique-join patch, since this is a rather large
delta but it's simply moving code out into separately-accessible
subroutines.

I (tgl) did choose to add a bit more logic to rel_supports_distinctness,
so that it verifies that there's at least one potentially usable unique
index rather than just checking indexlist != NIL.  Otherwise there's
no functional change here.

David Rowley
2016-04-07 13:12:31 -04:00
Simon Riggs 015e88942a Load FK defs into relcache for use by planner
Fastpath ignores this if no triggers defined.

Author: Tomas Vondra, with fastpath and comments added by me
Reviewers: David Rowley, Simon Riggs
2016-04-07 12:08:33 +01:00
Tom Lane de94e2af18 Run pgindent on a batch of (mostly-planner-related) source files.
Getting annoyed at the amount of unrelated chatter I get from pgindent'ing
Rowley's unique-joins patch.  Re-indent all the files it touches.
2016-04-06 11:34:02 -04:00
Robert Haas 41ea0c2376 Fix parallel-safety code for parallel aggregation.
has_parallel_hazard() was ignoring the proparallel markings for
aggregates, which is no good.  Fix that.  There was no way to mark
an aggregate as actually being parallel-safe, either, so add a
PARALLEL option to CREATE AGGREGATE.

Patch by me, reviewed by David Rowley.
2016-04-05 16:06:15 -04:00
Tom Lane f9aefcb91f Support using index-only scans with partial indexes in more cases.
Previously, the planner would reject an index-only scan if any restriction
clause for its table used a column not available from the index, even
if that restriction clause would later be dropped from the plan entirely
because it's implied by the index's predicate.  This is a fairly common
situation for partial indexes because predicates using columns not included
in the index are often the most useful kind of predicate, and we have to
duplicate (or at least imply) the predicate in the WHERE clause in order
to get the index to be considered at all.  So index-only scans were
essentially unavailable with such partial indexes.

To fix, we have to do detection of implied-by-predicate clauses much
earlier in the planner.  This patch puts it in check_index_predicates
(nee check_partial_indexes), meaning it gets done for every partial index,
whereas we previously only considered this issue at createplan time,
so that the work was only done for an index actually selected for use.
That could result in a noticeable planning slowdown for queries against
tables with many partial indexes.  However, testing suggested that there
isn't really a significant cost, especially not with reasonable numbers
of partial indexes.  We do get a small additional benefit, which is that
cost_index is more accurate since it correctly discounts the evaluation
cost of clauses that will be removed.  We can also avoid considering such
clauses as potential indexquals, which saves useless matching cycles in
the case where the predicate columns aren't in the index, and prevents
generating bogus plans that double-count the clause's selectivity when
the columns are in the index.

Tomas Vondra and Kyotaro Horiguchi, reviewed by Kevin Grittner and
Konstantin Knizhnik, and whacked around a little by me
2016-03-31 14:49:10 -04:00
Robert Haas 5fe5a2cee9 Allow aggregate transition states to be serialized and deserialized.
This is necessary infrastructure for supporting parallel aggregation
for aggregates whose transition type is "internal".  Such values
can't be passed between cooperating processes, because they are
just pointers.

David Rowley, reviewed by Tomas Vondra and by me.
2016-03-29 15:04:05 -04:00
Robert Haas f9143d102f Rework custom scans to work more like the new extensible node stuff.
Per discussion, the new extensible node framework is thought to be
better designed than the custom path/scan/scanstate stuff we added
in PostgreSQL 9.5.  Rework the latter to be more like the former.

This is not backward-compatible, but we generally don't promise that
for C APIs, and there probably aren't many people using this yet
anyway.

KaiGai Kohei, reviewed by Petr Jelinek and me.  Some further
cosmetic changes by me.
2016-03-29 11:28:04 -04:00
Robert Haas 5d4171d1c7 Don't require a user mapping for FDWs to work.
Commit fbe5a3fb73 accidentally changed
this behavior; put things back the way they were, and add some
regression tests.

Report by Andres Freund; patch by Ashutosh Bapat, with a bit of
kibitzing by me.
2016-03-28 21:50:28 -04:00
Tom Lane 76281aa964 Avoid a couple of zero-divide scenarios in the planner.
cost_subplan() supposed that the given subplan must have plan_rows > 0,
which as far as I can tell was true until recent refactoring of the
code in createplan.c; but now that code allows the Result for a provably
empty subquery to have plan_rows = 0.  Rather than undo that change,
put in a clamp to prevent zero divide.

get_cheapest_fractional_path() likewise supposed that best_path->rows > 0.
This assumption has been wrong for longer.  It's actually harmless given
IEEE float math, because a positive value divided by zero gives +Infinity
and compare_fractional_path_costs() will do the right thing with that.
Still, best not to assume that.

final_cost_nestloop() also seems to have some risks in this area, so
borrow the clamping logic already present in the mergejoin cost functions.

Lastly, remove unnecessary clamp_row_est() in planner.c's calls to
get_number_of_groups().  The only thing that function does with path_rows
is pass it to estimate_num_groups() which already has an internal clamp,
so we don't need the extra call; and if we did, the callers are arguably
the wrong place for it anyway.

First two items reported by Piotr Stefaniak, the others are products
of my nosing around for similar problems.  No back-patch since there's
no evidence that problems arise in the back branches.
2016-03-26 12:03:12 -04:00
Tom Lane d543170f2f Don't split up SRFs when choosing to postpone SELECT output expressions.
In commit 9118d03a8c we taught the planner to postpone evaluation of
set-returning functions in a SELECT's targetlist until after any sort done
to satisfy ORDER BY.  However, if we postpone some SRFs this way while
others do not get postponed (because they're sort or group key columns)
we will break the traditional behavior by which all SRFs in the tlist run
in-step during ExecTargetList(), so that you get the least common multiple
of their periods not the product.  Fix make_sort_input_target() so it will
not split up SRF evaluation in such cases.

There is still a hazard of similar odd behavior if there's a SRF in a
grouping column and another one that isn't, but that was true before
and we're just trying to preserve bug-compatibility with the traditional
behavior.  This whole area is overdue to be rethought and reimplemented,
but we'll try to avoid changing behavior until then.

Per report from Regina Obe.
2016-03-25 11:19:51 -04:00
Robert Haas e06a38965b Support parallel aggregation.
Parallel workers can now partially aggregate the data and pass the
transition values back to the leader, which can combine the partial
results to produce the final answer.

David Rowley, based on earlier work by Haribabu Kommi.  Reviewed by
Álvaro Herrera, Tomas Vondra, Amit Kapila, James Sewell, and me.
2016-03-21 09:30:18 -04:00
Robert Haas 0bf3ae88af Directly modify foreign tables.
postgres_fdw can now sent an UPDATE or DELETE statement directly to
the foreign server in simple cases, rather than sending a SELECT FOR
UPDATE statement and then updating or deleting rows one-by-one.

Etsuro Fujita, reviewed by Rushabh Lathia, Shigeru Hanada, Kyotaro
Horiguchi, Albe Laurenz, Thom Brown, and me.
2016-03-18 13:55:52 -04:00
Robert Haas 992b5ba30d Push scan/join target list beneath Gather when possible.
This means that, for example, "SELECT expensive_func(a) FROM bigtab
WHERE something" can compute expensive_func(a) in the workers rather
than the leader if it happens to be parallel-safe, which figures to be
a big win in some practical cases.

Currently, we can only do this if the entire target list is
parallel-safe.  If we worked harder, we might be able to evaluate
parallel-safe targets in the worker and any parallel-restricted
targets in the leader, but that would be more complicated, and there
aren't that many parallel-restricted functions that people are likely
to use in queries anyway.  I think.  So just do the simple thing for
the moment.

Robert Haas, Amit Kapila, and Tom Lane
2016-03-18 09:50:05 -04:00
Robert Haas 0218e8b3fa Fix typos.
Jim Nasby
2016-03-17 07:26:20 -04:00
Robert Haas 3aff33aa68 Fix typos.
Oskari Saarenmaa
2016-03-15 18:06:11 -04:00
Tom Lane 101fd9349e Add a GetForeignUpperPaths callback function for FDWs.
This is basically like the just-added create_upper_paths_hook, but
control is funneled only to the FDW responsible for all the baserels
of the current query; so providing such a callback is much less likely
to add useless overhead than using the hook function is.

The documentation is a bit sketchy.  We'll likely want to improve it,
and/or adjust the call conventions, when we get some experience with
actually using this callback.  Hopefully somebody will find time to
experiment with it before 9.6 feature freeze.
2016-03-14 20:04:48 -04:00
Tom Lane 5864d6a4b6 Provide a planner hook at a suitable place for creating upper-rel Paths.
In the initial revision of the upper-planner pathification work, the only
available way for an FDW or custom-scan provider to inject Paths
representing post-scan-join processing was to insert them during scan-level
GetForeignPaths or similar processing.  While that's not impossible, it'd
require quite a lot of duplicative processing to look forward and see if
the extension would be capable of implementing the whole query.  To improve
matters for custom-scan providers, provide a hook function at the point
where the core code is about to start filling in upperrel Paths.  At this
point Paths are available for the whole scan/join tree, which should reduce
the amount of redundant effort considerably.

(An alternative design that was suggested was to provide a separate hook
for each post-scan-join processing step, but that seems messy and not
clearly more useful.)

Following our time-honored tradition, there's no documentation for this
hook outside the source code.

As-is, this hook is only meant for custom scan providers, which we can't
assume very much about.  A followon patch will implement an FDW callback
to let FDWs do the same thing in a somewhat more structured fashion.
2016-03-14 19:23:29 -04:00
Tom Lane 28048cbaa2 Allow callers of create_foreignscan_path to specify nondefault PathTarget.
Although the default choice of rel->reltarget should typically be
sufficient for scan or join paths, it's not at all sufficient for the
purposes PathTargets were invented for; in particular not for
upper-relation Paths.  So break API compatibility by adding a PathTarget
argument to create_foreignscan_path().  To ease updating of existing
code, accept a NULL value of the argument as selecting rel->reltarget.
2016-03-14 17:31:28 -04:00
Tom Lane 307c78852f Rethink representation of PathTargets.
In commit 19a541143a I did not make PathTarget a subtype of Node,
and embedded a RelOptInfo's reltarget directly into it rather than having
a separately-allocated Node.  In hindsight that was misguided
micro-optimization, enabled by the fact that at that point we didn't have
any Paths with custom PathTargets.  Now that PathTarget processing has
been fleshed out some more, it's easier to see that it's better to have
PathTarget as an indepedent Node type, even if it does cost us one more
palloc to create a RelOptInfo.  So change it while we still can.

This commit just changes the representation, without doing anything more
interesting than that.
2016-03-14 16:59:59 -04:00
Robert Haas 6be84eeb8d Update more comments for 96198d94cb.
Etsuro Fujita, reviewed (though not completely endorsed) by Ashutosh
Bapat, and slightly expanded by me.
2016-03-14 14:29:12 -04:00
Tom Lane 570be1f73f Re-export a few of createplan.c's make_xxx() functions.
CitusDB is using these and don't wish to redesign their code right now.
I am not on board with this being a good idea, or a good precedent,
but I lack the energy to fight about it.
2016-03-12 12:12:59 -05:00
Tom Lane 9118d03a8c When appropriate, postpone SELECT output expressions till after ORDER BY.
It is frequently useful for volatile, set-returning, or expensive functions
in a SELECT's targetlist to be postponed till after ORDER BY and LIMIT are
done.  Otherwise, the functions might be executed for every row of the
table despite the presence of LIMIT, and/or be executed in an unexpected
order.  For example, in
	SELECT x, nextval('seq') FROM tab ORDER BY x LIMIT 10;
it's probably desirable that the nextval() values are ordered the same
as x, and that nextval() is not run more than 10 times.

In the past, Postgres was inconsistent in this area: you would get the
desirable behavior if the ordering were performed via an indexscan, but
not if it had to be done by an explicit sort step.  Getting the desired
behavior reliably required contortions like
	SELECT x, nextval('seq')
	  FROM (SELECT x FROM tab ORDER BY x) ss LIMIT 10;

This patch conditionally postpones evaluation of pure-output target
expressions (that is, those that are not used as DISTINCT, ORDER BY, or
GROUP BY columns) so that they effectively occur after sorting, even if an
explicit sort step is necessary.  Volatile expressions and set-returning
expressions are always postponed, so as to provide consistent semantics.
Expensive expressions (costing more than 10 times typical operator cost,
which by default would include any user-defined function) are postponed
if there is a LIMIT or if there are expressions that must be postponed.

We could be more aggressive and postpone any nontrivial expression, but
there are costs associated with doing so: it requires an extra Result plan
node which adds some overhead, and postponement changes the volume of data
going through the sort step, perhaps for the worse.  Since we tend not to
have very good estimates of the output width of nontrivial expressions,
it's hard to have much confidence in our ability to predict whether
postponement would increase or decrease the cost of the sort; therefore
this patch doesn't attempt to make decisions conditionally on that.
Between these factors and a general desire not to change query behavior
when there's not a demonstrable benefit, it seems best to be conservative
about applying postponement.  We might tweak the decision rules in the
future, though.

Konstantin Knizhnik, heavily rewritten by me
2016-03-11 12:27:50 -05:00
Tom Lane 49635d7b3e Minor additional refactoring of planner.c's PathTarget handling.
Teach make_group_input_target() and make_window_input_target() to work
entirely with the PathTarget representation of tlists, rather than
constructing a tlist and immediately deconstructing it into PathTarget
format.  In itself this only saves a few palloc's; the bigger picture is
that it opens the door for sharing cost_qual_eval work across all of
planner.c's constructions of PathTargets.  I'll come back to that later.

In support of this, flesh out tlist.c's infrastructure for PathTargets
a bit more.
2016-03-11 10:24:55 -05:00
Tom Lane c82c92b111 Give pull_var_clause() reject/recurse/return behavior for WindowFuncs too.
All along, this function should have treated WindowFuncs in a manner
similar to Aggrefs, ie with an option whether or not to recurse into them.
By not considering the case, it was always recursing, which is OK for most
callers (although I suspect that the case in prepare_sort_from_pathkeys
might represent a bug).  But now we need return-without-recursing behavior
as well.  There are also more than a few callers that should never see a
WindowFunc, and now we'll get some error checking on that.
2016-03-10 16:23:52 -05:00
Tom Lane 364a9f47ab Refactor pull_var_clause's API to make it less tedious to extend.
In commit 1d97c19a0f and later c1d9579dd8, we extended
pull_var_clause's API by adding enum-type arguments.  That's sort of a pain
to maintain, though, because it means every time we add a new behavior we
must touch every last one of the call sites, even if there's a reasonable
default behavior that most of them could use.  Let's switch over to using a
bitmask of flags, instead; that seems more maintainable and might save a
nanosecond or two as well.  This commit changes no behavior in itself,
though I'm going to follow it up with one that does add a new behavior.

In passing, remove flatten_tlist(), which has not been used since 9.1
and would otherwise need the same API changes.

Removing these enums means that optimizer/tlist.h no longer needs to
depend on optimizer/var.h.  Changing that caused a number of C files to
need addition of #include "optimizer/var.h" (probably we can thank old
runs of pgrminclude for that); but on balance it seems like a good change
anyway.
2016-03-10 15:53:07 -05:00
Tom Lane 8776c15c85 Fix incorrect tlist generation in create_gather_plan().
This function is written as though Gather doesn't project; but it does.
Even if it did not project, though, we must use build_path_tlist to ensure
that the output columns receive correct sortgroupref labeling.

Per report from Amit Kapila.
2016-03-09 10:56:46 -05:00
Tom Lane 51c0f63e4d Improve handling of pathtargets in planner.c.
Refactor so that the internal APIs in planner.c deal in PathTargets not
targetlists, and establish a more regular structure for deriving the
targets needed for successive steps.

There is more that could be done here; calculating the eval costs of each
successive target independently is both inefficient and wrong in detail,
since we won't actually recompute values available from the input node's
tlist.  But it's no worse than what happened before the pathification
rewrite.  In any case this seems like a good starting point for considering
how to handle Konstantin Knizhnik's function-evaluation-postponement patch.
2016-03-09 01:12:16 -05:00
Tom Lane 9e8b99420f Improve handling of group-column indexes in GroupingSetsPath.
Instead of having planner.c compute a groupColIdx array and store it in
GroupingSetsPaths, make create_groupingsets_plan() find the grouping
columns by searching in the child plan node's tlist.  Although that's
probably a bit slower for create_groupingsets_plan(), it's more like
the way every other plan node type does this, and it provides positive
confirmation that we know which child output columns we're supposed to be
grouping on.  (Indeed, looking at this now, I'm not at all sure that it
wasn't broken before, because create_groupingsets_plan() isn't demanding
an exact tlist match from its child node.)  Also, this allows substantial
simplification in planner.c, because it no longer needs to compute the
groupColIdx array at all; no other cases were using it.

I'd intended to put off this refactoring until later (like 9.7), but
in view of the likely bug fix and the need to rationalize planner.c's
tlist handling so we can do something sane with Konstantin Knizhnik's
function-evaluation-postponement patch, I think it can't wait.
2016-03-08 22:32:11 -05:00
Tom Lane 61fd218930 Fix minor thinko in pathification code.
I passed the wrong "root" struct to create_pathtarget in build_minmax_path.
Since the subroot is a clone of the outer root, this would not cause any
serious problems, but it would waste some cycles because
set_pathtarget_cost_width would not have access to Var width estimates
set up while running query_planner on the subroot.
2016-03-08 16:50:44 -05:00
Tom Lane 8c314b9853 Finish refactoring make_foo() functions in createplan.c.
This patch removes some redundant cost calculations that I left for later
cleanup in commit 3fc6e2d7f5.  There's now a uniform policy that the
make_foo() convenience functions don't do any cost calculations.  Most of
their callers copy costs from the source Path node, and for those that
don't, the calculation in the make_foo() function wasn't necessarily right
anyhow.  (make_result() was particularly a mess, as it was serving multiple
callers using cost calcs designed for only the first one or two that had
ever existed.)  Aside from saving a few cycles, this ensures that what
EXPLAIN prints matches the costs we used for planning purposes.  It does
not change any planner decisions, since the decisions are already made.
2016-03-08 16:28:34 -05:00
Robert Haas 7400559a3f Comment update for fdw_recheck_quals.
Commit 5fc4c26db5 could've done a better
job updating these comments.

Etsuro Fujita
2016-03-08 14:40:55 -05:00
Tom Lane cf8e7b16a5 Spell "parallel" correctly.
Per David Rowley.
2016-03-07 21:48:17 -05:00
Tom Lane 3fc6e2d7f5 Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is.  This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps.  Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step.  We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.

In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan.  It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation.  (A couple of regression test outputs change in consequence of
that.  However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)

There is a great deal left to do here.  This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations.  (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.)  I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.

Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 15:58:22 -05:00
Tom Lane 05893712cc Fix build under OPTIMIZER_DEBUG.
In commit 19a541143a I replaced RelOptInfo.width with
RelOptInfo.reltarget.width, but I missed updating debug_print_rel()
for that because it's not compiled by default.
Reported by Salvador Fandino, patch by Michael Paquier.
2016-02-29 10:14:12 -05:00
Dean Rasheed 41fedc2462 Fix incorrect varlevelsup in security_barrier_replace_vars().
When converting an RTE with securityQuals into a security barrier
subquery RTE, ensure that the Vars in the new subquery's targetlist
all have varlevelsup = 0 so that they correctly refer to the
underlying base relation being wrapped.

The original code was creating new Vars by copying them from existing
Vars referencing the base relation found elsewhere in the query, but
failed to account for the fact that such Vars could come from sublink
subqueries, and hence have varlevelsup > 0. In practice it looks like
this could only happen with nested security barrier views, where the
outer view has a WHERE clause containing a correlated subquery, due to
the order in which the Vars are processed.

Bug: #13988
Reported-by: Adam Guthrie
Backpatch-to: 9.4, where updatable SB views were introduced
2016-02-29 12:28:06 +00:00
Robert Haas 35746bc348 Add new FDW API to test for parallel-safety.
This is basically a bug fix; the old code assumes that a ForeignScan
is always parallel-safe, but for postgres_fdw, for example, this is
definitely false.  It should be true for file_fdw, though, since a
worker can read a file from the filesystem just as well as any other
backend process.

Original patch by Thomas Munro.  Documentation, and changes to the
comments, by me.
2016-02-26 16:14:46 +05:30
Tom Lane 19a541143a Add an explicit representation of the output targetlist to Paths.
Up to now, there's been an assumption that all Paths for a given relation
compute the same output column set (targetlist).  However, there are good
reasons to remove that assumption.  For example, an indexscan on an
expression index might be able to return the value of an expensive function
"for free".  While we have the ability to generate such a plan today in
simple cases, we don't have a way to model that it's cheaper than a plan
that computes the function from scratch, nor a way to create such a plan
in join cases (where the function computation would normally happen at
the topmost join node).  Also, we need this so that we can have Paths
representing post-scan/join steps, where the targetlist may well change
from one step to the next.  Therefore, invent a "struct PathTarget"
representing the columns we expect a plan step to emit.  It's convenient
to include the output tuple width and tlist evaluation cost in this struct,
and there will likely be additional fields in future.

While Path nodes that actually do have custom outputs will need their own
PathTargets, it will still be true that most Paths for a given relation
will compute the same tlist.  To reduce the overhead added by this patch,
keep a "default PathTarget" in RelOptInfo, and allow Paths that compute
that column set to just point to their parent RelOptInfo's reltarget.
(In the patch as committed, actually every Path is like that, since we
do not yet have any cases of custom PathTargets.)

I took this opportunity to provide some more-honest costing of
PlaceHolderVar evaluation.  Up to now, the assumption that "scan/join
reltargetlists have cost zero" was applied not only to Vars, where it's
reasonable, but also PlaceHolderVars where it isn't.  Now, we add the eval
cost of a PlaceHolderVar's expression to the first plan level where it can
be computed, by including it in the PathTarget cost field and adding that
to the cost estimates for Paths.  This isn't perfect yet but it's much
better than before, and there is a way forward to improve it more.  This
costing change affects the join order chosen for a couple of the regression
tests, changing expected row ordering.
2016-02-18 20:02:03 -05:00
Tom Lane d4c3a156cb Remove GROUP BY columns that are functionally dependent on other columns.
If a GROUP BY clause includes all columns of a non-deferred primary key,
as well as other columns of the same relation, those other columns are
redundant and can be dropped from the grouping; the pkey is enough to
ensure that each row of the table corresponds to a separate group.
Getting rid of the excess columns will reduce the cost of the sorting or
hashing needed to implement GROUP BY, and can indeed remove the need for
a sort step altogether.

This seems worth testing for since many query authors are not aware of
the GROUP-BY-primary-key exception to the rule about queries not being
allowed to reference non-grouped-by columns in their targetlists or
HAVING clauses.  Thus, redundant GROUP BY items are not uncommon.  Also,
we can make the test pretty cheap in most queries where it won't help
by not looking up a rel's primary key until we've found that at least
two of its columns are in GROUP BY.

David Rowley, reviewed by Julien Rouhaud
2016-02-11 17:34:59 -05:00
Tom Lane 2564be360a Fix typo in comment. 2016-02-11 15:20:14 -05:00
Andres Freund a6897efab9 Fix overeager pushdown of HAVING clauses when grouping sets are used.
In 61444bfb we started to allow HAVING clauses to be fully pushed down
into WHERE, even when grouping sets are in use. That turns out not to
work correctly, because grouping sets can "produce" NULLs, meaning that
filtering in WHERE and HAVING can have different results, even when no
aggregates or volatile functions are involved.

Instead only allow pushdown of empty grouping sets.

It'd be nice to do better, but the exact mechanics of deciding which
cases are safe are still being debated. It's important to give correct
results till we find a good solution, and such a solution might not be
appropriate for backpatching anyway.

Bug: #13863
Reported-By: 'wrb'
Diagnosed-By: Dean Rasheed
Author: Andrew Gierth
Reviewed-By: Dean Rasheed and Andres Freund
Discussion: 20160113183558.12989.56904@wrigleys.postgresql.org
Backpatch: 9.5, where grouping sets were introduced
2016-02-08 11:03:31 +01:00
Robert Haas 7c944bd903 Introduce a new GUC force_parallel_mode for testing purposes.
When force_parallel_mode = true, we enable the parallel mode restrictions
for all queries for which this is believed to be safe.  For the subset of
those queries believed to be safe to run entirely within a worker, we spin
up a worker and run the query there instead of running it in the
original process.  When force_parallel_mode = regress, make additional
changes to allow the regression tests to run cleanly even though parallel
workers have been injected under the hood.

Taken together, this facilitates both better user testing and better
regression testing of the parallelism code.

Robert Haas, with help from Amit Kapila and Rushabh Lathia.
2016-02-07 11:41:33 -05:00
Alvaro Herrera 3cb5867b7d Don't test for system columns on join relations
create_foreignscan_plan needs to know whether any system columns are
requested from a relation (this flag is needed by ForeignNext during
execution).  However, for join relations this is a pointless test,
because it's not possible to request system columns from them, so
remove the check.

Author: Etsuro Fujita
Discussion: http://www.postgresql.org/message-id/56AA0FC5.9000207@lab.ntt.co.jp
Reviewed-by: David Rowley, Robert Haas
2016-02-02 19:20:02 +01:00
Robert Haas fbe5a3fb73 Only try to push down foreign joins if the user mapping OIDs match.
Previously, the foreign join pushdown infrastructure left the question
of security entirely up to individual FDWs, but it would be easy for
a foreign data wrapper to inadvertently open up subtle security holes
that way.  So, make it the core code's job to determine which user
mapping OID is relevant, and don't attempt join pushdown unless it's
the same for all relevant relations.

Per a suggestion from Tom Lane.  Shigeru Hanada and Ashutosh Bapat,
reviewed by Etsuro Fujita and KaiGai Kohei, with some further
changes by me.
2016-01-28 14:05:36 -05:00
Robert Haas eaf7b1f643 Assert that create_unique_path returns non-NULL.
Per off-list discussion with Tom Lane and Michael Paquier, Coverity
gets unhappy if this is not done.
2016-01-27 22:03:18 -05:00
Tom Lane b99551832e Add defenses against putting expanded objects into Const nodes.
Putting a reference to an expanded-format value into a Const node would be
a bad idea for a couple of reasons.  It'd be possible for the supposedly
immutable Const to change value, if something modified the referenced
variable ... in fact, if the Const's reference were R/W, any function that
has the Const as argument might itself change it at runtime.  Also, because
datumIsEqual() is pretty simplistic, the Const might fail to compare equal
to other Consts that it should compare equal to, notably including copies
of itself.  This could lead to unexpected planner behavior, such as "could
not find pathkey item to sort" errors or inferior plans.

I have not been able to find any way to get an expanded value into a Const
within the existing core code; but Paul Ramsey was able to trigger the
problem by writing a datatype input function that returns an expanded
value.

The best fix seems to be to establish a rule that varlena values being
placed into Const nodes should be passed through pg_detoast_datum().
That will do nothing (and cost little) in normal cases, but it will flatten
expanded values and thereby avoid the above problems.  Also, it will
convert short-header or compressed values into canonical format, which will
avoid possible unexpected lack-of-equality issues for those cases too.
And it provides a last-ditch defense against putting a toasted value into
a Const, which we already knew was dangerous, cf commit 2b0c86b665.
(In the light of this discussion, I'm no longer sure that that commit
provided 100% protection against such cases, but this fix should do it.)

The test added in commit 65c3d05e18 to catch datatype input functions
with unstable results would fail for functions that returned expanded
values; but it seems a bit uncharitable to deem a result unstable just
because it's expressed in expanded form, so revise the coding so that we
check for bitwise equality only after applying pg_detoast_datum().  That's
a sufficient condition anyway given the new rule about detoasting when
forming a Const.

Back-patch to 9.5 where the expanded-object facility was added.  It's
possible that this should go back further; but in the absence of clear
evidence that there's any live bug in older branches, I'll refrain for now.
2016-01-21 12:56:08 -05:00
Robert Haas 45be99f8cd Support parallel joins, and make related improvements.
The core innovation of this patch is the introduction of the concept
of a partial path; that is, a path which if executed in parallel will
generate a subset of the output rows in each process.  Gathering a
partial path produces an ordinary (complete) path.  This allows us to
generate paths for parallel joins by joining a partial path for one
side (which at the baserel level is currently always a Partial Seq
Scan) to an ordinary path on the other side.  This is subject to
various restrictions at present, especially that this strategy seems
unlikely to be sensible for merge joins, so only nested loops and
hash joins paths are generated.

This also allows an Append node to be pushed below a Gather node in
the case of a partitioned table.

Testing revealed that early versions of this patch made poor decisions
in some cases, which turned out to be caused by the fact that the
original cost model for Parallel Seq Scan wasn't very good.  So this
patch tries to make some modest improvements in that area.

There is much more to be done in the area of generating good parallel
plans in all cases, but this seems like a useful step forward.

Patch by me, reviewed by Dilip Kumar and Amit Kapila.
2016-01-20 14:40:26 -05:00
Robert Haas a7de3dc5c3 Support multi-stage aggregation.
Aggregate nodes now have two new modes: a "partial" mode where they
output the unfinalized transition state, and a "finalize" mode where
they accept unfinalized transition states rather than individual
values as input.

These new modes are not used anywhere yet, but they will be necessary
for parallel aggregation.  The infrastructure also figures to be
useful for cases where we want to aggregate local data and remote
data via the FDW interface, and want to bring back partial aggregates
from the remote side that can then be combined with locally generated
partial aggregates to produce the final value.  It may also be useful
even when neither FDWs nor parallelism are in play, as explained in
the comments in nodeAgg.c.

David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki
Linnakangas, Haribabu Kommi, and me.
2016-01-20 13:46:50 -05:00
Tom Lane 49b4950650 Add explicit cast to amcostestimate call.
My compiler doesn't complain here, but David Rowley's does ...
2016-01-17 22:56:16 -05:00
Tom Lane 65c5fcd353 Restructure index access method API to hide most of it at the C level.
This patch reduces pg_am to just two columns, a name and a handler
function.  All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function.  This is similar to
the designs we've adopted for FDWs and tablesample methods.  There
are multiple advantages.  For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.

A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL.  We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.

Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
2016-01-17 19:36:59 -05:00
Tom Lane 8d290c8ec6 Re-pgindent a few files.
In preparation for landing index AM interface changes.
2016-01-17 19:13:18 -05:00
Tom Lane a923af382c Fix build_grouping_chain() to not clobber its input lists.
There's no good reason for stomping on the input data; it makes the logic
in this function no simpler, in fact probably the reverse.  And it makes
it impossible to separate path generation from plan generation, as I'm
working towards doing; that will require more than one traversal of these
lists.
2016-01-14 11:51:57 -05:00
Robert Haas 950ab82c3d Remove obsolete comment.
Noted while reviewing a question from Dickson S. Guedes.
2016-01-10 21:35:33 -05:00
Tom Lane a54676acad Marginal cleanup of GROUPING SETS code in grouping_planner().
Improve comments and make it a shade less messy.  I think we might want
to move all of this somewhere else later, but it needs to be more
readable first.

In passing, re-pgindent the file, affecting some recently-added comments
concerning parallel query planning.
2016-01-07 20:32:35 -05:00
Tom Lane c44d013835 Delay creation of subplan tlist until after create_plan().
Once upon a time it was necessary for grouping_planner() to determine
the tlist it wanted from the scan/join plan subtree before it called
query_planner(), because query_planner() would actually make a Plan using
that.  But we refactored things a long time ago to delay construction of
the Plan tree till later, so there's no need to build that tlist until
(and indeed unless) we're ready to plaster it onto the Plan.  The only
thing query_planner() cares about is what Vars are going to be needed for
the tlist, and it can perfectly well get that by looking at the real tlist
rather than some masticated version.

Well, actually, there is one minor glitch in that argument, which is that
make_subplanTargetList also adds Vars appearing only in HAVING to the
tlist it produces.  So now we have to account for HAVING explicitly in
build_base_rel_tlists.  But that just adds a few lines of code, and
I doubt it moves the needle much on processing time; we might be doing
pull_var_clause() twice on the havingQual, but before we had it scanning
dummy tlist entries instead.

This is a very small down payment on rationalizing grouping_planner
enough so it can be refactored.
2016-01-07 20:23:57 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Tom Lane efe4c9d704 Add some comments about division of labor between rewriter and planner.
The rationale for the way targetlist processing is done wasn't clearly
stated anywhere, and I for one had forgotten some of the details.  Having
just painfully re-learned them, add some breadcrumbs for the next person.
2015-12-29 18:50:35 -05:00
Robert Haas 51d152f18e Change Gather not to use a physical tlist.
This should have been part of the original commit, but was missed.
Pushing data between processes is expensive, so we definitely want
to project away unneeded columns here, just as we do for other nodes
like Sort and Hash that care about the volume of data.
2015-12-23 13:41:06 -05:00
Robert Haas ccd8f97922 postgres_fdw: Consider requesting sorted data so we can do a merge join.
When use_remote_estimate is enabled, consider adding ORDER BY to the
query we sending to the remote server so that we can use that ordered
data for a merge join.  Commit f18c944b61
arranges to push down the query pathkeys, which seems like the case
mostly likely to be a win, but testing shows this can sometimes win,
too.

For a regular table, we know which indexes are present and therefore
test whether the ordering provided by each such index is useful.  Here,
we take the opposite approach: guess what orderings would be useful if
they could be generated cheaply, and then ask the remote side what those
will cost.

Ashutosh Bapat, with very substantial cosmetic revisions by me.  Also
reviewed by Rushabh Lathia.
2015-12-22 13:46:40 -05:00
Stephen Frost e5e11c8cca Collect the global OR of hasRowSecurity flags for plancache
We carry around information about if a given query has row security or
not to allow the plancache to use that information to invalidate a
planned query in the event that the environment changes.

Previously, the flag of one of the subqueries was simply being copied
into place to indicate if the query overall included RLS components.
That's wrong as we need the global OR of all subqueries.  Fix by
changing the code to match how fireRIRules works, which is results
in OR'ing all of the flags.

Noted by Tom.

Back-patch to 9.5 where RLS was introduced.
2015-12-14 20:05:43 -05:00
Tom Lane 4fcf48450d Get rid of the planner's LateralJoinInfo data structure.
I originally modeled this data structure on SpecialJoinInfo, but after
commit acfcd45cac that looks like a pretty poor decision.
All we really need is relid sets identifying laterally-referenced rels;
and most of the time, what we want to know about includes indirect lateral
references, a case the LateralJoinInfo data was unsuited to compute with
any efficiency.  The previous commit redefined RelOptInfo.lateral_relids
as the transitive closure of lateral references, so that it easily supports
checking indirect references.  For the places where we really do want just
direct references, add a new RelOptInfo field direct_lateral_relids, which
is easily set up as a copy of lateral_relids before we perform the
transitive closure calculation.  Then we can just drop lateral_info_list
and LateralJoinInfo and the supporting code.  This makes the planner's
handling of lateral references noticeably more efficient, and shorter too.

Such a change can't be back-patched into stable branches for fear of
breaking extensions that might be looking at the planner's data structures;
but it seems not too late to push it into 9.5, so I've done so.
2015-12-11 15:52:38 -05:00
Tom Lane acfcd45cac Still more fixes for planner's handling of LATERAL references.
More fuzz testing by Andreas Seltenreich exposed that the planner did not
cope well with chains of lateral references.  If relation X references Y
laterally, and Y references Z laterally, then we will have to scan X on the
inside of a nestloop with Z, so for all intents and purposes X is laterally
dependent on Z too.  The planner did not understand this and would generate
intermediate joins that could not be used.  While that was usually harmless
except for wasting some planning cycles, under the right circumstances it
would lead to "failed to build any N-way joins" or "could not devise a
query plan" planner failures.

To fix that, convert the existing per-relation lateral_relids and
lateral_referencers relid sets into their transitive closures; that is,
they now show all relations on which a rel is directly or indirectly
laterally dependent.  This not only fixes the chained-reference problem
but allows some of the relevant tests to be made substantially simpler
and faster, since they can be reduced to simple bitmap manipulations
instead of searches of the LateralJoinInfo list.

Also, when a PlaceHolderVar that is due to be evaluated at a join contains
lateral references, we should treat those references as indirect lateral
dependencies of each of the join's base relations.  This prevents us from
trying to join any individual base relations to the lateral reference
source before the join is formed, which again cannot work.

Andreas' testing also exposed another oversight in the "dangerous
PlaceHolderVar" test added in commit 85e5e222b1.  Simply rejecting
unsafe join paths in joinpath.c is insufficient, because in some cases
we will end up rejecting *all* possible paths for a particular join, again
leading to "could not devise a query plan" failures.  The restriction has
to be known also to join_is_legal and its cohort functions, so that they
will not select a join for which that will happen.  I chose to move the
supporting logic into joinrels.c where the latter functions are.

Back-patch to 9.3 where LATERAL support was introduced.
2015-12-11 14:22:20 -05:00
Robert Haas 385f337c9f Allow foreign and custom joins to handle EvalPlanQual rechecks.
Commit e7cb7ee145 provided basic
infrastructure for allowing a foreign data wrapper or custom scan
provider to replace a join of one or more tables with a scan.
However, this infrastructure failed to take into account the need
for possible EvalPlanQual rechecks, and ExecScanFetch would fail
an assertion (or just overwrite memory) if such a check was attempted
for a plan containing a pushed-down join.  To fix, adjust the EPQ
machinery to skip some processing steps when scanrelid == 0, making
those the responsibility of scan's recheck method, which also has
the responsibility in this case of correctly populating the relevant
slot.

To allow foreign scans to gain control in the right place to make
use of this new facility, add a new, optional RecheckForeignScan
method.  Also, allow a foreign scan to have a child plan, which can
be used to correctly populate the slot (or perhaps for something
else, but this is the only use currently envisioned).

KaiGai Kohei, reviewed by Robert Haas, Etsuro Fujita, and Kyotaro
Horiguchi.
2015-12-08 12:31:03 -05:00
Tom Lane edca44b152 Simplify LATERAL-related calculations within add_paths_to_joinrel().
While convincing myself that commit 7e19db0c09 would solve both of
the problems recently reported by Andreas Seltenreich, I realized that
add_paths_to_joinrel's handling of LATERAL restrictions could be made
noticeably simpler and faster if we were to retain the minimum possible
parameterization for each joinrel (that is, the set of relids supplying
unsatisfied lateral references in it).  We already retain that for
baserels, in RelOptInfo.lateral_relids, so we can use that field for
joinrels too.

I re-pgindent'd the files touched here, which affects some unrelated
comments.

This is, I believe, just a minor optimization not a bug fix, so no
back-patch.
2015-12-07 18:56:17 -05:00
Tom Lane 7e19db0c09 Fix another oversight in checking if a join with LATERAL refs is legal.
It was possible for the planner to decide to join a LATERAL subquery to
the outer side of an outer join before the outer join itself is completed.
Normally that's fine because of the associativity rules, but it doesn't
work if the subquery contains a lateral reference to the inner side of the
outer join.  In such a situation the outer join *must* be done first.
join_is_legal() missed this consideration and would allow the join to be
attempted, but the actual path-building code correctly decided that no
valid join path could be made, sometimes leading to planner errors such as
"failed to build any N-way joins".

Per report from Andreas Seltenreich.  Back-patch to 9.3 where LATERAL
support was added.
2015-12-07 17:42:11 -05:00
Robert Haas c7485a82c3 Add handling for GatherPath to print_path.
Peter Geoghegan
2015-12-02 08:19:50 -05:00
Robert Haas 7907a949ab Fix incomplete set_foreignscan_references handling for fdw_recheck_quals
KaiGai Kohei
2015-11-18 22:12:21 -05:00
Robert Haas 80558c1f5a Generate parallel sequential scan plans in simple cases.
Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker.  Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation.  That's probably
the right decision in most cases, anyway.

Using the new flag, generate parallel sequential scan plans for plain
baserels, meaning that we now have parallel sequential scan in
PostgreSQL.  The logic here is pretty unsophisticated right now: the
costing model probably isn't right in detail, and we can't push joins
beneath Gather nodes, so the number of plans that can actually benefit
from this is pretty limited right now.  Lots more work is needed.
Nevertheless, it seems time to enable this functionality so that all
this code can actually be tested easily by users and developers.

Note that, if you wish to test this functionality, it will be
necessary to set max_parallel_degree to a value greater than the
default of 0.  Once a few more loose ends have been tidied up here, we
might want to consider changing the default value of this GUC, but
I'm leaving it alone for now.

Along the way, fix a bug in cost_gather: the previous coding thought
that a Gather node's transfer overhead should be costed on the basis of
the relation size rather than the number of tuples that actually need
to be passed off to the leader.

Patch by me, reviewed in earlier versions by Amit Kapila.
2015-11-11 09:02:52 -05:00
Robert Haas f0661c4e8c Make sequential scans parallel-aware.
In addition, this path fills in a number of missing bits and pieces in
the parallel infrastructure.  Paths and plans now have a parallel_aware
flag indicating whether whatever parallel-aware logic they have should
be engaged.  It is believed that we will need this flag for a number of
path/plan types, not just sequential scans, which is why the flag is
generic rather than part of the SeqScan structures specifically.
Also, execParallel.c now gives parallel nodes a chance to initialize
their PlanState nodes from the DSM during parallel worker startup.

Amit Kapila, with a fair amount of adjustment by me.  Review of previous
patch versions by Haribabu Kommi and others.
2015-11-11 08:57:52 -05:00
Robert Haas 5c90a2ffdd Comment update.
Adjust to account for 5fc4c26db5.

Etsuro Fujita
2015-11-09 13:48:50 -05:00
Tom Lane 59464bd6f9 Improve implementation of GEQO's init_tour() function.
Rather than filling a temporary array and then copying values to the
output array, we can generate the required random permutation in-place
using the Fisher-Yates shuffle algorithm.  This is shorter as well as
more efficient than before.  It's pretty unlikely that anyone would
notice a speed improvement, but shorter code is better.

Nathan Wagner, edited a bit by me
2015-11-05 10:46:14 -05:00
Robert Haas 8538a63070 Make Gather node projection-capable.
The original Gather code failed to mark a Gather node as not able to
do projection, but it couldn't, even though it did call initialize its
projection info via ExecAssignProjectionInfo.  There doesn't seem to
be any good reason for this node not to have projection capability,
so clean things up so that it does.  Without this, plans using Gather
nodes might need to carry extra Result nodes to do projection.
2015-10-28 00:27:58 +01:00
Robert Haas a53c06a13e Prohibit parallel query when the isolation level is serializable.
In order for this to be safe, the code which hands true serializability
will need to taught that the SIRead locks taken by a parallel worker
pertain to the same transaction as those taken by the parallel leader.
Some further changes may be needed as well.  Until the necessary
adaptations are made, don't generate parallel plans in serializable
mode, and if a previously-generated parallel plan is used after
serializable mode has been activated, run it serially.

This fixes a bug in commit 7aea8e4f2d.
2015-10-16 11:58:27 -04:00
Robert Haas 5fc4c26db5 Allow FDWs to push down quals without breaking EvalPlanQual rechecks.
This fixes a long-standing bug which was discovered while investigating
the interaction between the new join pushdown code and the EvalPlanQual
machinery: if a ForeignScan appears on the inner side of a paramaterized
nestloop, an EPQ recheck would re-return the original tuple even if
it no longer satisfied the pushed-down quals due to changed parameter
values.

This fix adds a new member to ForeignScan and ForeignScanState and a
new argument to make_foreignscan, and requires changes to FDWs which
push down quals to populate that new argument with a list of quals they
have chosen to push down.  Therefore, I'm only back-patching to 9.5,
even though the bug is not new in 9.5.

Etsuro Fujita, reviewed by me and by Kyotaro Horiguchi.
2015-10-15 13:00:40 -04:00
Stephen Frost b7aac36245 Handle append_rel_list in expand_security_qual
During expand_security_quals, we take the security barrier quals on an
RTE and create a subquery which evaluates the quals.  During this, we
have to replace any variables in the outer query which refer to the
original RTE with references to the columns from the subquery.

We need to also perform that replacement for any Vars in the
append_rel_list.

Only backpatching to 9.5 as we only go through this process in 9.4 for
auto-updatable security barrier views, which UNION ALL queries aren't.

Discovered by Haribabu Kommi

Patch by Dean Rasheed
2015-10-09 10:49:02 -04:00
Andres Freund ad22783792 Fix several bugs related to ON CONFLICT's EXCLUDED pseudo relation.
Four related issues:

1) attnos/varnos/resnos for EXCLUDED were out of sync when a column
   after one dropped in the underlying relation was referenced.
2) References to whole-row variables (i.e. EXCLUDED.*) lead to errors.
3) It was possible to reference system columns in the EXCLUDED pseudo
   relations, even though they would not have valid contents.
4) References to EXCLUDED were rewritten by the RLS machinery, as
   EXCLUDED was treated as if it were the underlying relation.

To fix the first two issues, generate the excluded targetlist with
dropped columns in mind and add an entry for whole row
variables. Instead of unconditionally adding a wholerow entry we could
pull up the expression if needed, but doing it unconditionally seems
simpler. The wholerow entry is only really needed for ruleutils/EXPLAIN
support anyway.

The remaining two issues are addressed by changing the EXCLUDED RTE to
have relkind = composite. That fits with EXCLUDED not actually being a
real relation, and allows to treat it differently in the relevant
places. scanRTEForColumn now skips looking up system columns when the
RTE has a composite relkind; fireRIRrules() already had a corresponding
check, thereby preventing RLS expansion on EXCLUDED.

Also add tests for these issues, and improve a few comments around
excluded handling in setrefs.c.

Reported-By: Peter Geoghegan, Geoff Winkless
Author: Andres Freund, Amit Langote, Peter Geoghegan
Discussion: CAEzk6fdzJ3xYQZGbcuYM2rBd2BuDkUksmK=mY9UYYDugg_GgZg@mail.gmail.com,
   CAM3SWZS+CauzbiCEcg-GdE6K6ycHE_Bz6Ksszy8AoixcMHOmsA@mail.gmail.com
Backpatch: 9.5, where ON CONFLICT was introduced
2015-10-03 15:12:10 +02:00
Tom Lane 21995d3f6d Fix documentation error in commit 8703059c6b.
Etsuro Fujita spotted a thinko in the README commentary.
2015-10-01 10:32:11 -04:00
Robert Haas 3bd909b220 Add a Gather executor node.
A Gather executor node runs any number of copies of a plan in an equal
number of workers and merges all of the results into a single tuple
stream.  It can also run the plan itself, if the workers are
unavailable or haven't started up yet.  It is intended to work with
the Partial Seq Scan node which will be added in future commits.

It could also be used to implement parallel query of a different sort
by itself, without help from Partial Seq Scan, if the single_copy mode
is used.  In that mode, a worker executes the plan, and the parallel
leader does not, merely collecting the worker's results.  So, a Gather
node could be inserted into a plan to split the execution of that plan
across two processes.  Nested Gather nodes aren't currently supported,
but we might want to add support for that in the future.

There's nothing in the planner to actually generate Gather nodes yet,
so it's not quite time to break out the champagne.  But we're getting
close.

Amit Kapila.  Some designs suggestions were provided by me, and I also
reviewed the patch.  Single-copy mode, documentation, and other minor
changes also by me.
2015-09-30 19:23:36 -04:00
Robert Haas 758fcfdc01 Comment update for join pushdown.
Etsuro Fujita
2015-09-29 07:42:30 -04:00
Robert Haas d1b7c1ffe7 Parallel executor support.
This code provides infrastructure for a parallel leader to start up
parallel workers to execute subtrees of the plan tree being executed
in the master.  User-supplied parameters from ParamListInfo are passed
down, but PARAM_EXEC parameters are not.  Various other constructs,
such as initplans, subplans, and CTEs, are also not currently shared.
Nevertheless, there's enough here to support a basic implementation of
parallel query, and we can lift some of the current restrictions as
needed.

Amit Kapila and Robert Haas
2015-09-28 21:55:57 -04:00
Tom Lane 39df0f150c Allow planner to use expression-index stats for function calls in WHERE.
Previously, a function call appearing at the top level of WHERE had a
hard-wired selectivity estimate of 0.3333333, a kludge conveniently dated
in the source code itself to July 1992.  The expectation at the time was
that somebody would soon implement estimator support functions analogous
to those for operators; but no such code has appeared, nor does it seem
likely to in the near future.  We do have an alternative solution though,
at least for immutable functions on single relations: creating an
expression index on the function call will allow ANALYZE to gather stats
about the function's selectivity.  But the code in clause_selectivity()
failed to make use of such data even if it exists.

Refactor so that that will happen.  I chose to make it try this technique
for any clause type for which clause_selectivity() doesn't have a special
case, not just functions.  To avoid adding unnecessary overhead in the
common case where we don't learn anything new, make selfuncs.c provide an
API that hooks directly to examine_variable() and then var_eq_const(),
rather than the previous coding which laboriously constructed an OpExpr
only so that it could be expensively deconstructed again.

I preserved the behavior that the default estimate for a function call
is 0.3333333.  (For any other expression node type, it's 0.5, as before.)
I had originally thought to make the default be 0.5 across the board, but
changing a default estimate that's survived for twenty-three years seems
like something not to do without a lot more testing than I care to put
into it right now.

Per a complaint from Jehan-Guillaume de Rorthais.  Back-patch into 9.5,
but not further, at least for the moment.
2015-09-24 18:35:46 -04:00
Robert Haas 7aea8e4f2d Determine whether it's safe to attempt a parallel plan for a query.
Commit 924bcf4f16 introduced a framework
for parallel computation in PostgreSQL that makes most but not all
built-in functions safe to execute in parallel mode.  In order to have
parallel query, we'll need to be able to determine whether that query
contains functions (either built-in or user-defined) that cannot be
safely executed in parallel mode.  This requires those functions to be
labeled, so this patch introduces an infrastructure for that.  Some
functions currently labeled as safe may need to be revised depending on
how pending issues related to heavyweight locking under paralllelism
are resolved.

Parallel plans can't be used except for the case where the query will
run to completion.  If portal execution were suspended, the parallel
mode restrictions would need to remain in effect during that time, but
that might make other queries fail.  Therefore, this patch introduces
a framework that enables consideration of parallel plans only when it
is known that the plan will be run to completion.  This probably needs
some refinement; for example, at bind time, we do not know whether a
query run via the extended protocol will be execution to completion or
run with a limited fetch count.  Having the client indicate its
intentions at bind time would constitute a wire protocol break.  Some
contexts in which parallel mode would be safe are not adjusted by this
patch; the default is not to try parallel plans except from call sites
that have been updated to say that such plans are OK.

This commit doesn't introduce any parallel paths or plans; it just
provides a way to determine whether they could potentially be used.
I'm committing it on the theory that the remaining parallel sequential
scan patches will also get committed to this release, hopefully in the
not-too-distant future.

Robert Haas and Amit Kapila.  Reviewed (in earlier versions) by Noah
Misch.
2015-09-16 15:38:47 -04:00
Tom Lane 87efbc2be1 Fix setrefs.c comment properly.
The "typo" alleged in commit 1e460d4bd was actually a comment that was
correct when written, but I missed updating it in commit b5282aa89.
Use a slightly less specific (and hopefully more future-proof) description
of what is collected.  Back-patch to 9.2 where that commit appeared, and
revert the comment to its then-entirely-correct state before that.
2015-09-10 10:23:56 -04:00
Stephen Frost 1e460d4bd6 Fix typo in setrefs.c
We're adding OIDs, not TIDs, to invalItems.

Pointed out by Etsuro Fujita.

Back-patch to all supported branches.
2015-09-10 09:22:03 -04:00
Heikki Linnakangas c80b5f66c6 Fix misc typos.
Oskari Saarenmaa. Backpatch to stable branches where applicable.
2015-09-05 11:35:49 +03:00
Tom Lane cfe30a72fa Undo mistaken tightening in join_is_legal().
One of the changes I made in commit 8703059c6b turns out not to have
been such a good idea: we still need the exception in join_is_legal() that
allows a join if both inputs already overlap the RHS of the special join
we're checking.  Otherwise we can miss valid plans, and might indeed fail
to find a plan at all, as in recent report from Andreas Seltenreich.

That code was added way back in commit c17117649b, but I failed to
include a regression test case then; my bad.  Put it back with a better
explanation, and a test this time.  The logic does end up a bit different
than before though: I now believe it's appropriate to make this check
first, thereby allowing such a case whether or not we'd consider the
previous SJ(s) to commute with this one.  (Presumably, we already decided
they did; but it was confusing to have this consideration in the middle
of the code that was handling the other case.)

Back-patch to all active branches, like the previous patch.
2015-08-12 21:19:03 -04:00
Tom Lane 68fa28f771 Postpone extParam/allParam calculations until the very end of planning.
Until now we computed these Param ID sets at the end of subquery_planner,
but that approach depends on subquery_planner returning a concrete Plan
tree.  We would like to switch over to returning one or more Paths for a
subquery, and in that representation the necessary details aren't fully
fleshed out (not to mention that we don't really want to do this work for
Paths that end up getting discarded).  Hence, refactor so that we can
compute the param ID sets at the end of planning, just before
set_plan_references is run.

The main change necessary to make this work is that we need to capture
the set of outer-level Param IDs available to the current query level
before exiting subquery_planner, since the outer levels' plan_params lists
are transient.  (That's not going to pose a problem for returning Paths,
since all the work involved in producing that data is part of expression
preprocessing, which will continue to happen before Paths are produced.)
On the plus side, this change gets rid of several existing kluges.

Eventually I'd like to get rid of SS_finalize_plan altogether in favor of
doing this work during set_plan_references, but that will require some
complex rejiggering because SS_finalize_plan needs to visit subplans and
initplans before the main plan.  So leave that idea for another day.
2015-08-11 23:48:37 -04:00
Tom Lane 4200a92862 Further mucking with PlaceHolderVar-related restrictions on join order.
Commit 85e5e222b1 turns out not to have taken
care of all cases of the partially-evaluatable-PlaceHolderVar problem found
by Andreas Seltenreich's fuzz testing.  I had set it up to check for risky
PHVs only in the event that we were making a star-schema-based exception to
the param_source_rels join ordering heuristic.  However, it turns out that
the problem can occur even in joins that satisfy the param_source_rels
heuristic, in which case allow_star_schema_join() isn't consulted.
Refactor so that we check for risky PHVs whenever the proposed join has
any remaining parameterization.

Back-patch to 9.2, like the previous patch (except for the regression test
case, which only works back to 9.3 because it uses LATERAL).

Note that this discovery implies that problems of this sort could've
occurred in 9.2 and up even before the star-schema patch; though I've not
tried to prove that experimentally.
2015-08-10 17:18:17 -04:00
Tom Lane 89db83922a Further adjustments to PlaceHolderVar removal.
A new test case from Andreas Seltenreich showed that we were still a bit
confused about removing PlaceHolderVars during join removal.  Specifically,
remove_rel_from_query would remove a PHV that was used only underneath
the removable join, even if the place where it's used was the join partner
relation and not the join clause being deleted.  This would lead to a
"too late to create a new PlaceHolderInfo" error later on.  We can defend
against that by checking ph_eval_at to see if the PHV could possibly be
getting used at some partner rel.

Also improve some nearby LATERAL-related logic.  I decided that the check
on ph_lateral needed to take precedence over the check on ph_needed, in
case there's a lateral reference underneath the join being considered.
(That may be impossible, but I'm not convinced of it, and it's easy enough
to defend against the case.)  Also, I realized that remove_rel_from_query's
logic for updating LateralJoinInfos is dead code, because we don't build
those at all until after join removal.

Back-patch to 9.3.  Previous versions didn't have the LATERAL issues, of
course, and they also didn't attempt to remove PlaceHolderInfos during join
removal.  (I'm starting to wonder if changing that was really such a great
idea.)
2015-08-07 14:13:50 -04:00
Tom Lane bab163e121 Fix old oversight in join removal logic.
Commit 9e7e29c75a introduced an Assert that
join removal didn't reduce the eval_at set of any PlaceHolderVar to empty.
At first glance it looks like join_is_removable ensures that's true --- but
actually, the loop in join_is_removable skips PlaceHolderVars that are not
referenced above the join due to be removed.  So, if we don't want any
empty eval_at sets, the right thing to do is to delete any now-unreferenced
PlaceHolderVars from the data structure entirely.

Per fuzz testing by Andreas Seltenreich.  Back-patch to 9.3 where the
aforesaid Assert was added.
2015-08-06 22:14:27 -04:00
Tom Lane cde35cf4ae Fix eclass_useful_for_merging to give valid results for appendrel children.
Formerly, this function would always return "true" for an appendrel child
relation, because it would think that the appendrel parent was a potential
join target for the child.  In principle that should only lead to some
inefficiency in planning, but fuzz testing by Andreas Seltenreich disclosed
that it could lead to "could not find pathkey item to sort" planner errors
in odd corner cases.  Specifically, we would think that all columns of a
child table's multicolumn index were interesting pathkeys, causing us to
generate a MergeAppend path that sorts by all the columns.  However, if any
of those columns weren't actually used above the level of the appendrel,
they would not get added to that rel's targetlist, which would result in
being unable to resolve the MergeAppend's sort keys against its targetlist
during createplan.c.

Backpatch to 9.3.  In older versions, columns of an appendrel get added
to its targetlist even if they're not mentioned above the scan level,
so that the failure doesn't occur.  It might be worth back-patching this
fix to older versions anyway, but I'll refrain for the moment.
2015-08-06 20:14:53 -04:00
Tom Lane 8703059c6b Further fixes for degenerate outer join clauses.
Further testing revealed that commit f69b4b9495 was still a few
bricks shy of a load: minor tweaking of the previous test cases resulted
in the same wrong-outer-join-order problem coming back.  After study
I concluded that my previous changes in make_outerjoininfo() were just
accidentally masking the problem, and should be reverted in favor of
forcing syntactic join order whenever an upper outer join's predicate
doesn't mention a lower outer join's LHS.  This still allows the
chained-outer-joins style that is the normally optimizable case.

I also tightened things up some more in join_is_legal().  It seems to me
on review that what's really happening in the exception case where we
ignore a mismatched special join is that we're allowing the proposed join
to associate into the RHS of the outer join we're comparing it to.  As
such, we should *always* insist that the proposed join be a left join,
which eliminates a bunch of rather dubious argumentation.  The case where
we weren't enforcing that was the one that was already known buggy anyway
(it had a violatable Assert before the aforesaid commit) so it hardly
deserves a lot of deference.

Back-patch to all active branches, like the previous patch.  The added
regression test case failed in all branches back to 9.1, and I think it's
only an unrelated change in costing calculations that kept 9.0 from
choosing a broken plan.
2015-08-06 15:35:46 -04:00
Tom Lane 6af9ee4c8c Make real sure we don't reassociate joins into or out of SEMI/ANTI joins.
Per the discussion in optimizer/README, it's unsafe to reassociate anything
into or out of the RHS of a SEMI or ANTI join.  An example from Piotr
Stefaniak showed that join_is_legal() wasn't sufficiently enforcing this
rule, so lock it down a little harder.

I couldn't find a reasonably simple example of the optimizer trying to
do this, so no new regression test.  (Piotr's example involved the random
search in GEQO accidentally trying an invalid case and triggering a sanity
check way downstream in clause selectivity estimation, which did not seem
like a sequence of events that would be useful to memorialize in a
regression test as-is.)

Back-patch to all active branches.
2015-08-05 14:39:29 -04:00
Tom Lane 85e5e222b1 Fix a PlaceHolderVar-related oversight in star-schema planning patch.
In commit b514a7460d, I changed the planner
so that it would allow nestloop paths to remain partially parameterized,
ie the inner relation might need parameters from both the current outer
relation and some upper-level outer relation.  That's fine so long as we're
talking about distinct parameters; but the patch also allowed creation of
nestloop paths for cases where the inner relation's parameter was a
PlaceHolderVar whose eval_at set included the current outer relation and
some upper-level one.  That does *not* work.

In principle we could allow such a PlaceHolderVar to be evaluated at the
lower join node using values passed down from the upper relation along with
values from the join's own outer relation.  However, nodeNestloop.c only
supports simple Vars not arbitrary expressions as nestloop parameters.
createplan.c is also a few bricks shy of being able to handle such cases;
it misplaces the PlaceHolderVar parameters in the plan tree, which is why
the visible symptoms of this bug are "plan should not reference subplan's
variable" and "failed to assign all NestLoopParams to plan nodes" planner
errors.

Adding the necessary complexity to make this work doesn't seem like it
would be repaid in significantly better plans, because in cases where such
a PHV exists, there is probably a corresponding join order constraint that
would allow a good plan to be found without using the star-schema exception.
Furthermore, adding complexity to nodeNestloop.c would create a run-time
penalty even for plans where this whole consideration is irrelevant.
So let's just reject such paths instead.

Per fuzz testing by Andreas Seltenreich; the added regression test is based
on his example query.  Back-patch to 9.2, like the previous patch.
2015-08-04 14:55:50 -04:00
Tom Lane f69b4b9495 Fix some planner issues with degenerate outer join clauses.
An outer join clause that didn't actually reference the RHS (perhaps only
after constant-folding) could confuse the join order enforcement logic,
leading to wrong query results.  Also, nested occurrences of such things
could trigger an Assertion that on reflection seems incorrect.

Per fuzz testing by Andreas Seltenreich.  The practical use of such cases
seems thin enough that it's not too surprising we've not heard field
reports about it.

This has been broken for a long time, so back-patch to all active branches.
2015-08-01 20:57:41 -04:00
Tom Lane dea1491ffb Teach predtest.c that "foo" implies "foo IS NOT NULL".
Per complaint from Peter Holzer.  It's useful to cover this special case,
since for a boolean variable "foo", earlier parts of the planner will have
reduced variants like "foo = true" to just "foo", and thus we may fail
to recognize the applicability of a partial index with predicate
"foo IS NOT NULL".

Back-patch to 9.5, but not further; given the lack of previous complaints
this doesn't seem like behavior to change in stable branches.
2015-08-01 14:31:46 -04:00
Tom Lane a6492ff897 Fix an oversight in checking whether a join with LATERAL refs is legal.
In many cases, we can implement a semijoin as a plain innerjoin by first
passing the righthand-side relation through a unique-ification step.
However, one of the cases where this does NOT work is where the RHS has
a LATERAL reference to the LHS; that makes the RHS dependent on the LHS
so that unique-ification is meaningless.  joinpath.c understood this,
and so would not generate any join paths of this kind ... but join_is_legal
neglected to check for the case, so it would think that we could do it.
The upshot would be a "could not devise a query plan for the given query"
failure once we had failed to generate any join paths at all for the bogus
join pair.

Back-patch to 9.3 where LATERAL was added.
2015-07-31 19:26:33 -04:00
Tom Lane 8693ebe37d Avoid some zero-divide hazards in the planner.
Although I think on all modern machines floating division by zero
results in Infinity not SIGFPE, we still don't want infinities
running around in the planner's costing estimates; too much risk
of that leading to insane behavior.

grouping_planner() failed to consider the possibility that final_rel
might be known dummy and hence have zero rowcount.  (I wonder if it
would be better to set a rows estimate of 1 for dummy relations?
But at least in the back branches, changing this convention seems
like a bad idea, so I'll leave that for another day.)

Make certain that get_variable_numdistinct() produces a nonzero result.
The case that can be shown to be broken is with stadistinct < 0.0 and
small ntuples; we did not prevent the result from rounding to zero.
For good luck I applied clamp_row_est() to all the nonconstant return
values.

In ExecChooseHashTableSize(), Assert that we compute positive nbuckets
and nbatch.  I know of no reason to think this isn't the case, but it
seems like a good safety check.

Per reports from Piotr Stefaniak.  Back-patch to all active branches.
2015-07-30 12:11:23 -04:00
Robert Haas f04ce31475 Fix incorrect comment.
Amit Langote
2015-07-29 16:47:12 -04:00
Tom Lane 95f4e59c32 Remove an unsafe Assert, and explain join_clause_is_movable_into() better.
join_clause_is_movable_into() is approximate, in the sense that it might
sometimes return "false" when actually it would be valid to push the given
join clause down to the specified level.  This is okay ... but there was
an Assert in get_joinrel_parampathinfo() that's only safe if the answers
are always exact.  Comment out the Assert, and add a bunch of commentary
to clarify what's going on.

Per fuzz testing by Andreas Seltenreich.  The added regression test is
a pretty silly query, but it's based on his crasher example.

Back-patch to 9.2 where the faulty logic was introduced.
2015-07-28 13:20:39 -04:00
Tom Lane fca8e59c1c Fix oversight in flattening of subqueries with empty FROM.
I missed a restriction that commit f4abd0241d
should have enforced: we can't pull up an empty-FROM subquery if it's under
an outer join, because then we'd need to wrap its output columns in
PlaceHolderVars.  As the code currently stands, the PHVs end up with empty
relid sets, which doesn't work (and is correctly caught by an Assert).

It's possible that this could be fixed by assigning the PHVs the relid
sets of the parent FromExpr/JoinExpr, but getting that to work is more
complication than I care to add right now; indeed it's likely that
we'll never bother, since pulling up empty-FROM subqueries is a rather
marginal optimization anyway.

Per report from Andreas Seltenreich.  Back-patch to 9.5 where the faulty
code was added.
2015-07-26 17:44:27 -04:00
Tom Lane 358eaa01bf Make entirely-dummy appendrels get marked as such in set_append_rel_size.
The planner generally expects that the estimated rowcount of any relation
is at least one row, *unless* it has been proven empty by constraint
exclusion or similar mechanisms, which is marked by installing a dummy path
as the rel's cheapest path (cf. IS_DUMMY_REL).  When I split up
allpaths.c's processing of base rels into separate set_base_rel_sizes and
set_base_rel_pathlists steps, the intention was that dummy rels would get
marked as such during the "set size" step; this is what justifies an Assert
in indxpath.c's get_loop_count that other relations should either be dummy
or have positive rowcount.  Unfortunately I didn't get that quite right
for append relations: if all the child rels have been proven empty then
set_append_rel_size would come up with a rowcount of zero, which is
correct, but it didn't then do set_dummy_rel_pathlist.  (We would have
ended up with the right state after set_append_rel_pathlist, but that's
too late, if we generate indexpaths for some other rel first.)

In addition to fixing the actual bug, I installed an Assert enforcing this
convention in set_rel_size; that then allows simplification of a couple
of now-redundant tests for zero rowcount in set_append_rel_size.

Also, to cover the possibility that third-party FDWs have been careless
about not returning a zero rowcount estimate, apply clamp_row_est to
whatever an FDW comes up with as the rows estimate.

Per report from Andreas Seltenreich.  Back-patch to 9.2.  Earlier branches
did not have the separation between set_base_rel_sizes and
set_base_rel_pathlists steps, so there was no intermediate state where an
appendrel would have had inconsistent rowcount and pathlist.  It's possible
that adding the Assert to set_rel_size would be a good idea in older
branches too; but since they're not under development any more, it's likely
not worth the trouble.
2015-07-26 16:19:08 -04:00
Andres Freund 159cff58cf Check the relevant index element in ON CONFLICT unique index inference.
ON CONFLICT unique index inference had a thinko that could affect cases
where the user-supplied inference clause required that an attribute
match a particular (user specified) collation and/or opclass.

infer_collation_opclass_match() has to check for opclass and/or
collation matches and that the attribute is in the list of attributes or
expressions known to be in the definition of the index under
consideration. The bug was that these two conditions weren't necessarily
evaluated for the same index attribute.

Author: Peter Geoghegan
Discussion: CAM3SWZR4uug=WvmGk7UgsqHn2MkEzy9YU-+8jKGO4JPhesyeWg@mail.gmail.com
Backpatch: 9.5, where ON CONFLICT was introduced
2015-07-26 18:20:41 +02:00
Andres Freund 61444bfb80 Allow to push down clauses from HAVING to WHERE when grouping sets are used.
Previously we disallowed pushing down quals to WHERE in the presence of
grouping sets. That's overly restrictive.

We now instead copy quals to WHERE if applicable, leaving the
one in HAVING in place. That's because, at that stage of the planning
process, it's nontrivial to determine if it's safe to remove the one in
HAVING.

Author: Andrew Gierth
Discussion: 874mkt3l59.fsf@news-spur.riddles.org.uk
Backpatch: 9.5, where grouping sets were introduced. This isn't exactly
    a bugfix, but it seems better to keep the branches in sync at this point.
2015-07-26 16:50:20 +02:00
Andres Freund e6d8cb77c0 Recognize GROUPING() as a aggregate expression.
Previously GROUPING() was not recognized as a aggregate expression,
erroneously allowing the planner to move it from HAVING to WHERE.

Author: Jeevan Chalke
Reviewed-By: Andrew Gierth
Discussion: CAM2+6=WG9omG5rFOMAYBweJxmpTaapvVp5pCeMrE6BfpCwr4Og@mail.gmail.com
Backpatch: 9.5, where grouping sets were introduced
2015-07-26 16:50:02 +02:00
Andres Freund 144666f65b Build column mapping for grouping sets in all required cases.
The previous coding frequently failed to fail because for one it's
unusual to have rollup clauses with one column, and for another
sometimes the wrong mapping didn't cause obvious problems.

Author: Jeevan Chalke
Reviewed-By: Andrew Gierth
Discussion: CAM2+6=W=9=hQOipH0HAPbkun3Z3TFWij_EiHue0_6UX=oR=1kw@mail.gmail.com
Backpatch: 9.5, where grouping sets were introduced
2015-07-26 16:46:27 +02:00
Tom Lane dd7a8f66ed Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM.  (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.)  Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type.  This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.

Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.

Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.

Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
Joe Conway b26e3d660d Make RLS work with UPDATE ... WHERE CURRENT OF
UPDATE ... WHERE CURRENT OF would not work in conjunction with
RLS. Arrange to allow the CURRENT OF expression to be pushed down.
Issue noted by Peter Geoghegan. Patch by Dean Rasheed. Back patch
to 9.5 where RLS was introduced.
2015-07-24 12:55:30 -07:00
Tom Lane 46d0a9bfac Fix add_rte_to_flat_rtable() for recent feature additions.
The TABLESAMPLE and row security patches each overlooked this function,
though their errors of omission were opposite: RLS failed to zero out the
securityQuals field, leading to wasteful copying of useless expression
trees in finished plans, while TABLESAMPLE neglected to add a comment
saying that it intentionally *isn't* deleting the tablesample subtree.
There probably should be a similar comment about ctename, too.

Back-patch as appropriate.
2015-07-21 20:03:58 -04:00
Magnus Hagander 828df727a6 Fix spelling error
David Rowley
2015-07-16 10:31:58 +03:00
Andres Freund 3ed26e5f87 For consistency add a pfree to ON CONFLICT set_plan_refs code.
Backpatch to 9.5 where ON CONFLICT was introduced.

Author: Peter Geoghegan
2015-07-12 22:18:57 +02:00
Heikki Linnakangas 7845db2aa7 Fix typo in comment
Etsuro Fujita
2015-06-27 10:17:42 +03:00
Robert Haas 5ca611841b Improve handling of CustomPath/CustomPlan(State) children.
Allow CustomPath to have a list of paths, CustomPlan a list of plans,
and CustomPlanState a list of planstates known to the core system, so
that custom path/plan providers can more reasonably use this
infrastructure for nodes with multiple children.

KaiGai Kohei, per a design suggestion from Tom Lane, with some
further kibitzing by me.
2015-06-26 09:40:47 -04:00
Robert Haas 51d0fe5d56 Update get_relation_info comment.
Thomas Munro
2015-06-23 10:09:53 -04:00
Tom Lane 2cb9ec1bcb Improve inheritance_planner()'s performance for large inheritance sets.
Commit c03ad5602f introduced a planner
performance regression for UPDATE/DELETE on large inheritance sets.
It required copying the append_rel_list (which is of size proportional to
the number of inherited tables) once for each inherited table, thus
resulting in O(N^2) time and memory consumption.  While it's difficult to
avoid that in general, the extra work only has to be done for
append_rel_list entries that actually reference subquery RTEs, which
inheritance-set entries will not.  So we can buy back essentially all of
the loss in cases without subqueries in FROM; and even for those, the added
work is mainly proportional to the number of UNION ALL subqueries.

Back-patch to 9.2, like the previous commit.

Tom Lane and Dean Rasheed, per a complaint from Thomas Munro.
2015-06-22 18:53:27 -04:00
Tom Lane 3b0f77601b Fix some questionable edge-case behaviors in add_path() and friends.
add_path_precheck was doing exact comparisons of path costs, but it really
needs to do them fuzzily to be sure it won't reject paths that could
survive add_path's comparisons.  (This can only matter if the initial cost
estimate is very close to the final one, but that turns out to often be
true.)

Also, it should ignore startup cost for this purpose if and only if
compare_path_costs_fuzzily would do so.  The previous coding always ignored
startup cost for parameterized paths, which is wrong as of commit
3f59be836c555fa6; it could result in improper early rejection of paths that
we care about for SEMI/ANTI joins.  It also always considered startup cost
for unparameterized paths, which is just as wrong though the only effect is
to waste planner cycles on paths that can't survive.  Instead, it should
consider startup cost only when directed to by the consider_startup/
consider_param_startup relation flags.

Likewise, compare_path_costs_fuzzily should have symmetrical behavior
for parameterized and unparameterized paths.  In this case, the best
answer seems to be that after establishing that total costs are fuzzily
equal, we should compare startup costs whether or not the consider_xxx
flags are on.  That is what it's always done for unparameterized paths,
so let's make the behavior for parameterized  paths match.

These issues were noted while developing the SEMI/ANTI join costing fix
of commit 3f59be836c, but we chose not to back-patch these fixes,
because they can cause changes in the planner's choices among
nearly-same-cost plans.  (There is in fact one minor change in plan choice
within the core regression tests.)  Destabilizing plan choices in back
branches without very clear improvements is frowned on, so we'll just fix
this in HEAD.
2015-06-03 18:02:39 -04:00
Tom Lane 3f59be836c Fix planner's cost estimation for SEMI/ANTI joins with inner indexscans.
When the inner side of a nestloop SEMI or ANTI join is an indexscan that
uses all the join clauses as indexquals, it can be presumed that both
matched and unmatched outer rows will be processed very quickly: for
matched rows, we'll stop after fetching one row from the indexscan, while
for unmatched rows we'll have an indexscan that finds no matching index
entries, which should also be quick.  The planner already knew about this,
but it was nonetheless charging for at least one full run of the inner
indexscan, as a consequence of concerns about the behavior of materialized
inner scans --- but those concerns don't apply in the fast case.  If the
inner side has low cardinality (many matching rows) this could make an
indexscan plan look far more expensive than it actually is.  To fix,
rearrange the work in initial_cost_nestloop/final_cost_nestloop so that we
don't add the inner scan cost until we've inspected the indexquals, and
then we can add either the full-run cost or just the first tuple's cost as
appropriate.

Experimentation with this fix uncovered another problem: add_path and
friends were coded to disregard cheap startup cost when considering
parameterized paths.  That's usually okay (and desirable, because it thins
the path herd faster); but in this fast case for SEMI/ANTI joins, it could
result in throwing away the desired plain indexscan path in favor of a
bitmap scan path before we ever get to the join costing logic.  In the
many-matching-rows cases of interest here, a bitmap scan will do a lot more
work than required, so this is a problem.  To fix, add a per-relation flag
consider_param_startup that works like the existing consider_startup flag,
but applies to parameterized paths, and set it for relations that are the
inside of a SEMI or ANTI join.

To make this patch reasonably safe to back-patch, care has been taken to
avoid changing the planner's behavior except in the very narrow case of
SEMI/ANTI joins with inner indexscans.  There are places in
compare_path_costs_fuzzily and add_path_precheck that are not terribly
consistent with the new approach, but changing them will affect planner
decisions at the margins in other cases, so we'll leave that for a
HEAD-only fix.

Back-patch to 9.3; before that, the consider_startup flag didn't exist,
meaning that the second aspect of the patch would be too invasive.

Per a complaint from Peter Holzer and analysis by Tomas Vondra.
2015-06-03 11:59:10 -04:00
Tom Lane 2aa0476dc3 Manual cleanup of pgindent results.
Fix some places where pgindent did silly stuff, often because project
style wasn't followed to begin with.  (I've not touched the atomics
headers, though.)
2015-05-24 15:04:10 -04:00
Bruce Momjian 807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Andres Freund 631d749007 Remove the new UPSERT command tag and use INSERT instead.
Previously, INSERT with ON CONFLICT DO UPDATE specified used a new
command tag -- UPSERT.  It was introduced out of concern that INSERT as
a command tag would be a misrepresentation for ON CONFLICT DO UPDATE, as
some affected rows may actually have been updated.

Alvaro Herrera noticed that the implementation of that new command tag
was incomplete; in subsequent discussion we concluded that having it
doesn't provide benefits that are in line with the compatibility breaks
it requires.

Catversion bump due to the removal of PlannedStmt->isUpsert.

Author: Peter Geoghegan
Discussion: 20150520215816.GI5885@postgresql.org
2015-05-23 00:58:45 +02:00
Tom Lane c5dd8ead40 More fixes for lossy-GiST-distance-functions patch.
Paul Ramsey reported that commit 35fcb1b3d0
induced a core dump on commuted ORDER BY expressions, because it was
assuming that the indexorderby expression could be found verbatim in the
relevant equivalence class, but it wasn't there.  We really don't need
anything that complicated anyway; for the data types likely to be used for
index ORDER BY operators in the foreseeable future, the exprType() of the
ORDER BY expression will serve fine.  (The case where we'd have to work
harder is where the ORDER BY expression's result is only binary-compatible
with the declared input type of the ordering operator; long before worrying
about that, one would need to get rid of GiST's hard-wired assumption that
said datatype is float8.)

Aside from fixing that crash and adding a regression test for the case,
I did some desultory code review:

nodeIndexscan.c was likewise overthinking how hard it ought to work to
identify the datatype of the ORDER BY expressions.

Add comments explaining how come nodeIndexscan.c can get away with
simplifying assumptions about NULLS LAST ordering and no backward scan.

Revert no-longer-needed changes of find_ec_member_for_tle(); while the
new definition was no worse than the old, it wasn't better either, and
it might cause back-patching pain.

Revert entirely bogus additions to genam.h.
2015-05-21 19:47:48 -04:00
Heikki Linnakangas 4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
Andres Freund 0740cbd759 Refactor ON CONFLICT index inference parse tree representation.
Defer lookup of opfamily and input type of a of a user specified opclass
until the optimizer selects among available unique indexes; and store
the opclass in the parse analyzed tree instead.  The primary reason for
doing this is that for rule deparsing it's easier to use the opclass
than the previous representation.

While at it also rename a variable in the inference code to better fit
it's purpose.

This is separate from the actual fixes for deparsing to make review
easier.
2015-05-19 21:21:27 +02:00
Tom Lane 424661913c Fix failure to copy IndexScan.indexorderbyops in copyfuncs.c.
This oversight results in a crash at executor startup if the plan has
been copied.  outfuncs.c was missed as well.

While we could probably have taught both those files to cope with the
originally chosen representation of an Oid array, it would have been
painful, not least because there'd be no easy way to verify the array
length.  An Oid List is far easier to work with.  And AFAICS, there is
no particular notational benefit to using an array rather than a list
in the existing parts of the patch either.  So just change it to a list.

Error in commit 35fcb1b3d0, which is new,
so no need for back-patch.
2015-05-17 21:22:12 -04:00
Andres Freund f3d3118532 Support GROUPING SETS, CUBE and ROLLUP.
This SQL standard functionality allows to aggregate data by different
GROUP BY clauses at once. Each grouping set returns rows with columns
grouped by in other sets set to NULL.

This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.

The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.

The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.

Instead of, as in an earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage.  The optimizer still builds Sort/Agg node to describe each
phase, but they're not part of the plan tree, but instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting.  The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows to
avoid the sorting step if the underlying data is sorted by other means.

A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.

Needs a catversion bump because stored rules may change.

Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
    Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-16 03:46:31 +02:00
Alvaro Herrera 26df7066cc Move strategy numbers to include/access/stratnum.h
For upcoming BRIN opclasses, it's convenient to have strategy numbers
defined in a single place.  Since there's nothing appropriate, create
it.  The StrategyNumber typedef now lives there, as well as existing
strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from
gist.h).  skey.h is forced to include stratnum.h because of the
StrategyNumber typedef, but gist.h is not; extensions that currently
rely on gist.h for rtree strategy numbers might need to add a new

A few .c files can stop including skey.h and/or gist.h, which is a nice
side benefit.

Per discussion:
https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org

Authored by Emre Hasegeli and Álvaro.

(It's not clear to me why bootscanner.l has any #include lines at all.)
2015-05-15 17:03:16 -03:00
Simon Riggs f6d208d6e5 TABLESAMPLE, SQL Standard and extensible
Add a TABLESAMPLE clause to SELECT statements that allows
user to specify random BERNOULLI sampling or block level
SYSTEM sampling. Implementation allows for extensible
sampling functions to be written, using a standard API.
Basic version follows SQLStandard exactly. Usable
concrete use cases for the sampling API follow in later
commits.

Petr Jelinek

Reviewed by Michael Paquier and Simon Riggs
2015-05-15 14:37:10 -04:00
Heikki Linnakangas 35fcb1b3d0 Allow GiST distance function to return merely a lower-bound.
The distance function can now set *recheck = false, like index quals. The
executor will then re-check the ORDER BY expressions, and use a queue to
reorder the results on the fly.

This makes it possible to do kNN-searches on polygons and circles, which
don't store the exact value in the index, but just a bounding box.

Alexander Korotkov and me
2015-05-15 14:26:51 +03:00
Andres Freund 4af6e61a36 Fix ON CONFLICT bugs that manifest when used in rules.
Specifically the tlist and rti of the pseudo "excluded" relation weren't
properly treated by expression_tree_walker, which lead to errors when
excluded was referenced inside a rule because the varnos where not
properly adjusted.  Similar omissions in OffsetVarNodes and
expression_tree_mutator had less impact, but should obviously be fixed
nonetheless.

A couple tests of for ON CONFLICT UPDATE into INSERT rule bearing
relations have been added.

In passing I updated a couple comments.
2015-05-13 00:13:22 +02:00
Tom Lane afb9249d06 Add support for doing late row locking in FDWs.
Previously, FDWs could only do "early row locking", that is lock a row as
soon as it's fetched, even though local restriction/join conditions might
discard the row later.  This patch adds callbacks that allow FDWs to do
late locking in the same way that it's done for regular tables.

To make use of this feature, an FDW must support the "ctid" column as a
unique row identifier.  Currently, since ctid has to be of type TID,
the feature is of limited use, though in principle it could be used by
postgres_fdw.  We may eventually allow FDWs to specify another data type
for ctid, which would make it possible for more FDWs to use this feature.

This commit does not modify postgres_fdw to use late locking.  We've
tested some prototype code for that, but it's not in committable shape,
and besides it's quite unclear whether it actually makes sense to do late
locking against a remote server.  The extra round trips required are likely
to outweigh any benefit from improved concurrency.

Etsuro Fujita, reviewed by Ashutosh Bapat, and hacked up a lot by me
2015-05-12 14:10:17 -04:00
Tom Lane 1a8a4e5cde Code review for foreign/custom join pushdown patch.
Commit e7cb7ee145 included some design
decisions that seem pretty questionable to me, and there was quite a lot
of stuff not to like about the documentation and comments.  Clean up
as follows:

* Consider foreign joins only between foreign tables on the same server,
rather than between any two foreign tables with the same underlying FDW
handler function.  In most if not all cases, the FDW would simply have had
to apply the same-server restriction itself (far more expensively, both for
lack of caching and because it would be repeated for each combination of
input sub-joins), or else risk nasty bugs.  Anyone who's really intent on
doing something outside this restriction can always use the
set_join_pathlist_hook.

* Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist
to better reflect what they're for, and allow these custom scan tlists
to be used even for base relations.

* Change make_foreignscan() API to include passing the fdw_scan_tlist
value, since the FDW is required to set that.  Backwards compatibility
doesn't seem like an adequate reason to expect FDWs to set it in some
ad-hoc extra step, and anyway existing FDWs can just pass NIL.

* Change the API of path-generating subroutines of add_paths_to_joinrel,
and in particular that of GetForeignJoinPaths and set_join_pathlist_hook,
so that various less-used parameters are passed in a struct rather than
as separate parameter-list entries.  The objective here is to reduce the
probability that future additions to those parameter lists will result in
source-level API breaks for users of these hooks.  It's possible that this
is even a small win for the core code, since most CPU architectures can't
pass more than half a dozen parameters efficiently anyway.  I kept root,
joinrel, outerrel, innerrel, and jointype as separate parameters to reduce
code churn in joinpath.c --- in particular, putting jointype into the
struct would have been problematic because of the subroutines' habit of
changing their local copies of that variable.

* Avoid ad-hocery in ExecAssignScanProjectionInfo.  It was probably all
right for it to know about IndexOnlyScan, but if the list is to grow
we should refactor the knowledge out to the callers.

* Restore nodeForeignscan.c's previous use of the relcache to avoid
extra GetFdwRoutine lookups for base-relation scans.

* Lots of cleanup of documentation and missed comments.  Re-order some
code additions into more logical places.
2015-05-10 14:36:36 -04:00
Andres Freund bab64ef9e8 Fix two problems in infer_arbiter_indexes().
The first is a pretty simple bug where a relcache entry is used after
the relation is closed. In this particular situation it does not appear
to have bad consequences unless compiled with RELCACHE_FORCE_RELEASE.

The second is that infer_arbiter_indexes() skipped indexes that aren't
yet valid according to indcheckxmin. That's not required here, because
uniqueness checks don't care about visibility according to an older
snapshot.  While thats not really a bug, it makes things undesirably
non-deterministic.  There is some hope that this explains a test failure
on buildfarm member jaguarundi.

Discussion: 9096.1431102730@sss.pgh.pa.us
2015-05-08 22:28:23 +02:00
Andres Freund 168d5805e4 Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint.  DO NOTHING avoids the
constraint violation, without touching the pre-existing row.  DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed.  The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.

This feature is often referred to as upsert.

This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert.  If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made.  If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.

To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.

Bumps catversion as stored rules change.

Author: Peter Geoghegan, with significant contributions from Heikki
    Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
    Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:43:10 +02:00
Andres Freund 2c8f4836db Represent columns requiring insert and update privileges indentently.
Previously, relation range table entries used a single Bitmapset field
representing which columns required either UPDATE or INSERT privileges,
despite the fact that INSERT and UPDATE privileges are separately
cataloged, and may be independently held.  As statements so far required
either insert or update privileges but never both, that was
sufficient. The required permission could be inferred from the top level
statement run.

The upcoming INSERT ... ON CONFLICT UPDATE feature needs to
independently check for both privileges in one statement though, so that
is not sufficient anymore.

Bumps catversion as stored rules change.

Author: Peter Geoghegan
Reviewed-By: Andres Freund
2015-05-08 00:20:46 +02:00
Robert Haas e7cb7ee145 Allow FDWs and custom scan providers to replace joins with scans.
Foreign data wrappers can use this capability for so-called "join
pushdown"; that is, instead of executing two separate foreign scans
and then joining the results locally, they can generate a path which
performs the join on the remote server and then is scanned locally.
This commit does not extend postgres_fdw to take advantage of this
capability; it just provides the infrastructure.

Custom scan providers can use this in a similar way.  Previously,
it was only possible for a custom scan provider to scan a single
relation.  Now, it can scan an entire join tree, provided of course
that it knows how to produce the same results that the join would
have produced if executed normally.

KaiGai Kohei, reviewed by Shigeru Hanada, Ashutosh Bapat, and me.
2015-05-01 08:50:35 -04:00
Stephen Frost dcbf5948e1 Improve qual pushdown for RLS and SB views
The original security barrier view implementation, on which RLS is
built, prevented all non-leakproof functions from being pushed down to
below the view, even when the function was not receiving any data from
the view.  This optimization improves on that situation by, instead of
checking strictly for non-leakproof functions, it checks for Vars being
passed to non-leakproof functions and allows functions which do not
accept arguments or whose arguments are not from the current query level
(eg: constants can be particularly useful) to be pushed down.

As discussed, this does mean that a function which is pushed down might
gain some idea that there are rows meeting a certain criteria based on
the number of times the function is called, but this isn't a
particularly new issue and the documentation in rules.sgml already
addressed similar covert-channel risks.  That documentation is updated
to reflect that non-leakproof functions may be pushed down now, if
they meet the above-described criteria.

Author: Dean Rasheed, with a bit of rework to make things clearer,
along with comment and documentation updates from me.
2015-04-27 12:29:42 -04:00
Andres Freund 2e3ca04e2e Also correct therefor to therefore.
Since both forms are arguably legal I wasn't sure about changing
this. But then Tom argued for 'therefore'...

Author: Dmitriy Olshevskiy
Discussion: 34789.1430067832@sss.pgh.pa.us
2015-04-26 19:05:39 +02:00
Tom Lane 3cf8686014 Prevent improper reordering of antijoins vs. outer joins.
An outer join appearing within the RHS of an antijoin can't commute with
the antijoin, but somehow I missed teaching make_outerjoininfo() about
that.  In Teodor Sigaev's recent trouble report, this manifests as a
"could not find RelOptInfo for given relids" error within eqjoinsel();
but I think silently wrong query results are possible too, if the planner
misorders the joins and doesn't happen to trigger any internal consistency
checks.  It's broken as far back as we had antijoins, so back-patch to all
supported branches.
2015-04-25 16:44:27 -04:00
Tom Lane 70d44dd9de Fix obsolete comment in set_rel_size().
The cross-reference to set_append_rel_pathlist() was obsoleted by
commit e2fa76d80b, which split what
had been set_rel_pathlist() and child routines into two sets of
functions.  But I (tgl) evidently missed updating this comment.

Back-patch to 9.2 to avoid unnecessary divergence among branches.

Amit Langote
2015-04-24 15:18:07 -04:00
Stephen Frost 0bf22e0c8b RLS fixes, new hooks, and new test module
In prepend_row_security_policies(), defaultDeny was always true, so if
there were any hook policies, the RLS policies on the table would just
get discarded.  Fixed to start off with defaultDeny as false and then
properly set later if we detect that only the default deny policy exists
for the internal policies.

The infinite recursion detection in fireRIRrules() didn't properly
manage the activeRIRs list in the case of WCOs, so it would incorrectly
report infinite recusion if the same relation with RLS appeared more
than once in the rtable, for example "UPDATE t ... FROM t ...".

Further, the RLS expansion code in fireRIRrules() was handling RLS in
the main loop through the rtable, which lead to RTEs being visited twice
if they contained sublink subqueries, which
prepend_row_security_policies() attempted to handle by exiting early if
the RTE already had securityQuals.  That doesn't work, however, since
if the query involved a security barrier view on top of a table with
RLS, the RTE would already have securityQuals (from the view) by the
time fireRIRrules() was invoked, and so the table's RLS policies would
be ignored.  This is fixed in fireRIRrules() by handling RLS in a
separate loop at the end, after dealing with any other sublink
subqueries, thus ensuring that each RTE is only visited once for RLS
expansion.

The inheritance planner code didn't correctly handle non-target
relations with RLS, which would get turned into subqueries during
planning. Thus an update of the form "UPDATE t1 ... FROM t2 ..." where
t1 has inheritance and t2 has RLS quals would fail.  Fix by making sure
to copy in and update the securityQuals when they exist for non-target
relations.

process_policies() was adding WCOs to non-target relations, which is
unnecessary, and could lead to a lot of wasted time in the rewriter and
the planner. Fix by only adding WCO policies when working on the result
relation.  Also in process_policies, we should be copying the USING
policies to the WITH CHECK policies on a per-policy basis, fix by moving
the copying up into the per-policy loop.

Lastly, as noted by Dean, we were simply adding policies returned by the
hook provided to the list of quals being AND'd, meaning that they would
actually restrict records returned and there was no option to have
internal policies and hook-based policies work together permissively (as
all internal policies currently work).  Instead, explicitly add support
for both permissive and restrictive policies by having a hook for each
and combining the results appropriately.  To ensure this is all done
correctly, add a new test module (test_rls_hooks) to test the various
combinations of internal, permissive, and restrictive hook policies.

Largely from Dean Rasheed (thanks!):

CAEZATCVmFUfUOwwhnBTcgi6AquyjQ0-1fyKd0T3xBWJvn+xsFA@mail.gmail.com

Author: Dean Rasheed, though I added the new hooks and test module.
2015-04-22 12:01:06 -04:00
Stephen Frost 4ccc5bd28e Pull in tableoid for inheiritance with rowMarks
As noted by Etsuro Fujita [1] and Dean Rasheed[2],
cb1ca4d800 changed ExecBuildAuxRowMark()
to always look for the tableoid in the target list, but didn't also
change preprocess_targetlist() to always include the tableoid.  This
resulted in errors with soon-to-be-added RLS with inheritance tests,
and errors when using inheritance with foreign tables.

Authors: Etsuro Fujita and Dean Rasheed (independently)

Minor word-smithing on the comments by me.

[1] 552CF0B6.8010006@lab.ntt.co.jp
[2] CAEZATCVmFUfUOwwhnBTcgi6AquyjQ0-1fyKd0T3xBWJvn+xsFA@mail.gmail.com
2015-04-22 11:29:35 -04:00
Tom Lane ca6805338f Fix incorrect matching of subexpressions in outer-join plan nodes.
Previously we would re-use input subexpressions in all expression trees
attached to a Join plan node.  However, if it's an outer join and the
subexpression appears in the nullable-side input, this is potentially
incorrect for apparently-matching subexpressions that came from above
the outer join (ie, targetlist and qpqual expressions), because the
executor will treat the subexpression value as NULL when maybe it should
not be.

The case is fairly hard to hit because (a) you need a non-strict
subexpression (else NULL is correct), and (b) we don't usually compute
expressions in the outputs of non-toplevel plan nodes.  But we might do
so if the expressions are sort keys for a mergejoin, for example.

Probably in the long run we should make a more explicit distinction between
Vars appearing above and below an outer join, but that will be a major
planner redesign and not at all back-patchable.  For the moment, just hack
set_join_references so that it will not match any non-Var expressions
coming from nullable inputs to expressions that came from above the join.
(This is somewhat overkill, in that a strict expression could still be
matched, but it doesn't seem worth the effort to check that.)

Per report from Qingqing Zhou.  The added regression test case is based
on his example.

This has been broken for a very long time, so back-patch to all active
branches.
2015-04-04 19:55:15 -04:00
Heikki Linnakangas d04c8ed904 Add support for index-only scans in GiST.
This adds a new GiST opclass method, 'fetch', which is used to reconstruct
the original Datum from the value stored in the index. Also, the 'canreturn'
index AM interface function gains a new 'attno' argument. That makes it
possible to use index-only scans on a multi-column index where some of the
opclasses support index-only scans but some do not.

This patch adds support in the box and point opclasses. Other opclasses
can added later as follow-on patches (btree_gist would be particularly
interesting).

Anastasia Lubennikova, with additional fixes and modifications by me.
2015-03-26 19:12:00 +02:00
Tom Lane cb1ca4d800 Allow foreign tables to participate in inheritance.
Foreign tables can now be inheritance children, or parents.  Much of the
system was already ready for this, but we had to fix a few things of
course, mostly in the area of planner and executor handling of row locks.

As side effects of this, allow foreign tables to have NOT VALID CHECK
constraints (and hence to accept ALTER ... VALIDATE CONSTRAINT), and to
accept ALTER SET STORAGE and ALTER SET WITH/WITHOUT OIDS.  Continuing to
disallow these things would've required bizarre and inconsistent special
cases in inheritance behavior.  Since foreign tables don't enforce CHECK
constraints anyway, a NOT VALID one is a complete no-op, but that doesn't
mean we shouldn't allow it.  And it's possible that some FDWs might have
use for SET STORAGE or SET WITH OIDS, though doubtless they will be no-ops
for most.

An additional change in support of this is that when a ModifyTable node
has multiple target tables, they will all now be explicitly identified
in EXPLAIN output, for example:

 Update on pt1  (cost=0.00..321.05 rows=3541 width=46)
   Update on pt1
   Foreign Update on ft1
   Foreign Update on ft2
   Update on child3
   ->  Seq Scan on pt1  (cost=0.00..0.00 rows=1 width=46)
   ->  Foreign Scan on ft1  (cost=100.00..148.03 rows=1170 width=46)
   ->  Foreign Scan on ft2  (cost=100.00..148.03 rows=1170 width=46)
   ->  Seq Scan on child3  (cost=0.00..25.00 rows=1200 width=46)

This was done mainly to provide an unambiguous place to attach "Remote SQL"
fields, but it is useful for inherited updates even when no foreign tables
are involved.

Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro
Horiguchi, some additional hacking by me
2015-03-22 13:53:21 -04:00
Tom Lane 7b8b8a4331 Improve representation of PlanRowMark.
This patch fixes two inadequacies of the PlanRowMark representation.

First, that the original LockingClauseStrength isn't stored (and cannot be
inferred for foreign tables, which always get ROW_MARK_COPY).  Since some
PlanRowMarks are created out of whole cloth and don't actually have an
ancestral RowMarkClause, this requires adding a dummy LCS_NONE value to
enum LockingClauseStrength, which is fairly annoying but the alternatives
seem worse.  This fix allows getting rid of the use of get_parse_rowmark()
in FDWs (as per the discussion around commits 462bd95705 and
8ec8760fc8), and it simplifies some things elsewhere.

Second, that the representation assumed that all child tables in an
inheritance hierarchy would use the same RowMarkType.  That's true today
but will soon not be true.  We add an "allMarkTypes" field that identifies
the union of mark types used in all a parent table's children, and use
that where appropriate (currently, only in preprocess_targetlist()).

In passing fix a couple of minor infelicities left over from the SKIP
LOCKED patch, notably that _outPlanRowMark still thought waitPolicy
is a bool.

Catversion bump is required because the numeric values of enum
LockingClauseStrength can appear in on-disk rules.

Extracted from a much larger patch to support foreign table inheritance;
it seemed worth breaking this out, since it's a separable concern.

Shigeru Hanada and Etsuro Fujita, somewhat modified by me
2015-03-15 18:41:47 -04:00
Tom Lane f4abd0241d Support flattening of empty-FROM subqueries and one-row VALUES tables.
We can't handle this in the general case due to limitations of the
planner's data representations; but we can allow it in many useful cases,
by being careful to flatten only when we are pulling a single-row subquery
up into a FROM (or, equivalently, inner JOIN) node that will still have at
least one remaining relation child.  Per discussion of an example from
Kyotaro Horiguchi.
2015-03-11 23:18:03 -04:00
Tom Lane b746d0c32d Fix old bug in get_loop_count().
While poking at David Kubečka's issue I noticed an ancient logic error
in get_loop_count(): it used 1.0 as a "no data yet" indicator, but since
that is actually a valid rowcount estimate, this doesn't work.  If we
have one input relation with 1.0 as rowcount and then another one with
a larger rowcount, we should use 1.0 as the result, but we picked the
larger rowcount instead.  (I think when I coded this, I recognized the
conflict, but mistakenly thought that the logic would pick the desired
count anyway.)

Fixing this changed the plan for one existing regression test case.
Since the point of that test is to exercise creation of a particular
shape of nestloop plan, I tweaked the query a little bit so it still
results in the same plan choice.

This is definitely a bug, but I'm hesitant to back-patch since it might
change plan choices unexpectedly, and anyway failure to implement a
heuristic precisely as intended is a pretty low-grade bug.
2015-03-11 22:53:32 -04:00
Tom Lane b55722692b Improve planner's cost estimation in the presence of semijoins.
If we have a semijoin, say
	SELECT * FROM x WHERE x1 IN (SELECT y1 FROM y)
and we're estimating the cost of a parameterized indexscan on x, the number
of repetitions of the indexscan should not be taken as the size of y; it'll
really only be the number of distinct values of y1, because the only valid
plan with y on the outside of a nestloop would require y to be unique-ified
before joining it to x.  Most of the time this doesn't make that much
difference, but sometimes it can lead to drastically underestimating the
cost of the indexscan and hence choosing a bad plan, as pointed out by
David Kubečka.

Fixing this is a bit difficult because parameterized indexscans are costed
out quite early in the planning process, before we have the information
that would be needed to call estimate_num_groups() and thereby estimate the
number of distinct values of the join column(s).  However we can move the
code that extracts a semijoin RHS's unique-ification columns, so that it's
done in initsplan.c rather than on-the-fly in create_unique_path().  That
shouldn't make any difference speed-wise and it's really a bit cleaner too.

The other bit of information we need is the size of the semijoin RHS,
which is easy if it's a single relation (we make those estimates before
considering indexscan costs) but problematic if it's a join relation.
The solution adopted here is just to use the product of the sizes of the
join component rels.  That will generally be an overestimate, but since
estimate_num_groups() only uses this input as a clamp, an overestimate
shouldn't hurt us too badly.  In any case we don't allow this new logic
to produce a value larger than we would have chosen before, so that at
worst an overestimate leaves us no wiser than we were before.
2015-03-11 21:21:00 -04:00
Robert Haas bc93ac12c2 Require non-NULL pstate for all addRangeTableEntryFor* functions.
Per discussion, it's better to have a consistent coding rule here.

Michael Paquier, per a node from Greg Stark referencing an old post
from Tom Lane.
2015-03-11 15:26:43 -04:00
Tom Lane 497bac7d29 Fix long-obsolete code for separating filter conditions in cost_index().
This code relied on pointer equality to identify which restriction clauses
also appear in the indexquals (and, therefore, don't need to be applied as
simple filter conditions).  That was okay once upon a time, years ago,
before we introduced the equivalence-class machinery.  Now there's about a
50-50 chance that an equality clause appearing in the indexquals will be
the mirror image (commutator) of its mate in the restriction list.  When
that happens, we'd erroneously think that the clause would be re-evaluated
at each visited row, and therefore inflate the cost estimate for the
indexscan by the clause's cost.

Add some logic to catch this case.  It seems to me that it continues not to
be worthwhile to expend the extra predicate-proof work that createplan.c
will do on the finally-selected plan, but this case is common enough and
cheap enough to handle that we should do so.

This will make a small difference (about one cpu_operator_cost per row)
in simple cases; but in situations where there's an expensive function in
the indexquals, it can make a very large difference, as seen in recent
example from Jeff Janes.

This is a long-standing bug, but I'm hesitant to back-patch because of the
possibility of destabilizing plan choices that people may be happy with.
2015-03-03 21:19:42 -05:00
Stephen Frost ee4ddcb38a Fix targetRelation initializiation in prepsecurity
In 6f9bd50eab, we modified
expand_security_quals() to tell expand_security_qual() about when the
current RTE was the targetRelation.  Unfortunately, that commit
initialized the targetRelation variable used outside of the loop over
the RTEs instead of at the start of it.

This patch moves the variable and the initialization of it into the
loop, where it should have been to begin with.

Pointed out by Dean Rasheed.

Back-patch to 9.4 as the original commit was.
2015-03-01 15:27:26 -05:00
Tom Lane b514a7460d Fix planning of star-schema-style queries.
Part of the intent of the parameterized-path mechanism was to handle
star-schema queries efficiently, but some overly-restrictive search
limiting logic added in commit e2fa76d80b
prevented such cases from working as desired.  Fix that and add a
regression test about it.  Per gripe from Marc Cousin.

This is arguably a bug rather than a new feature, so back-patch to 9.2
where parameterized paths were introduced.
2015-02-28 12:43:04 -05:00
Stephen Frost 6f9bd50eab Add locking clause for SB views for update/delete
In expand_security_qual(), we were handling locking correctly when a
PlanRowMark existed, but not when we were working with the target
relation (which doesn't have any PlanRowMarks, but the subquery created
for the security barrier quals still needs to lock the rows under it).

Noted by Etsuro Fujita when working with the Postgres FDW, which wasn't
properly issuing a SELECT ... FOR UPDATE to the remote side under a
DELETE.

Back-patch to 9.4 where updatable security barrier views were
introduced.

Per discussion with Etsuro and Dean Rasheed.
2015-02-25 21:36:29 -05:00
Tom Lane c063da1769 Add parse location fields to NullTest and BooleanTest structs.
We did not need a location tag on NullTest or BooleanTest before, because
no error messages referred directly to their locations.  That's planned
to change though, so add these fields in a separate housekeeping commit.

Catversion bump because stored rules may change.
2015-02-22 14:40:27 -05:00
Tom Lane e1a11d9311 Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[].
This requires changing quite a few places that were depending on
sizeof(HeapTupleHeaderData), but it seems for the best.

Michael Paquier, some adjustments by me
2015-02-21 15:13:06 -05:00
Tom Lane f2874feb7c Some more FLEXIBLE_ARRAY_MEMBER fixes. 2015-02-21 01:46:43 -05:00
Tom Lane abe45a9b31 Fix EXPLAIN output for cases where parent table is excluded by constraints.
The previous coding in EXPLAIN always labeled a ModifyTable node with the
name of the target table affected by its first child plan.  When originally
written, this was necessarily the parent table of the inheritance tree,
so everything was unconfusing.  But when we added NO INHERIT constraints,
it became possible for the parent table to be deleted from the plan by
constraint exclusion while still leaving child tables present.  This led to
the ModifyTable plan node being labeled with the first surviving child,
which was deemed confusing.  Fix it by retaining the parent table's RT
index in a new field in ModifyTable.

Etsuro Fujita, reviewed by Ashutosh Bapat and myself
2015-02-17 18:04:11 -05:00
Tom Lane 1a179f36f7 Fix GEQO to not assume its join order heuristic always works.
Back in commit 400e2c9344 I rewrote GEQO's
gimme_tree function to improve its heuristic for modifying the given tour
into a legal join order.  In what can only be called a fit of hubris,
I supposed that this new heuristic would *always* find a legal join order,
and ripped out the old logic that allowed gimme_tree to sometimes fail.

The folly of this is exposed by bug #12760, in which the "greedy" clumping
behavior of merge_clump() can lead it into a dead end which could only be
recovered from by un-clumping.  We have no code for that and wouldn't know
exactly what to do with it if we did.  Rather than try to improve the
heuristic rules still further, let's just recognize that it *is* a
heuristic and probably must always have failure cases.  So, put back the
code removed in the previous commit to allow for failure (but comment it
a bit better this time).

It's possible that this code was actually fully correct at the time and
has only been broken by the introduction of LATERAL.  But having seen this
example I no longer have much faith in that proposition, so back-patch to
all supported branches.
2015-02-10 20:37:19 -05:00
Tom Lane 75df6dc083 Fix ancient thinko in default table rowcount estimation.
The code used sizeof(ItemPointerData) where sizeof(ItemIdData) is correct,
since we're trying to account for a tuple's line pointer.  Spotted by
Tomonari Katsumata (bug #12584).

Although this mistake is of very long standing, no back-patch, since it's
a relatively harmless error and changing it would risk changing default
planner behavior in stable branches.  (I don't see any change in regression
test outputs here, but the buildfarm may think differently.)
2015-01-18 17:04:11 -05:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Tom Lane 4a14f13a0a Improve hash_create's API for selecting simple-binary-key hash functions.
Previously, if you wanted anything besides C-string hash keys, you had to
specify a custom hashing function to hash_create().  Nearly all such
callers were specifying tag_hash or oid_hash; which is tedious, and rather
error-prone, since a caller could easily miss the opportunity to optimize
by using hash_uint32 when appropriate.  Replace this with a design whereby
callers using simple binary-data keys just specify HASH_BLOBS and don't
need to mess with specific support functions.  hash_create() itself will
take care of optimizing when the key size is four bytes.

This nets out saving a few hundred bytes of code space, and offers
a measurable performance improvement in tidbitmap.c (which was not
exploiting the opportunity to use hash_uint32 for its 4-byte keys).
There might be some wins elsewhere too, I didn't analyze closely.

In future we could look into offering a similar optimized hashing function
for 8-byte keys.  Under this design that could be done in a centralized
and machine-independent fashion, whereas getting it right for keys of
platform-dependent sizes would've been notationally painful before.

For the moment, the old way still works fine, so as not to break source
code compatibility for loadable modules.  Eventually we might want to
remove tag_hash and friends from the exported API altogether, since there's
no real need for them to be explicitly referenced from outside dynahash.c.

Teodor Sigaev and Tom Lane
2014-12-18 13:36:36 -05:00
Tom Lane 462bd95705 Fix planning of SELECT FOR UPDATE on child table with partial index.
Ordinarily we can omit checking of a WHERE condition that matches a partial
index's condition, when we are using an indexscan on that partial index.
However, in SELECT FOR UPDATE we must include the "redundant" filter
condition in the plan so that it gets checked properly in an EvalPlanQual
recheck.  The planner got this mostly right, but improperly omitted the
filter condition if the index in question was on an inheritance child
table.  In READ COMMITTED mode, this could result in incorrectly returning
just-updated rows that no longer satisfy the filter condition.

The cause of the error is using get_parse_rowmark() when get_plan_rowmark()
is what should be used during planning.  In 9.3 and up, also fix the same
mistake in contrib/postgres_fdw.  It's currently harmless there (for lack
of inheritance support) but wrong is wrong, and the incorrect code might
get copied to someplace where it's more significant.

Report and fix by Kyotaro Horiguchi.  Back-patch to all supported branches.
2014-12-11 21:02:25 -05:00
Tom Lane d25367ec4f Add bms_get_singleton_member(), and use it where appropriate.
This patch adds a function that replaces a bms_membership() test followed
by a bms_singleton_member() call, performing both the test and the
extraction of a singleton set's member in one scan of the bitmapset.
The performance advantage over the old way is probably minimal in current
usage, but it seems worthwhile on notational grounds anyway.

David Rowley
2014-11-28 14:16:24 -05:00
Tom Lane f4e031c662 Add bms_next_member(), and use it where appropriate.
This patch adds a way of iterating through the members of a bitmapset
nondestructively, unlike the old way with bms_first_member().  While
bms_next_member() is very slightly slower than bms_first_member()
(at least for typical-size bitmapsets), eliminating the need to palloc
and pfree a temporary copy of the target bitmapset is a significant win.
So this method should be preferred in all cases where a temporary copy
would be necessary.

Tom Lane, with suggestions from Dean Rasheed and David Rowley
2014-11-28 13:37:25 -05:00
Stephen Frost 143b39c185 Rename pg_rowsecurity -> pg_policy and other fixes
As pointed out by Robert, we should really have named pg_rowsecurity
pg_policy, as the objects stored in that catalog are policies.  This
patch fixes that and updates the column names to start with 'pol' to
match the new catalog name.

The security consideration for COPY with row level security, also
pointed out by Robert, has also been addressed by remembering and
re-checking the OID of the relation initially referenced during COPY
processing, to make sure it hasn't changed under us by the time we
finish planning out the query which has been built.

Robert and Alvaro also commented on missing OCLASS and OBJECT entries
for POLICY (formerly ROWSECURITY or POLICY, depending) in various
places.  This patch fixes that too, which also happens to add the
ability to COMMENT on policies.

In passing, attempt to improve the consistency of messages, comments,
and documentation as well.  This removes various incarnations of
'row-security', 'row-level security', 'Row-security', etc, in favor
of 'policy', 'row level security' or 'row_security' as appropriate.

Happy Thanksgiving!
2014-11-27 01:15:57 -05:00
Tom Lane bac27394a1 Support arrays as input to array_agg() and ARRAY(SELECT ...).
These cases formerly failed with errors about "could not find array type
for data type".  Now they yield arrays of the same element type and one
higher dimension.

The implementation involves creating functions with API similar to the
existing accumArrayResult() family.  I (tgl) also extended the base family
by adding an initArrayResult() function, which allows callers to avoid
special-casing the zero-inputs case if they just want an empty array as
result.  (Not all do, so the previous calling convention remains valid.)
This allowed simplifying some existing code in xml.c and plperl.c.

Ali Akbar, reviewed by Pavel Stehule, significantly modified by me
2014-11-25 12:21:28 -05:00
Tom Lane b62f94c603 Allow simplification of EXISTS() subqueries containing LIMIT.
The locution "EXISTS(SELECT ... LIMIT 1)" seems to be rather common among
people who don't realize that the database already performs optimizations
equivalent to putting LIMIT 1 in the sub-select.  Unfortunately, this was
actually making things worse, because it prevented us from optimizing such
EXISTS clauses into semi or anti joins.  Teach simplify_EXISTS_query() to
suppress constant-positive LIMIT clauses.  That fixes the semi/anti-join
case, and may help marginally even for cases that have to be left as
sub-SELECTs.

Marti Raudsepp, reviewed by David Rowley
2014-11-22 19:12:38 -05:00
Tom Lane 9c58101117 Fix mishandling of system columns in FDW queries.
postgres_fdw would send query conditions involving system columns to the
remote server, even though it makes no effort to ensure that system
columns other than CTID match what the remote side thinks.  tableoid,
in particular, probably won't match and might have some use in queries.
Hence, prevent sending conditions that include non-CTID system columns.

Also, create_foreignscan_plan neglected to check local restriction
conditions while determining whether to set fsSystemCol for a foreign
scan plan node.  This again would bollix the results for queries that
test a foreign table's tableoid.

Back-patch the first fix to 9.3 where postgres_fdw was introduced.
Back-patch the second to 9.2.  The code is probably broken in 9.1 as
well, but the patch doesn't apply cleanly there; given the weak state
of support for FDWs in 9.1, it doesn't seem worth fixing.

Etsuro Fujita, reviewed by Ashutosh Bapat, and somewhat modified by me
2014-11-22 16:01:05 -05:00
Tom Lane 447770404c Rearrange CustomScan API.
Make it work more like FDW plans do: instead of assuming that there are
expressions in a CustomScan plan node that the core code doesn't know
about, insist that all subexpressions that need planner attention be in
a "custom_exprs" list in the Plan representation.  (Of course, the
custom plugin can break the list apart again at executor initialization.)
This lets us revert the parts of the patch that exposed setrefs.c and
subselect.c processing to the outside world.

Also revert the GetSpecialCustomVar stuff in ruleutils.c; that concept
may work in future, but it's far from fully baked right now.
2014-11-21 18:21:46 -05:00
Tom Lane c2ea2285e9 Simplify API for initially hooking custom-path providers into the planner.
Instead of register_custom_path_provider and a CreateCustomScanPath
callback, let's just provide a standard function hook in set_rel_pathlist.
This is more flexible than what was previously committed, is more like the
usual conventions for planner hooks, and requires less support code in the
core.  We had discussed this design (including centralizing the
set_cheapest() calls) back in March or so, so I'm not sure why it wasn't
done like this already.
2014-11-21 14:05:46 -05:00
Tom Lane a34fa8ee7c Initial code review for CustomScan patch.
Get rid of the pernicious entanglement between planner and executor headers
introduced by commit 0b03e5951b.

Also, rearrange the CustomFoo struct/typedef definitions so that all the
typedef names are seen as used by the compiler.  Without this pgindent
will mess things up a bit, which is not so important perhaps, but it also
removes a bizarre discrepancy between the declaration arrangement used for
CustomExecMethods and that used for CustomScanMethods and
CustomPathMethods.

Clean up the commentary around ExecSupportsMarkRestore to reflect the
rather large change in its API.

Const-ify register_custom_path_provider's argument.  This necessitates
casting away const in the function, but that seems better than forcing
callers of the function to do so (or else not const-ify their method
pointer structs, which was sort of the whole point).

De-export fix_expr_common.  I don't like the exporting of fix_scan_expr
or replace_nestloop_params either, but this one surely has got little
excuse.
2014-11-20 18:36:07 -05:00
Tom Lane 03e574af5f Fix another oversight in CustomScan patch.
disuse_physical_tlist() must work for all plan types handled by
create_scan_plan().
2014-11-20 14:49:02 -05:00
Robert Haas 0b03e5951b Introduce custom path and scan providers.
This allows extension modules to define their own methods for
scanning a relation, and get the core code to use them.  It's
unclear as yet how much use this capability will find, but we
won't find out if we never commit it.

KaiGai Kohei, reviewed at various times and in various levels
of detail by Shigeru Hanada, Tom Lane, Andres Freund, Álvaro
Herrera, and myself.
2014-11-07 17:34:36 -05:00
Tom Lane a4523c5aa5 Improve planning of btree index scans using ScalarArrayOpExpr quals.
Since we taught btree to handle ScalarArrayOpExpr quals natively (commit
9e8da0f757), the planner has always included
ScalarArrayOpExpr quals in index conditions if possible.  However, if the
qual is for a non-first index column, this could result in an inferior plan
because we can no longer take advantage of index ordering (cf. commit
807a40c551).  It can be better to omit the
ScalarArrayOpExpr qual from the index condition and let it be done as a
filter, so that the output doesn't need to get sorted.  Indeed, this is
true for the query introduced as a test case by the latter commit.

To fix, restructure get_index_paths and build_index_paths so that we
consider paths both with and without ScalarArrayOpExpr quals in non-first
index columns.  Redesign the API of build_index_paths so that it reports
what it found, saving useless second or third calls.

Report and patch by Andrew Gierth (though rather heavily modified by me).
Back-patch to 9.2 where this code was introduced, since the issue can
result in significant performance regressions compared to plans produced
by 9.1 and earlier.
2014-10-26 16:12:22 -04:00
Tom Lane f330a6d140 Fix mishandling of FieldSelect-on-whole-row-Var in nested lateral queries.
If an inline-able SQL function taking a composite argument is used in a
LATERAL subselect, and the composite argument is a lateral reference,
the planner could fail with "variable not found in subplan target list",
as seen in bug #11703 from Karl Bartel.  (The outer function call used in
the bug report and in the committed regression test is not really necessary
to provoke the bug --- you can get it if you manually expand the outer
function into "LATERAL (SELECT inner_function(outer_relation))", too.)

The cause of this is that we generate the reltargetlist for the referenced
relation before doing eval_const_expressions() on the lateral sub-select's
expressions (cf find_lateral_references()), so what's scheduled to be
emitted by the referenced relation is a whole-row Var, not the simplified
single-column Var produced by optimizing the function's FieldSelect on the
whole-row Var.  Then setrefs.c fails to match up that lateral reference to
what's available from the outer scan.

Preserving the FieldSelect optimization in such cases would require either
major planner restructuring (to recursively do expression simplification
on sub-selects much earlier) or some amazingly ugly kluge to change the
reltargetlist of a possibly-already-planned relation.  It seems better
just to skip the optimization when the Var is from an upper query level;
the case is not so common that it's likely anyone will notice a few
wasted cycles.

AFAICT this problem only occurs for uplevel LATERAL references, so
back-patch to 9.3 where LATERAL was added.
2014-10-20 12:23:42 -04:00
Alvaro Herrera df630b0dd5 Implement SKIP LOCKED for row-level locks
This clause changes the behavior of SELECT locking clauses in the
presence of locked rows: instead of causing a process to block waiting
for the locks held by other processes (or raise an error, with NOWAIT),
SKIP LOCKED makes the new reader skip over such rows.  While this is not
appropriate behavior for general purposes, there are some cases in which
it is useful, such as queue-like tables.

Catalog version bumped because this patch changes the representation of
stored rules.

Reviewed by Craig Ringer (based on a previous attempt at an
implementation by Simon Riggs, who also provided input on the syntax
used in the current patch), David Rowley, and Álvaro Herrera.

Author: Thomas Munro
2014-10-07 17:23:34 -03:00
Tom Lane 5a6c168c78 Fix some more problems with nested append relations.
As of commit a87c72915 (which later got backpatched as far as 9.1),
we're explicitly supporting the notion that append relations can be
nested; this can occur when UNION ALL constructs are nested, or when
a UNION ALL contains a table with inheritance children.

Bug #11457 from Nelson Page, as well as an earlier report from Elvis
Pranskevichus, showed that there were still nasty bugs associated with such
cases: in particular the EquivalenceClass mechanism could try to generate
"join" clauses connecting an appendrel child to some grandparent appendrel,
which would result in assertion failures or bogus plans.

Upon investigation I concluded that all current callers of
find_childrel_appendrelinfo() need to be fixed to explicitly consider
multiple levels of parent appendrels.  The most complex fix was in
processing of "broken" EquivalenceClasses, which are ECs for which we have
been unable to generate all the derived equality clauses we would like to
because of missing cross-type equality operators in the underlying btree
operator family.  That code path is more or less entirely untested by
the regression tests to date, because no standard opfamilies have such
holes in them.  So I wrote a new regression test script to try to exercise
it a bit, which turned out to be quite a worthwhile activity as it exposed
existing bugs in all supported branches.

The present patch is essentially the same as far back as 9.2, which is
where parameterized paths were introduced.  In 9.0 and 9.1, we only need
to back-patch a small fragment of commit 5b7b5518d, which fixes failure to
propagate out the original WHERE clauses when a broken EC contains constant
members.  (The regression test case results show that these older branches
are noticeably stupider than 9.2+ in terms of the quality of the plans
generated; but we don't really care about plan quality in such cases,
only that the plan not be outright wrong.  A more invasive fix in the
older branches would not be a good idea anyway from a plan-stability
standpoint.)
2014-10-01 19:31:12 -04:00
Tom Lane 3f6f9260e3 Fix bogus variable-mangling in security_barrier_replace_vars().
This function created new Vars with varno different from varnoold, which
is a condition that should never prevail before setrefs.c does the final
variable-renumbering pass.  The created Vars could not be seen as equal()
to normal Vars, which among other things broke equivalence-class processing
for them.  The consequences of this were indeed visible in the regression
tests, in the form of failure to propagate constants as one would expect.
I stumbled across it while poking at bug #11457 --- after intentionally
disabling join equivalence processing, the security-barrier regression
tests started falling over with fun errors like "could not find pathkey
item to sort", because of failure to match the corrupted Vars to normal
ones.
2014-09-24 15:59:34 -04:00
Stephen Frost 6ef8c658af Process withCheckOption exprs in setrefs.c
While withCheckOption exprs had been handled in many cases by
happenstance, they need to be handled during set_plan_references and
more specifically down in set_plan_refs for ModifyTable plan nodes.
This is to ensure that the opfuncid's are set for operators referenced
in the withCheckOption exprs.

Identified as an issue by Thom Brown

Patch by Dean Rasheed

Back-patch to 9.4, where withCheckOption was introduced.
2014-09-22 20:12:51 -04:00
Stephen Frost 491c029dbc Row-Level Security Policies (RLS)
Building on the updatable security-barrier views work, add the
ability to define policies on tables to limit the set of rows
which are returned from a query and which are allowed to be added
to a table.  Expressions defined by the policy for filtering are
added to the security barrier quals of the query, while expressions
defined to check records being added to a table are added to the
with-check options of the query.

New top-level commands are CREATE/ALTER/DROP POLICY and are
controlled by the table owner.  Row Security is able to be enabled
and disabled by the owner on a per-table basis using
ALTER TABLE .. ENABLE/DISABLE ROW SECURITY.

Per discussion, ROW SECURITY is disabled on tables by default and
must be enabled for policies on the table to be used.  If no
policies exist on a table with ROW SECURITY enabled, a default-deny
policy is used and no records will be visible.

By default, row security is applied at all times except for the
table owner and the superuser.  A new GUC, row_security, is added
which can be set to ON, OFF, or FORCE.  When set to FORCE, row
security will be applied even for the table owner and superusers.
When set to OFF, row security will be disabled when allowed and an
error will be thrown if the user does not have rights to bypass row
security.

Per discussion, pg_dump sets row_security = OFF by default to ensure
that exports and backups will have all data in the table or will
error if there are insufficient privileges to bypass row security.
A new option has been added to pg_dump, --enable-row-security, to
ask pg_dump to export with row security enabled.

A new role capability, BYPASSRLS, which can only be set by the
superuser, is added to allow other users to be able to bypass row
security using row_security = OFF.

Many thanks to the various individuals who have helped with the
design, particularly Robert Haas for his feedback.

Authors include Craig Ringer, KaiGai Kohei, Adam Brightwell, Dean
Rasheed, with additional changes and rework by me.

Reviewers have included all of the above, Greg Smith,
Jeff McCormick, and Robert Haas.
2014-09-19 11:18:35 -04:00
Tom Lane 1b4cc493d2 Preserve AND/OR flatness while extracting restriction OR clauses.
The code I added in commit f343a880d5 was
careless about preserving AND/OR flatness: it could create a structure with
an OR node directly underneath another one.  That breaks an assumption
that's fairly important for planning efficiency, not to mention triggering
various Asserts (as reported by Benjamin Smith).  Add a trifle more logic
to handle the case properly.
2014-09-09 18:35:31 -04:00
Stephen Frost e414ba93ad Fix Var handling for security barrier views
In some cases, not all Vars were being correctly marked as having been
modified for updatable security barrier views, which resulted in invalid
plans (eg: when security barrier views were created over top of
inheiritance structures).

In passing, be sure to update both varattno and varonattno, as _equalVar
won't consider the Vars identical otherwise.  This isn't known to cause
any issues with updatable security barrier views, but was noticed as
missing while working on RLS and makes sense to get fixed.

Back-patch to 9.4 where updatable security barrier views were
introduced.
2014-08-26 23:08:41 -04:00
Tom Lane 27048980f5 Re-enable error for "SELECT ... OFFSET -1".
The executor has thrown errors for negative OFFSET values since 8.4 (see
commit bfce56eea4), but in a moment of brain
fade I taught the planner that OFFSET with a constant negative value was a
no-op (commit 1a1832eb08).  Reinstate the
former behavior by only discarding OFFSET with a value of exactly 0.  In
passing, adjust a planner comment that referenced the ancient behavior.

Back-patch to 9.3 where the mistake was introduced.
2014-07-22 13:30:01 -04:00
Tom Lane f15821eefd Allow join removal in some cases involving a left join to a subquery.
We can remove a left join to a relation if the relation's output is
provably distinct for the columns involved in the join clause (considering
only equijoin clauses) and the relation supplies no variables needed above
the join.  Previously, the join removal logic could only prove distinctness
by reference to unique indexes of a table.  This patch extends the logic
to consider subquery relations, wherein distinctness might be proven by
reference to GROUP BY, DISTINCT, etc.

We actually already had some code to check that a subquery's output was
provably distinct, but it was hidden inside pathnode.c; which was a pretty
bad place for it really, since that file is mostly boilerplate Path
construction and comparison.  Move that code to analyzejoins.c, which is
arguably a more appropriate location, and is certainly the site of the
new usage for it.

David Rowley, reviewed by Simon Riggs
2014-07-15 21:12:43 -04:00
Tom Lane 9e2f2d7a05 Don't assume a subquery's output is unique if there's a SRF in its tlist.
While the x output of "select x from t group by x" can be presumed unique,
this does not hold for "select x, generate_series(1,10) from t group by x",
because we may expand the set-returning function after the grouping step.
(Perhaps that should be re-thought; but considering all the other oddities
involved with SRFs in targetlists, it seems unlikely we'll change it.)
Put a check in query_is_distinct_for() so it's not fooled by such cases.

Back-patch to all supported branches.

David Rowley
2014-07-08 14:03:56 -04:00
Tom Lane d222585a9f Allow pushdown of WHERE quals into subqueries with window functions.
We can allow this even without any specific knowledge of the semantics
of the window function, so long as pushed-down quals will either accept
every row in a given window partition, or reject every such row.  Because
window functions act only within a partition, such a case can't result
in changing the window functions' outputs for any surviving row.
Eliminating entire partitions in this way obviously can reduce the cost
of the window-function computations substantially.

The fly in the ointment is that it's hard to be entirely sure whether
this is true for an arbitrary qual condition.  This patch allows pushdown
if (a) the qual references only partitioning columns, and (b) the qual
contains no volatile functions.  We are at risk of incorrect results if
the qual can produce different answers for values that the partitioning
equality operator sees as equal.  While it's not hard to invent cases
for which that can happen, it seems to seldom be a problem in practice,
since no one has complained about a similar assumption that we've had
for many years with respect to DISTINCT.  The potential performance
gains seem to be worth the risk.

David Rowley, reviewed by Vik Fearing; some credit is due also to
Thomas Mayer who did considerable preliminary investigation.
2014-06-27 23:08:08 -07:00
Tom Lane 1147035203 Disallow pushing volatile qual expressions down into DISTINCT subqueries.
A WHERE clause applied to the output of a subquery with DISTINCT should
theoretically be applied only once per distinct row; but if we push it
into the subquery then it will be evaluated at each row before duplicate
elimination occurs.  If the qual is volatile this can give rise to
observably wrong results, so don't do that.

While at it, refactor a little bit to allow subquery_is_pushdown_safe
to report more than one kind of restrictive condition without indefinitely
expanding its argument list.

Although this is a bug fix, it seems unwise to back-patch it into released
branches, since it might de-optimize plans for queries that aren't giving
any trouble in practice.  So apply to 9.4 but not further back.
2014-06-27 11:08:48 -07:00
Tom Lane 8f889b1083 Implement UPDATE tab SET (col1,col2,...) = (SELECT ...), ...
This SQL-standard feature allows a sub-SELECT yielding multiple columns
(but only one row) to be used to compute the new values of several columns
to be updated.  While the same results can be had with an independent
sub-SELECT per column, such a workaround can require a great deal of
duplicated computation.

The standard actually says that the source for a multi-column assignment
could be any row-valued expression.  The implementation used here is
tightly tied to our existing sub-SELECT support and can't handle other
cases; the Bison grammar would have some issues with them too.  However,
I don't feel too bad about this since other cases can be converted into
sub-SELECTs.  For instance, "SET (a,b,c) = row_valued_function(x)" could
be written "SET (a,b,c) = (SELECT * FROM row_valued_function(x))".
2014-06-18 13:22:34 -04:00
Tom Lane 2146f13408 Avoid recursion when processing simple lists of AND'ed or OR'ed clauses.
Since most of the system thinks AND and OR are N-argument expressions
anyway, let's have the grammar generate a representation of that form when
dealing with input like "x AND y AND z AND ...", rather than generating
a deeply-nested binary tree that just has to be flattened later by the
planner.  This avoids stack overflow in parse analysis when dealing with
queries having more than a few thousand such clauses; and in any case it
removes some rather unsightly inconsistencies, since some parts of parse
analysis were generating N-argument ANDs/ORs already.

It's still possible to get a stack overflow with weirdly parenthesized
input, such as "x AND (y AND (z AND ( ... )))", but such cases are not
mainstream usage.  The maximum depth of parenthesization is already
limited by Bison's stack in such cases, anyway, so that the limit is
probably fairly platform-independent.

Patch originally by Gurjeet Singh, heavily revised by me
2014-06-16 15:55:30 -04:00
Tom Lane 3f8c23c4d3 Improve predtest.c's ability to reason about operator expressions.
We have for a long time been able to prove implications and refutations
between clauses structured like "expr op const" with the same subexpression
and btree-related operators; for example that "x < 4" implies "x <= 5".
The implication machinery is needed to detect usability of partial indexes,
and the refutation machinery is needed to implement constraint exclusion.

This patch extends that machinery to make proofs for operator expressions
involving the same two immutable-but-not-necessarily-just-Const input
expressions, ie does "expr1 op1 expr2" prove or refute "expr1 op2 expr2" or
"expr2 op2 expr1"?  An important example is that we can now prove "x = y"
given "y = x", which formerly the code could not deduce unless x or y was a
constant.  We can make use of the system's knowledge of operator commutator
and negator pairs, and can also make use of btree opclass relationships,
for example "x < y" implies "x <= y" and refutes "x > y" (notice that
neither of these could be proven just from commutator or negator links).

Inspired by a gripe from Brian Dunavant.  This seems more like a new
feature than a bug fix, though, so no back-patch.
2014-06-13 00:02:56 -04:00
Tom Lane 9d4444a6fc Preserve exposed type of subquery outputs when substituting NULLs.
I thought I could get away with hardcoded int4 here, but the buildfarm
says differently.
2014-06-12 17:11:53 -04:00
Tom Lane 55d5b3c082 Remove unnecessary output expressions from unflattened subqueries.
If a sub-select-in-FROM gets flattened into the upper query, then we
naturally get rid of any output columns that are defined in the sub-select
text but not actually used in the upper query.  However, this doesn't
happen when it's not possible to flatten the subquery, for example because
it contains GROUP BY, LIMIT, etc.  Allowing the subquery to compute useless
output columns is often fairly harmless, but sometimes it has significant
performance cost: the unused output might be an expensive expression,
or it might be a Var from a relation that we could remove entirely (via
the join-removal logic) if only we realized that we didn't really need
that Var.  Situations like this are common when expanding views, so it
seems worth taking the trouble to detect and remove unused outputs.

Because the upper query's Var numbering for subquery references depends on
positions in the subquery targetlist, we don't want to renumber the items
we leave behind.  Instead, we can implement "removal" by replacing the
unwanted expressions with simple NULL constants.  This wastes a few cycles
at runtime, but not enough to justify more work in the planner.
2014-06-12 13:12:53 -04:00
Tom Lane a16d421ca4 Revert "Auto-tune effective_cache size to be 4x shared buffers"
This reverts commit ee1e5662d8, as well as
a remarkably large number of followup commits, which were mostly concerned
with the fact that the implementation didn't work terribly well.  It still
doesn't: we probably need some rather basic work in the GUC infrastructure
if we want to fully support GUCs whose default varies depending on the
value of another GUC.  Meanwhile, it also emerged that there wasn't really
consensus in favor of the definition the patch tried to implement (ie,
effective_cache_size should default to 4 times shared_buffers).  So whack
it all back to where it was.  In a followup commit, I'll do what was
recently agreed to, which is to simply change the default to a higher
value.
2014-05-08 20:49:38 -04:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Tom Lane 95811032d7 Improve planner to drop constant-NULL inputs of AND/OR where it's legal.
In general we can't discard constant-NULL inputs, since they could change
the result of the AND/OR to be NULL.  But at top level of WHERE, we do not
need to distinguish a NULL result from a FALSE result, so it's okay to
treat NULL as FALSE and then simplify AND/OR accordingly.

This is a very ancient oversight, but in 9.2 and later it can lead to
failure to optimize queries that previous releases did optimize, as a
result of more aggressive parameter substitution rules making it possible
to reduce more subexpressions to NULL constants.  This is the root cause of
bug #10171 from Arnold Scheffler.  We could alternatively have fixed that
by teaching orclauses.c to ignore constant-NULL OR arms, but it seems
better to get rid of them globally.

I resisted the temptation to back-patch this change into all active
branches, but it seems appropriate to back-patch as far as 9.2 so that
there will not be performance regressions of the kind shown in this bug.
2014-04-29 13:12:46 -04:00
Stephen Frost 842faa714c Make security barrier views automatically updatable
Views which are marked as security_barrier must have their quals
applied before any user-defined quals are called, to prevent
user-defined functions from being able to see rows which the
security barrier view is intended to prevent them from seeing.

Remove the restriction on security barrier views being automatically
updatable by adding a new securityQuals list to the RTE structure
which keeps track of the quals from security barrier views at each
level, independently of the user-supplied quals.  When RTEs are
later discovered which have securityQuals populated, they are turned
into subquery RTEs which are marked as security_barrier to prevent
any user-supplied quals being pushed down (modulo LEAKPROOF quals).

Dean Rasheed, reviewed by Craig Ringer, Simon Riggs, KaiGai Kohei
2014-04-12 21:04:58 -04:00
Tom Lane a9d9acbf21 Create infrastructure for moving-aggregate optimization.
Until now, when executing an aggregate function as a window function
within a window with moving frame start (that is, any frame start mode
except UNBOUNDED PRECEDING), we had to recalculate the aggregate from
scratch each time the frame head moved.  This patch allows an aggregate
definition to include an alternate "moving aggregate" implementation
that includes an inverse transition function for removing rows from
the aggregate's running state.  As long as this can be done successfully,
runtime is proportional to the total number of input rows, rather than
to the number of input rows times the average frame length.

This commit includes the core infrastructure, documentation, and regression
tests using user-defined aggregates.  Follow-on commits will update some
of the built-in aggregates to use this feature.

David Rowley and Florian Pflug, reviewed by Dean Rasheed; additional
hacking by me
2014-04-12 12:03:30 -04:00
Tom Lane a87c729153 Fix EquivalenceClass processing for nested append relations.
The original coding of EquivalenceClasses didn't foresee that appendrel
child relations might themselves be appendrels; but this is possible for
example when a UNION ALL subquery scans a table with inheritance children.
The oversight led to failure to optimize ordering-related issues very well
for the grandchild tables.  After some false starts involving explicitly
flattening the appendrel representation, we found that this could be fixed
easily by removing a few implicit assumptions about appendrel parent rels
not being children themselves.

Kyotaro Horiguchi and Tom Lane, reviewed by Noah Misch
2014-03-28 11:50:01 -04:00
Tom Lane af930e606a Again fix initialization of auto-tuned effective_cache_size.
The previous method was overly complex and underly correct; in particular,
by assigning the default value with PGC_S_OVERRIDE, it prevented later
attempts to change the setting in postgresql.conf, as noted by Jeff Janes.
We should just assign the default value with source PGC_S_DYNAMIC_DEFAULT,
which will have the desired priority relative to the boot_val as well as
user-set values.

There is still a gap in this method: if there's an explicit assignment of
effective_cache_size = -1 in the postgresql.conf file, and that assignment
appears before shared_buffers is assigned, the code will substitute 4 times
the bootstrap default for shared_buffers, and that value will then persist
(since it will have source PGC_S_FILE).  I don't see any very nice way
to avoid that though, and it's not a case to be expected in practice.
The existing comments in guc-file.l look forward to a redesign of the
DYNAMIC_DEFAULT mechanism; if that ever happens, we should consider this
case as one of the things we'd like to improve.
2014-03-20 12:58:30 -04:00
Tom Lane 043f6ff05d Fix bogus handling of "postponed" lateral quals.
When pulling a "postponed" qual from a LATERAL subquery up into the quals
of an outer join, we must make sure that the postponed qual is included
in those seen by make_outerjoininfo().  Otherwise we might compute a
too-small min_lefthand or min_righthand for the outer join, leading to
"JOIN qualification cannot refer to other relations" failures from
distribute_qual_to_rels.  Subtler errors in the created plan seem possible,
too, if the extra qual would only affect join ordering constraints.

Per bug #9041 from David Leverton.  Back-patch to 9.3.
2014-01-30 14:51:16 -05:00
Tom Lane 2850896961 Code review for auto-tuned effective_cache_size.
Fix integer overflow issue noted by Magnus Hagander, as well as a bunch
of other infelicities in commit ee1e5662d8
and its unreasonably large number of followups.
2014-01-27 00:05:56 -05:00
Simon Riggs 4d1e2aeb1a Speed up COPY into tables with DEFAULT nextval()
Previously the presence of a nextval() prevented the
use of batch-mode COPY.  This patch introduces a
special case just for nextval() functions. In future
we will introduce a general case solution for
labelling volatile functions as safe for use.
2014-01-20 17:22:38 +00:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Peter Eisentraut edc43458d7 Add more use of psprintf() 2014-01-06 21:30:26 -05:00
Tom Lane f7fbf4b0be Remove dead code now that orindxpath.c is history.
We don't need make_restrictinfo_from_bitmapqual() anymore at all.
generate_bitmap_or_paths() doesn't need to be exported, and we can
drop its rather klugy restriction_only flag.
2013-12-30 12:50:31 -05:00
Tom Lane f343a880d5 Extract restriction OR clauses whether or not they are indexable.
It's possible to extract a restriction OR clause from a join clause that
has the form of an OR-of-ANDs, if each sub-AND includes a clause that
mentions only one specific relation.  While PG has been aware of that idea
for many years, the code previously only did it if it could extract an
indexable OR clause.  On reflection, though, that seems a silly limitation:
adding a restriction clause can be a win by reducing the number of rows
that have to be filtered at the join step, even if we have to test the
clause as a plain filter clause during the scan.  This should be especially
useful for foreign tables, where the change can cut the number of rows that
have to be retrieved from the foreign server; but testing shows it can win
even on local tables.  Per a suggestion from Robert Haas.

As a heuristic, I made the code accept an extracted restriction clause
if its estimated selectivity is less than 0.9, which will probably result
in accepting extracted clauses just about always.  We might need to tweak
that later based on experience.

Since the code no longer has even a weak connection to Path creation,
remove orindxpath.c and create a new file optimizer/util/orclauses.c.

There's some additional janitorial cleanup of now-dead code that needs
to happen, but it seems like that's a fit subject for a separate commit.
2013-12-30 12:24:37 -05:00
Tom Lane 8d65da1f01 Support ordered-set (WITHIN GROUP) aggregates.
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()).  We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.

Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions.  To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c.  This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need.  There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.

In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates.  Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.

Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
2013-12-23 16:11:35 -05:00
Tom Lane c03ad5602f Fix inherited UPDATE/DELETE with UNION ALL subqueries.
Fix an oversight in commit b3aaf9081a1a95c245fd605dcf02c91b3a5c3a29: we do
indeed need to process the planner's append_rel_list when copying RTE
subqueries, because if any of them were flattenable UNION ALL subqueries,
the append_rel_list shows which subquery RTEs were pulled up out of which
other ones.  Without this, UNION ALL subqueries aren't correctly inserted
into the update plans for inheritance child tables after the first one,
typically resulting in no update happening for those child table(s).
Per report from Victor Yegorov.

Experimentation with this case also exposed a fault in commit
a7b965382cf0cb30aeacb112572718045e6d4be7: if an inherited UPDATE/DELETE
was proven totally dummy by constraint exclusion, we might arrive at
add_rtes_to_flat_rtable with root->simple_rel_array being NULL.  This
should be interpreted as not having any RelOptInfos.  I chose to code
the guard as a check against simple_rel_array_size, so as to also
provide some protection against indexing off the end of the array.

Back-patch to 9.2 where the faulty code was added.
2013-12-14 17:33:53 -05:00
Tom Lane 9ec6199d18 Fix possible crash with nested SubLinks.
An expression such as WHERE (... x IN (SELECT ...) ...) IN (SELECT ...)
could produce an invalid plan that results in a crash at execution time,
if the planner attempts to flatten the outer IN into a semi-join.
This happens because convert_testexpr() was not expecting any nested
SubLinks and would wrongly replace any PARAM_SUBLINK Params belonging
to the inner SubLink.  (I think the comment denying that this case could
happen was wrong when written; it's certainly been wrong for quite a long
time, since very early versions of the semijoin flattening logic.)

Per report from Teodor Sigaev.  Back-patch to all supported branches.
2013-12-10 16:10:17 -05:00
Robert Haas 8e18d04d4d Refine our definition of what constitutes a system relation.
Although user-defined relations can't be directly created in
pg_catalog, it's possible for them to end up there, because you can
create them in some other schema and then use ALTER TABLE .. SET SCHEMA
to move them there.  Previously, such relations couldn't afterwards
be manipulated, because IsSystemRelation()/IsSystemClass() rejected
all attempts to modify objects in the pg_catalog schema, regardless
of their origin.  With this patch, they now reject only those
objects in pg_catalog which were created at initdb-time, allowing
most operations on user-created tables in pg_catalog to proceed
normally.

This patch also adds new functions IsCatalogRelation() and
IsCatalogClass(), which is similar to IsSystemRelation() and
IsSystemClass() but with a slightly narrower definition: only TOAST
tables of system catalogs are included, rather than *all* TOAST tables.
This is currently used only for making decisions about when
invalidation messages need to be sent, but upcoming logical decoding
patches will find other uses for this information.

Andres Freund, with some modifications by me.
2013-11-28 20:57:20 -05:00
Tom Lane f19e92ed04 Flatten join alias Vars before pulling up targetlist items from a subquery.
pullup_replace_vars()'s decisions about whether a pulled-up replacement
expression needs to be wrapped in a PlaceHolderVar depend on the assumption
that what looks like a Var behaves like a Var.  However, if the Var is a
join alias reference, later flattening of join aliases might replace the
Var with something that's not a Var at all, and should have been wrapped.

To fix, do a forcible pass of flatten_join_alias_vars() on the subquery
targetlist before we start to copy items out of it.  We'll re-run that
processing on the pulled-up expressions later, but that's harmless.

Per report from Ken Tanzer; the added regression test case is based on his
example.  This bug has been there since the PlaceHolderVar mechanism was
invented, but has escaped detection because the circumstances that trigger
it are fairly narrow.  You need a flattenable query underneath an outer
join, which contains another flattenable query inside a join of its own,
with a dangerous expression (a constant or something else non-strict)
in that one's targetlist.

Having seen this, I'm wondering if it wouldn't be prudent to do all
alias-variable flattening earlier, perhaps even in the rewriter.
But that would probably not be a back-patchable change.
2013-11-22 14:37:21 -05:00
Tom Lane 784e762e88 Support multi-argument UNNEST(), and TABLE() syntax for multiple functions.
This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry.  The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others.  This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.

This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.

Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).

The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does.  There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST().  After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.

Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me
2013-11-21 19:37:20 -05:00
Tom Lane 6cb86143e8 Allow aggregates to provide estimates of their transition state data size.
Formerly the planner had a hard-wired rule of thumb for guessing the amount
of space consumed by an aggregate function's transition state data.  This
estimate is critical to deciding whether it's OK to use hash aggregation,
and in many situations the built-in estimate isn't very good.  This patch
adds a column to pg_aggregate wherein a per-aggregate estimate can be
provided, overriding the planner's default, and infrastructure for setting
the column via CREATE AGGREGATE.

It may be that additional smarts will be required in future, perhaps even
a per-aggregate estimation function.  But this is already a step forward.

This is extracted from a larger patch to improve the performance of numeric
and int8 aggregates.  I (tgl) thought it was worth reviewing and committing
this infrastructure separately.  In this commit, all built-in aggregates
are given aggtransspace = 0, so no behavior should change.

Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra
2013-11-16 16:03:40 -05:00
Tom Lane f3b3b8d5be Compute correct em_nullable_relids in get_eclass_for_sort_expr().
Bug #8591 from Claudio Freire demonstrates that get_eclass_for_sort_expr
must be able to compute valid em_nullable_relids for any new equivalence
class members it creates.  I'd worried about this in the commit message
for db9f0e1d9a, but claimed that it wasn't a
problem because multi-member ECs should already exist when it runs.  That
is transparently wrong, though, because this function is also called by
initialize_mergeclause_eclasses, which runs during deconstruct_jointree.
The example given in the bug report (which the new regression test item
is based upon) fails because the COALESCE() expression is first seen by
initialize_mergeclause_eclasses rather than process_equivalence.

Fixing this requires passing the appropriate nullable_relids set to
get_eclass_for_sort_expr, and it requires new code to compute that set
for top-level expressions such as ORDER BY, GROUP BY, etc.  We store
the top-level nullable_relids in a new field in PlannerInfo to avoid
computing it many times.  In the back branches, I've added the new
field at the end of the struct to minimize ABI breakage for planner
plugins.  There doesn't seem to be a good alternative to changing
get_eclass_for_sort_expr's API signature, though.  There probably aren't
any third-party extensions calling that function directly; moreover,
if there are, they probably need to think about what to pass for
nullable_relids anyway.

Back-patch to 9.2, like the previous patch in this area.
2013-11-15 16:46:18 -05:00
Peter Eisentraut 001e114b8d Fix whitespace issues found by git diff --check, add gitattributes
Set per file type attributes in .gitattributes to fine-tune whitespace
checks.  With the associated cleanups, the tree is now clean for git
2013-11-10 14:48:29 -05:00
Tom Lane b97ee66cc1 Make contain_volatile_functions/contain_mutable_functions look into SubLinks.
This change prevents us from doing inappropriate subquery flattening in
cases such as dangerous functions hidden inside a sub-SELECT in the
targetlist of another sub-SELECT.  That could result in unexpected behavior
due to multiple evaluations of a volatile function, as in a recent
complaint from Etienne Dube.  It's been questionable from the very
beginning whether these functions should look into subqueries (as noted in
their comments), and this case seems to provide proof that they should.

Because the new code only descends into SubLinks, not SubPlans or
InitPlans, the change only affects the planner's behavior during
prepjointree processing and not later on --- for example, you can still get
it to use a volatile function in an indexqual if you wrap the function in
(SELECT ...).  That's a historical behavior, for sure, but it's reasonable
given that the executor's evaluation rules for subplans don't depend on
whether there are volatile functions inside them.  In any case, we need to
constrain the behavioral change as narrowly as we can to make this
reasonable to back-patch.
2013-11-08 11:36:57 -05:00
Tom Lane 5e900bc00f Fix generation of MergeAppend plans for optimized min/max on expressions.
Before jamming a desired targetlist into a plan node, one really ought to
make sure the plan node can handle projections, and insert a buffering
Result plan node if not.  planagg.c forgot to do this, which is a hangover
from the days when it only dealt with IndexScan plan types.  MergeAppend
doesn't project though, not to mention that it gets unhappy if you remove
its possibly-resjunk sort columns.  The code accidentally failed to fail
for cases in which the min/max argument was a simple Var, because the new
targetlist would be equivalent to the original "flat" tlist anyway.
For any more complex case, it's been broken since 9.1 where we introduced
the ability to optimize min/max using MergeAppend, as reported by Raphael
Bauduin.  Fix by duplicating the logic from grouping_planner that decides
whether we need a Result node.

In 9.2 and 9.1, this requires back-porting the tlist_same_exprs() function
introduced in commit 4387cf956b, else we'd
uselessly add a Result node in cases that worked before.  It's rather
tempting to back-patch that whole commit so that we can avoid extra Result
nodes in mainline cases too; but I'll refrain, since that code hasn't
really seen all that much field testing yet.
2013-11-07 13:14:14 -05:00
Tom Lane bb45c64041 Support default arguments and named-argument notation for window functions.
These things didn't work because the planner omitted to do the necessary
preprocessing of a WindowFunc's argument list.  Add the few dozen lines
of code needed to handle that.

Although this sounds like a feature addition, it's really a bug fix because
the default-argument case was likely to crash previously, due to lack of
checking of the number of supplied arguments in the built-in window
functions.  It's not a security issue because there's no way for a
non-superuser to create a window function definition with defaults that
refers to a built-in C function, but nonetheless people might be annoyed
that it crashes rather than producing a useful error message.  So
back-patch as far as the patch applies easily, which turns out to be 9.2.
I'll put a band-aid in earlier versions as a separate patch.

(Note that these features still don't work for aggregates, and fixing that
case will be harder since we represent aggregate arg lists as target lists
not bare expression lists.  There's no crash risk though because CREATE
AGGREGATE doesn't accept defaults, and we reject named-argument notation
when parsing an aggregate call.)
2013-11-06 13:33:09 -05:00
Tom Lane 6331de1d44 Fix some obsolete information in src/backend/optimizer/README.
Constant quals aren't handled the same way they used to be.  Also,
add mention of a couple more major steps in grouping_planner.
Per complaint a couple months back from Etsuro Fujita.
2013-11-05 11:31:35 -05:00
Peter Eisentraut 5b6d08cd29 Add use of asprintf()
Add asprintf(), pg_asprintf(), and psprintf() to simplify string
allocation and composition.  Replacement implementations taken from
NetBSD.

Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Asif Naeem <anaeem.it@gmail.com>
2013-10-13 00:09:18 -04:00
Bruce Momjian bf46524b31 Fix C comment in check_effective_cache_size() 2013-10-08 19:25:26 -04:00
Bruce Momjian ee1e5662d8 Auto-tune effective_cache size to be 4x shared buffers 2013-10-08 12:12:24 -04:00
Tom Lane fcf9ecad57 In locate_grouping_columns(), don't expect an exact match of Var typmods.
It's possible that inlining of SQL functions (or perhaps other changes?)
has exposed typmod information not known at parse time.  In such cases,
Vars generated by query_planner might have valid typmod values while the
original grouping columns only have typmod -1.  This isn't a semantic
problem since the behavior of grouping only depends on type not typmod,
but it breaks locate_grouping_columns' use of tlist_member to locate the
matching entry in query_planner's result tlist.

We can fix this without an excessive amount of new code or complexity by
relying on the fact that locate_grouping_columns only gets called when
make_subplanTargetList has set need_tlist_eval == false, and that can only
happen if all the grouping columns are simple Vars.  Therefore we only need
to search the sub_tlist for a matching Var, and we can reasonably define a
"match" as being a match of the Var identity fields
varno/varattno/varlevelsup.  The code still Asserts that vartype matches,
but ignores vartypmod.

Per bug #8393 from Evan Martin.  The added regression test case is
basically the same as his example.  This has been broken for a very long
time, so back-patch to all supported branches.
2013-08-23 17:30:53 -04:00
Tom Lane 3454876314 Fix hash table size estimation error in choose_hashed_distinct().
We should account for the per-group hashtable entry overhead when
considering whether to use a hash aggregate to implement DISTINCT.  The
comparable logic in choose_hashed_grouping() gets this right, but I think
I omitted it here in the mistaken belief that there would be no overhead
if there were no aggregate functions to be evaluated.  This can result in
more than 2X underestimate of the hash table size, if the tuples being
aggregated aren't very wide.  Per report from Tomas Vondra.

This bug is of long standing, but per discussion we'll only back-patch into
9.3.  Changing the estimation behavior in stable branches seems to carry too
much risk of destabilizing plan choices for already-tuned applications.
2013-08-21 13:38:34 -04:00
Tom Lane c64de21e96 Fix qual-clause-misplacement issues with pulled-up LATERAL subqueries.
In an example such as
SELECT * FROM
  i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true;
it is safe to pull up the LATERAL subquery into its parent, but we must
then treat the "i.n = j.n" clause as a qual clause of the LEFT JOIN.  The
previous coding in deconstruct_recurse mistakenly labeled the clause as
"is_pushed_down", resulting in wrong semantics if the clause were applied
at the join node, as per an example submitted awhile ago by Jeremy Evans.
To fix, postpone processing of such clauses until we return back up to
the appropriate recursion depth in deconstruct_recurse.

In addition, tighten the is-safe-to-pull-up checks in is_simple_subquery;
we previously missed the possibility that the LATERAL subquery might itself
contain an outer join that makes lateral references in lower quals unsafe.

A regression test case equivalent to Jeremy's example was already in my
commit of yesterday, but was giving the wrong results because of this
bug.  This patch fixes the expected output for that, and also adds a
test case for the second problem.
2013-08-19 13:19:41 -04:00