Commit Graph

997 Commits

Author SHA1 Message Date
Tom Lane 4a4e2442a7 Fix improper uses of canonicalize_qual().
One of the things canonicalize_qual() does is to remove constant-NULL
subexpressions of top-level AND/OR clauses.  It does that on the assumption
that what it's given is a top-level WHERE clause, so that NULL can be
treated like FALSE.  Although this is documented down inside a subroutine
of canonicalize_qual(), it wasn't mentioned in the documentation of that
function itself, and some callers hadn't gotten that memo.

Notably, commit d007a9505 caused get_relation_constraints() to apply
canonicalize_qual() to CHECK constraints.  That allowed constraint
exclusion to misoptimize situations in which a CHECK constraint had a
provably-NULL subclause, as seen in the regression test case added here,
in which a child table that should be scanned is not.  (Although this
thinko is ancient, the test case doesn't fail before 9.2, for reasons
I've not bothered to track down in detail.  There may be related cases
that do fail before that.)

More recently, commit f0e44751d added an independent bug by applying
canonicalize_qual() to index expressions, which is even sillier since
those might not even be boolean.  If they are, though, I think this
could lead to making incorrect index entries for affected index
expressions in v10.  I haven't attempted to prove that though.

To fix, add an "is_check" parameter to canonicalize_qual() to specify
whether it should assume WHERE or CHECK semantics, and make it perform
NULL-elimination accordingly.  Adjust the callers to apply the right
semantics, or remove the call entirely in cases where it's not known
that the expression has one or the other semantics.  I also removed
the call in some cases involving partition expressions, where it should
be a no-op because such expressions should be canonical already ...
and was a no-op, independently of whether it could in principle have
done something, because it was being handed the qual in implicit-AND
format which isn't what it expects.  In HEAD, add an Assert to catch
that type of mistake in future.

This represents an API break for external callers of canonicalize_qual().
While that's intentional in HEAD to make such callers think about which
case applies to them, it seems like something we probably wouldn't be
thanked for in released branches.  Hence, in released branches, the
extra parameter is added to a new function canonicalize_qual_ext(),
and canonicalize_qual() is a wrapper that retains its old behavior.

Patch by me with suggestions from Dean Rasheed.  Back-patch to all
supported branches.

Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us
2018-03-11 18:10:42 -04:00
Tom Lane 5748f3a0aa Improve predtest.c's internal docs, and enhance its functionality a bit.
Commit b08df9cab left things rather poorly documented as far as the
exact semantics of "clause_is_check" mode went.  Also, that mode did
not really work correctly for predicate_refuted_by; although given the
lack of specification as to what it should do, as well as the lack
of any actual use-case, that's perhaps not surprising.

Rename "clause_is_check" to "weak" proof mode, and provide specifications
for what it should do.  I defined weak refutation as meaning "truth of A
implies non-truth of B", which makes it possible to use the mode in the
part of relation_excluded_by_constraints that checks for mutually
contradictory WHERE clauses.  Fix up several places that did things wrong
for that definition.  (As far as I can see, these errors would only lead
to failure-to-prove, not incorrect claims of proof, making them not
serious bugs even aside from the fact that v10 contains no use of this
mode.  So there seems no need for back-patching.)

In addition, teach predicate_refuted_by_recurse that it can use
predicate_implied_by_recurse after all when processing a strong NOT-clause,
so long as it asks for the correct proof strength.  This is an optimization
that could have been included in commit b08df9cab, but wasn't.

Also, simplify and generalize the logic that checks for whether nullness of
the argument of IS [NOT] NULL would force overall nullness of the predicate
or clause.  (This results in a change in the partition_prune test's output,
as it is now able to prune an all-nulls partition that it did not recognize
before.)

In passing, in PartConstraintImpliedByRelConstraint, remove bogus
conversion of the constraint list to explicit-AND form and then right back
again; that accomplished nothing except forcing a useless extra level of
recursion inside predicate_implied_by.

Discussion: https://postgr.es/m/5983.1520487191@sss.pgh.pa.us
2018-03-09 16:58:26 -05:00
Peter Eisentraut fd1a421fe6 Add prokind column, replacing proisagg and proiswindow
The new column distinguishes normal functions, procedures, aggregates,
and window functions.  This replaces the existing columns proisagg and
proiswindow, and replaces the convention that procedures are indicated
by prorettype == 0.  Also change prorettype to be VOIDOID for procedures.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-03-02 13:48:33 -05:00
Robert Haas 2af28e6033 For partitionwise join, match on partcollation, not parttypcoll.
The previous code considered two tables to have the partition scheme
if the underlying columns had the same collation, but what we
actually need to compare is not the collations associated with the
column but the collation used for partitioning.  Fix that.

Robert Haas and Amit Langote

Discussion: http://postgr.es/m/0f95f924-0efa-4cf5-eb5f-9a3d1bc3c33d@lab.ntt.co.jp
2018-02-28 12:16:09 -05:00
Peter Eisentraut 2fb1abaeb0 Rename enable_partition_wise_join to enable_partitionwise_join
Discussion: https://www.postgresql.org/message-id/flat/ad24e4f4-6481-066e-e3fb-6ef4a3121882%402ndquadrant.com
2018-02-16 10:33:59 -05:00
Robert Haas f069c91a57 Fix possible crash in partition-wise join.
The previous code assumed that we'd always succeed in creating
child-joins for a joinrel for which partition-wise join was considered,
but that's not guaranteed, at least in the case where dummy rels
are involved.

Ashutosh Bapat, with some wordsmithing by me.

Discussion: http://postgr.es/m/CAFjFpRf8=uyMYYfeTBjWDMs1tR5t--FgOe2vKZPULxxdYQ4RNw@mail.gmail.com
2018-02-05 17:31:57 -05:00
Alvaro Herrera 05fb5d6619 Ignore partitioned indexes where appropriate
get_relation_info() was too optimistic about opening indexes in
partitioned tables, which would raise errors when any queries were
planned on such tables.  Fix by ignoring any indexes of the partitioned
kind.

CLUSTER (and ALTER TABLE CLUSTER ON) had a similar problem.  Fix by
disallowing these commands in partitioned tables.

Fallout from 8b08f7d482.
2018-01-25 16:12:15 -03:00
Tom Lane bb94ce4d26 Teach reparameterize_path() to handle AppendPaths.
If we're inside a lateral subquery, there may be no unparameterized paths
for a particular child relation of an appendrel, in which case we *must*
be able to create similarly-parameterized paths for each other child
relation, else the planner will fail with "could not devise a query plan
for the given query".  This means that there are situations where we'd
better be able to reparameterize at least one path for each child.

This calls into question the assumption in reparameterize_path() that
it can just punt if it feels like it.  However, the only case that is
known broken right now is where the child is itself an appendrel so that
all its paths are AppendPaths.  (I think possibly I disregarded that in
the original coding on the theory that nested appendrels would get folded
together --- but that only happens *after* reparameterize_path(), so it's
not excused from handling a child AppendPath.)  Given that this code's been
like this since 9.3 when LATERAL was introduced, it seems likely we'd have
heard of other cases by now if there were a larger problem.

Per report from Elvis Pranskevichus.  Back-patch to 9.3.

Discussion: https://postgr.es/m/5981018.zdth1YWmNy@hammer.magicstack.net
2018-01-23 16:50:34 -05:00
Robert Haas 2f17844104 Allow UPDATE to move rows between partitions.
When an UPDATE causes a row to no longer match the partition
constraint, try to move it to a different partition where it does
match the partition constraint.  In essence, the UPDATE is split into
a DELETE from the old partition and an INSERT into the new one.  This
can lead to surprising behavior in concurrency scenarios because
EvalPlanQual rechecks won't work as they normally did; the known
problems are documented.  (There is a pending patch to improve the
situation further, but it needs more review.)

Amit Khandekar, reviewed and tested by Amit Langote, David Rowley,
Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro
Herrera, Amit Kapila, and me.  A few final revisions by me.

Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com
2018-01-19 15:33:06 -05:00
Bruce Momjian f033462d8f Reorder C includes
Reorder header files in joinrels.c and pathnode.c in alphabetical order,
removing unnecessary ones.

Author: Etsuro Fujita
2018-01-17 18:10:05 -05:00
Peter Eisentraut 9e945f8626 Fix Latin spelling
"c.f." should be "cf.".
2018-01-11 08:32:01 -05:00
Tom Lane 624e440a47 Improve the heuristic for ordering child paths of a parallel append.
Commit ab7271677 introduced code that attempts to order the child
scans of a Parallel Append node in a way that will minimize execution
time, based on total cost and startup cost.  However, it failed to
think hard about what to do when estimated costs are exactly equal;
a case that's particularly likely to occur when comparing on startup
cost.  In such a case the ordering of the child paths would be left
to the whims of qsort, an algorithm that isn't even stable.

We can improve matters by applying the rule used elsewhere in the
planner: if total costs are equal, sort on startup cost, and
vice versa.  When both cost estimates are exactly equal, rather
than letting qsort do something unpredictable, sort based on the
child paths' relids, which should typically result in sorting in
inheritance order.  (The latter provision requires inventing a
qsort-style comparator for bitmapsets, but maybe we'll have use
for that for other reasons in future.)

This results in a few plan changes in the select_parallel test,
but those all look more reasonable than before, when the actual
underlying cost numbers are taken into account.

Discussion: https://postgr.es/m/4944.1515446989@sss.pgh.pa.us
2018-01-09 13:07:52 -05:00
Tom Lane 3decd150a2 Teach eval_const_expressions() to handle some more cases.
Add some infrastructure (mostly macros) to make it easier to write
typical cases for constant-expression simplification.  Add simplification
processing for ArrayRef, RowExpr, and ScalarArrayOpExpr node types,
which formerly went unsimplified even if all their inputs were constants.
Also teach it to simplify FieldSelect from a composite constant.
Make use of the new infrastructure to reduce the amount of code needed
for the existing ArrayExpr and ArrayCoerceExpr cases.

One existing test case changes output as a result of the fact that
RowExpr can now be folded to a constant.  All the new code is exercised
by existing test cases according to gcov, so I feel no need to add
additional tests.

Tom Lane, reviewed by Dmitry Dolgov

Discussion: https://postgr.es/m/3be3b82c-e29c-b674-2163-bf47d98817b1@iki.fi
2018-01-03 12:35:09 -05:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Tom Lane 6719b238e8 Rearrange execution of PARAM_EXTERN Params for plpgsql's benefit.
This patch does three interrelated things:

* Create a new expression execution step type EEOP_PARAM_CALLBACK
and add the infrastructure needed for add-on modules to generate that.
As discussed, the best control mechanism for that seems to be to add
another hook function to ParamListInfo, which will be called by
ExecInitExpr if it's supplied and a PARAM_EXTERN Param is found.
For stand-alone expressions, we add a new entry point to allow the
ParamListInfo to be specified directly, since it can't be retrieved
from the parent plan node's EState.

* Redesign the API for the ParamListInfo paramFetch hook so that the
ParamExternData array can be entirely virtual.  This also lets us get rid
of ParamListInfo.paramMask, instead leaving it to the paramFetch hook to
decide which param IDs should be accessible or not.  plpgsql_param_fetch
was already doing the identical masking check, so having callers do it too
seemed redundant.  While I was at it, I added a "speculative" flag to
paramFetch that the planner can specify as TRUE to avoid unwanted failures.
This solves an ancient problem for plpgsql that it couldn't provide values
of non-DTYPE_VAR variables to the planner for fear of triggering premature
"record not assigned yet" or "field not found" errors during planning.

* Rework plpgsql to get rid of the need for "unshared" parameter lists,
by dint of turning the single ParamListInfo per estate into a nearly
read-only data structure that doesn't instantiate any per-variable data.
Instead, the paramFetch hook controls access to per-variable data and can
make the right decisions on the fly, replacing the cases that we used to
need multiple ParamListInfos for.  This might perhaps have been a
performance loss on its own, but by using a paramCompile hook we can
bypass plpgsql_param_fetch entirely during normal query execution.
(It's now only called when, eg, we copy the ParamListInfo into a cursor
portal.  copyParamList() or SerializeParamList() effectively instantiate
the virtual parameter array as a simple physical array without a
paramFetch hook, which is what we want in those cases.)  This allows
reverting most of commit 6c82d8d1f, though I kept the cosmetic
code-consolidation aspects of that (eg the assign_simple_var function).

Performance testing shows this to be at worst a break-even change,
and it can provide wins ranging up to 20% in test cases involving
accesses to fields of "record" variables.  The fact that values of
such variables can now be exposed to the planner might produce wins
in some situations, too, but I've not pursued that angle.

In passing, remove the "parent" pointer from the arguments to
ExecInitExprRec and related functions, instead storing that pointer in a
transient field in ExprState.  The ParamListInfo pointer for a stand-alone
expression is handled the same way; we'd otherwise have had to add
yet another recursively-passed-down argument in expression compilation.

Discussion: https://postgr.es/m/32589.1513706441@sss.pgh.pa.us
2017-12-21 12:57:45 -05:00
Andres Freund 1804284042 Add parallel-aware hash joins.
Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel
Hash Join with Parallel Hash.  While hash joins could already appear in
parallel queries, they were previously always parallel-oblivious and had a
partial subplan only on the outer side, meaning that the work of the inner
subplan was duplicated in every worker.

After this commit, the planner will consider using a partial subplan on the
inner side too, using the Parallel Hash node to divide the work over the
available CPU cores and combine its results in shared memory.  If the join
needs to be split into multiple batches in order to respect work_mem, then
workers process different batches as much as possible and then work together
on the remaining batches.

The advantages of a parallel-aware hash join over a parallel-oblivious hash
join used in a parallel query are that it:

 * avoids wasting memory on duplicated hash tables
 * avoids wasting disk space on duplicated batch files
 * divides the work of building the hash table over the CPUs

One disadvantage is that there is some communication between the participating
CPUs which might outweigh the benefits of parallelism in the case of small
hash tables.  This is avoided by the planner's existing reluctance to supply
partial plans for small scans, but it may be necessary to estimate
synchronization costs in future if that situation changes.  Another is that
outer batch 0 must be written to disk if multiple batches are required.

A potential future advantage of parallel-aware hash joins is that right and
full outer joins could be supported, since there is a single set of matched
bits for each hashtable, but that is not yet implemented.

A new GUC enable_parallel_hash is defined to control the feature, defaulting
to on.

Author: Thomas Munro
Reviewed-By: Andres Freund, Robert Haas
Tested-By: Rafia Sabih, Prabhat Sahu
Discussion:
    https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
    https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com
2017-12-21 00:43:41 -08:00
Robert Haas ab72716778 Support Parallel Append plan nodes.
When we create an Append node, we can spread out the workers over the
subplans instead of piling on to each subplan one at a time, which
should typically be a bit more efficient, both because the startup
cost of any plan executed entirely by one worker is paid only once and
also because of reduced contention.  We can also construct Append
plans using a mix of partial and non-partial subplans, which may allow
for parallelism in places that otherwise couldn't support it.
Unfortunately, this patch doesn't handle the important case of
parallelizing UNION ALL by running each branch in a separate worker;
the executor infrastructure is added here, but more planner work is
needed.

Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by
Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and
Rajkumar Raghuwanshi.

Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
2017-12-05 17:28:39 -05:00
Robert Haas 1cbc17aaca Try to exclude partitioned tables in toto.
Ashutosh Bapat, reviewed by Jeevan Chalke.  Comment by me.

Discussion: http://postgr.es/m/CAFjFpRcuRaydz88CY_aQekmuvmN2A9ax5z0k=ppT+s8KS8xMRA@mail.gmail.com
2017-12-01 10:59:09 -05:00
Peter Eisentraut e4128ee767 SQL procedures
This adds a new object type "procedure" that is similar to a function
but does not have a return type and is invoked by the new CALL statement
instead of SELECT or similar.  This implementation is aligned with the
SQL standard and compatible with or similar to other SQL implementations.

This commit adds new commands CALL, CREATE/ALTER/DROP PROCEDURE, as well
as ALTER/DROP ROUTINE that can refer to either a function or a
procedure (or an aggregate function, as an extension to SQL).  There is
also support for procedures in various utility commands such as COMMENT
and GRANT, as well as support in pg_dump and psql.  Support for defining
procedures is available in all the languages supplied by the core
distribution.

While this commit is mainly syntax sugar around existing functionality,
future features will rely on having procedures as a separate object
type.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
2017-11-30 11:03:20 -05:00
Robert Haas 1761653bbb Make create_unique_path manage memory like mark_dummy_rel.
Put the unique path in the same context as the owning RelOptInfo, rather
than the toplevel planner context.  This is how this function worked
originally, but commit f41803bb39
changed it without explanation.  mark_dummy_rel adopted the older (or
newer?) technique in commit eca75a12a2,
which also featured a much better explanation of why it is correct.
So, switch back to that technique here, with the same explanation
given there.

Although this fixes a possible memory leak when GEQO is in use, the
leak is minor and probably nobody cares, so no back-patch.

Ashutosh Bapat, reviewed by Tom Lane and by me

Discussion: http://postgr.es/m/CAFjFpRcXkHHrXyD9BCvkgGJV4TnHG2SWJ0PhJfrDu3NAcQvh7g@mail.gmail.com
2017-11-30 09:50:10 -05:00
Robert Haas eaedf0df71 Update typedefs.list and re-run pgindent
Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
2017-11-29 09:24:24 -05:00
Robert Haas e89a71fb44 Pass InitPlan values to workers via Gather (Merge).
If a PARAM_EXEC parameter is used below a Gather (Merge) but the InitPlan
that computes it is attached to or above the Gather (Merge), force the
value to be computed before starting parallelism and pass it down to all
workers.  This allows us to use parallelism in cases where it previously
would have had to be rejected as unsafe.  We do - in this case - lose the
optimization that the value is only computed if it's actually used.  An
alternative strategy would be to have the first worker that needs the value
compute it, but one downside of that approach is that we'd then need to
select a parallel-safe path to compute the parameter value; it couldn't for
example contain a Gather (Merge) node.  At some point in the future, we
might want to consider both approaches.

Independent of that consideration, there is a great deal more work that
could be done to make more kinds of PARAM_EXEC parameters parallel-safe.
This infrastructure could be used to allow a Gather (Merge) on the inner
side of a nested loop (although that's not a very appealing plan) and
cases where the InitPlan is attached below the Gather (Merge) could be
addressed as well using various techniques.  But this is a good start.

Amit Kapila, reviewed and revised by me.  Reviewing and testing from
Kuntal Ghosh, Haribabu Kommi, and Tushar Ahuja.

Discussion: http://postgr.es/m/CAA4eK1LV0Y1AUV4cUCdC+sYOx0Z0-8NAJ2Pd9=UKsbQ5Sr7+JQ@mail.gmail.com
2017-11-16 12:06:14 -05:00
Robert Haas 44ae64c388 Push target list evaluation through Gather Merge.
We already do this for Gather, but it got overlooked for Gather Merge.

Amit Kapila, with review and minor revisions by Rushabh Lathia
and by me.

Discussion: http://postgr.es/m/CAA4eK1KUC5Uyu7qaifxrjpHxbSeoQh3yzwN3bThnJsmJcZ-qtA@mail.gmail.com
2017-11-13 16:37:42 -05:00
Robert Haas e64861c79b Track in the plan the types associated with PARAM_EXEC parameters.
Up until now, we only tracked the number of parameters, which was
sufficient to allocate an array of Datums of the appropriate size,
but not sufficient to, for example, know how to serialize a Datum
stored in one of those slots.  An upcoming patch wants to do that,
so add this tracking to make it possible.

Patch by me, reviewed by Tom Lane and Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoYqpxDKn8koHdW8BEKk8FMUL0=e8m2Qe=M+r0UBjr3tuQ@mail.gmail.com
2017-11-13 15:24:12 -05:00
Robert Haas b9941d3468 Fix incorrect comment.
Etsuro Fujita

Discussion: http://postgr.es/m/5A05728E.4050009@lab.ntt.co.jp
2017-11-10 10:55:09 -05:00
Peter Eisentraut 2eb4a831e5 Change TRUE/FALSE to true/false
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources.  The upper case spellings are only used
in some files/modules.  So standardize on the standard spellings.

The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.

In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-11-08 11:37:28 -05:00
Tom Lane 7b6c075471 Teach planner to account for HAVING quals in aggregation plan nodes.
For some reason, we have never accounted for either the evaluation cost
or the selectivity of filter conditions attached to Agg and Group nodes
(which, in practice, are always conditions from a HAVING clause).

Applying our regular selectivity logic to post-grouping conditions is a
bit bogus, but it's surely better than taking the selectivity as 1.0.
Perhaps someday the extended-statistics mechanism can be taught to provide
statistics that would help us in getting non-default estimates here.

Per a gripe from Benjamin Coutu.  This is surely a bug fix, but I'm
hesitant to back-patch because of the prospect of destabilizing existing
plan choices.  Given that it took us this long to notice the bug, it's
probably not hurting too many people in the field.

Discussion: https://postgr.es/m/20968.1509486337@sss.pgh.pa.us
2017-11-02 11:24:12 -04:00
Robert Haas 682ce911f8 Allow parallel query for prepared statements with generic plans.
This was always intended to work, but due to an oversight in
max_parallel_hazard_walker, it didn't.  In testing, we missed the
fact that it was only working for custom plans, where the parameter
value has been substituted for the parameter itself early enough
that everything worked.  In a generic plan, the Param node survives
and must be treated as parallel-safe.  SerializeParamList provides
for the transmission of parameter values to workers.

Amit Kapila with help from Kuntal Ghosh.  Some changes by me.

Discussion: http://postgr.es/m/CAA4eK1+_BuZrmVCeua5Eqnm4Co9DAXdM5HPAOE2J19ePbR912Q@mail.gmail.com
2017-10-27 22:22:39 +02:00
Tom Lane 37a795a60b Support domains over composite types.
This is the last major omission in our domains feature: you can now
make a domain over anything that's not a pseudotype.

The major complication from an implementation standpoint is that places
that might be creating tuples of a domain type now need to be prepared
to apply domain_check().  It seems better that unprepared code fail
with an error like "<type> is not composite" than that it silently fail
to apply domain constraints.  Therefore, relevant infrastructure like
get_func_result_type() and lookup_rowtype_tupdesc() has been adjusted
to treat domain-over-composite as a distinct case that unprepared code
won't recognize, rather than just transparently treating it the same
as plain composite.  This isn't a 100% solution to the possibility of
overlooked domain checks, but it catches most places.

In passing, improve typcache.c's support for domains (it can now cache
the identity of a domain's base type), and rewrite the argument handling
logic in jsonfuncs.c's populate_record[set]_worker to reduce duplicative
per-call lookups.

I believe this is code-complete so far as the core and contrib code go.
The PLs need varying amounts of work, which will be tackled in followup
patches.

Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us
2017-10-26 13:47:45 -04:00
Robert Haas 45866c7550 Copy information from the relcache instead of pointing to it.
We have the relations continuously locked, but not open, so relcache
pointers are not guaranteed to be stable.  Per buildfarm member
prion.

Ashutosh Bapat.  I fixed a typo.

Discussion: http://postgr.es/m/CAFjFpRcRBqoKLZSNmRsjKr81uEP=ennvqSQaXVCCBTXvJ2rW+Q@mail.gmail.com
2017-10-06 15:28:07 -04:00
Robert Haas f49842d1ee Basic partition-wise join functionality.
Instead of joining two partitioned tables in their entirety we can, if
it is an equi-join on the partition keys, join the matching partitions
individually.  This involves teaching the planner about "other join"
rels, which are related to regular join rels in the same way that
other member rels are related to baserels.  This can use significantly
more CPU time and memory than regular join planning, because there may
now be a set of "other" rels not only for every base relation but also
for every join relation.  In most practical cases, this probably
shouldn't be a problem, because (1) it's probably unusual to join many
tables each with many partitions using the partition keys for all
joins and (2) if you do that scenario then you probably have a big
enough machine to handle the increased memory cost of planning and (3)
the resulting plan is highly likely to be better, so what you spend in
planning you'll make up on the execution side.  All the same, for now,
turn this feature off by default.

Currently, we can only perform joins between two tables whose
partitioning schemes are absolutely identical.  It would be nice to
cope with other scenarios, such as extra partitions on one side or the
other with no match on the other side, but that will have to wait for
a future patch.

Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
Khandekar, and by me.  A few final adjustments by me.

Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
2017-10-06 11:11:10 -04:00
Tom Lane c12d570fa1 Support arrays over domains.
Allowing arrays with a domain type as their element type was left un-done
in the original domain patch, but not for any very good reason.  This
omission leads to such surprising results as array_agg() not working on
a domain column, because the parser can't identify a suitable output type
for the polymorphic aggregate.

In order to fix this, first clean up the APIs of coerce_to_domain() and
some internal functions in parse_coerce.c so that we consistently pass
around a CoercionContext along with CoercionForm.  Previously, we sometimes
passed an "isExplicit" boolean flag instead, which is strictly less
information; and coerce_to_domain() didn't even get that, but instead had
to reverse-engineer isExplicit from CoercionForm.  That's contrary to the
documentation in primnodes.h that says that CoercionForm only affects
display and not semantics.  I don't think this change fixes any live bugs,
but it makes things more consistent.  The main reason for doing it though
is that now build_coercion_expression() receives ccontext, which it needs
in order to be able to recursively invoke coerce_to_target_type().

Next, reimplement ArrayCoerceExpr so that the node does not directly know
any details of what has to be done to the individual array elements while
performing the array coercion.  Instead, the per-element processing is
represented by a sub-expression whose input is a source array element and
whose output is a target array element.  This simplifies life in
parse_coerce.c, because it can build that sub-expression by a recursive
invocation of coerce_to_target_type().  The executor now handles the
per-element processing as a compiled expression instead of hard-wired code.
The main advantage of this is that we can use a single ArrayCoerceExpr to
handle as many as three successive steps per element: base type conversion,
typmod coercion, and domain constraint checking.  The old code used two
stacked ArrayCoerceExprs to handle type + typmod coercion, which was pretty
inefficient, and adding yet another array deconstruction to do domain
constraint checking seemed very unappetizing.

In the case where we just need a single, very simple coercion function,
doing this straightforwardly leads to a noticeable increase in the
per-array-element runtime cost.  Hence, add an additional shortcut evalfunc
in execExprInterp.c that skips unnecessary overhead for that specific form
of expression.  The runtime speed of simple cases is within 1% or so of
where it was before, while cases that previously required two levels of
array processing are significantly faster.

Finally, create an implicit array type for every domain type, as we do for
base types, enums, etc.  Everything except the array-coercion case seems
to just work without further effort.

Tom Lane, reviewed by Andrew Dunstan

Discussion: https://postgr.es/m/9852.1499791473@sss.pgh.pa.us
2017-09-30 13:40:56 -04:00
Robert Haas 9140cf8269 Associate partitioning information with each RelOptInfo.
This is not used for anything yet, but it is necessary infrastructure
for partition-wise join and for partition pruning without constraint
exclusion.

Ashutosh Bapat, reviewed by Amit Langote and with quite a few changes,
mostly cosmetic, by me.  Additional review and testing of this patch
series by Antonin Houska, Amit Khandekar, Rafia Sabih, Rajkumar
Raghuwanshi, Thomas Munro, and Dilip Kumar.

Discussion: http://postgr.es/m/CAFjFpRfneFG3H+F6BaiXemMrKF+FY-POpx3Ocy+RiH3yBmXSNw@mail.gmail.com
2017-09-20 23:39:13 -04:00
Tom Lane 8689e38263 Clean up handling of dropped columns in NAMEDTUPLESTORE RTEs.
The NAMEDTUPLESTORE patch piggybacked on the infrastructure for
TABLEFUNC/VALUES/CTE RTEs, none of which can ever have dropped columns,
so the possibility was ignored most places.  Fix that, including adding a
specification to parsenodes.h about what it's supposed to look like.

In passing, clean up assorted comments that hadn't been maintained
properly by said patch.

Per bug #14799 from Philippe Beaudoin.  Back-patch to v10.

Discussion: https://postgr.es/m/20170906120005.25630.84360@wrigleys.postgresql.org
2017-09-06 10:41:05 -04:00
Andres Freund 2cd7084524 Change tupledesc->attrs[n] to TupleDescAttr(tupledesc, n).
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc.  Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.

Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-20 11:19:07 -07:00
Tom Lane 4867d7f62f Avoid out-of-memory in a hash join with many duplicate inner keys.
The executor is capable of splitting buckets during a hash join if
too much memory is being used by a small number of buckets.  However,
this only helps if a bucket's population is actually divisible; if
all the hash keys are alike, the tuples still end up in the same
new bucket.  This can result in an OOM failure if there are enough
inner keys with identical hash values.  The planner's cost estimates
will bias it against choosing a hash join in such situations, but not
by so much that it will never do so.  To mitigate the OOM hazard,
explicitly estimate the hash bucket space needed by just the inner
side's most common value, and if that would exceed work_mem then
add disable_cost to the hash cost estimate.

This approach doesn't account for the possibility that two or more
common values would share the same hash value.  On the other hand,
work_mem is normally a fairly conservative bound, so that eating
two or more times that much space is probably not going to kill us.

If we have no stats about the inner side, ignore this consideration.
There was some discussion of making a conservative assumption, but that
would effectively result in disabling hash join whenever we lack stats,
which seems like an overreaction given how seldom the problem manifests
in the field.

Per a complaint from David Hinkle.  Although this could be viewed
as a bug fix, the lack of similar complaints weighs against back-
patching; indeed we waited for v11 because it seemed already rather
late in the v10 cycle to be making plan choice changes like this one.

Discussion: https://postgr.es/m/32013.1487271761@sss.pgh.pa.us
2017-08-15 14:05:53 -04:00
Robert Haas e139f1953f Assorted preparatory refactoring for partition-wise join.
Instead of duplicating the logic to search for a matching
ParamPathInfo in multiple places, factor it out into a separate
function.

Pass only the relevant bits of the PartitionKey to
partition_bounds_equal instead of the whole thing, because
partition-wise join will want to call this without having a
PartitionKey available.

Adjust allow_star_schema_join and calc_nestloop_required_outer
to take relevant Relids rather than the entire Path, because
partition-wise join will want to call it with the top-parent
relids to determine whether a child join is allowable.

Ashutosh Bapat.  Review and testing of the larger patch set of which
this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih,
Thomas Munro, Dilip Kumar, and me.

Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com
2017-08-15 12:30:38 -04:00
Tom Lane 00418c6124 Simplify plpgsql's check for simple expressions.
plpgsql wants to recognize expressions that it can execute directly
via ExecEvalExpr() instead of going through the full SPI machinery.
Originally the test for this consisted of recursively groveling through
the post-planning expression tree to see if it contained only nodes that
plpgsql recognized as safe.  That was a major maintenance headache, since
it required updating plpgsql every time we added any kind of expression
node.  It was also kind of expensive, so over time we added various
pre-planning checks to try to short-circuit having to do that.
Robert Haas pointed out that as of the SRF-processing changes in v10,
particularly the addition of Query.hasTargetSRFs, there really isn't
any reason to make the recursive scan at all: the initial checks cover
everything we really care about.  We do have to make sure that those
checks agree with what inline_function() considers, so that inlining
of a function that formerly wasn't inlined can't cause an expression
considered simple to become non-simple.

Hence, delete the recursive function exec_simple_check_node(), and tweak
those other tests to more exactly agree with inline_function().  Adjust
some comments and function naming to match.

Discussion: https://postgr.es/m/CA+TgmoZGZpwdEV2FQWaVxA_qZXsQE1DAS5Fu8fwxXDNvfndiUQ@mail.gmail.com
2017-08-15 12:28:39 -04:00
Tom Lane decb08ebdf Code review for NextValueExpr expression node type.
Add missing infrastructure for this node type, notably in ruleutils.c where
its lack could demonstrably cause EXPLAIN to fail.  Add outfuncs/readfuncs
support.  (outfuncs support is useful today for debugging purposes.  The
readfuncs support may never be needed, since at present it would only
matter for parallel query and NextValueExpr should never appear in a
parallelizable query; but it seems like a bad idea to have a primnode type
that isn't fully supported here.)  Teach planner infrastructure that
NextValueExpr is a volatile, parallel-unsafe, non-leaky expression node
with cost cpu_operator_cost.  Given its limited scope of usage, there
*might* be no live bug today from the lack of that knowledge, but it's
certainly going to bite us on the rear someday.  Teach pg_stat_statements
about the new node type, too.

While at it, also teach cost_qual_eval() that MinMaxExpr, SQLValueFunction,
XmlExpr, and CoerceToDomain should be charged as cpu_operator_cost.
Failing to do this for SQLValueFunction was an oversight in my commit
0bb51aa96.  The others are longer-standing oversights, but no time like the
present to fix them.  (In principle, CoerceToDomain could have cost much
higher than this, but it doesn't presently seem worth trying to examine the
domain's constraints here.)

Modify execExprInterp.c to execute NextValueExpr as an out-of-line
function; it seems quite unlikely to me that it's worth insisting that
it be inlined in all expression eval methods.  Besides, providing the
out-of-line function doesn't stop anyone from inlining if they want to.

Adjust some places where NextValueExpr support had been inserted with the
aid of a dartboard rather than keeping it in the same order as elsewhere.

Discussion: https://postgr.es/m/23862.1499981661@sss.pgh.pa.us
2017-07-14 15:25:43 -04:00
Robert Haas 6af9f1bd4b Document partitioned_rels in create_modifytable_path header comment.
Etsuro Fujita, slightly adjusted by me.

Discussion: http://postgr.es/m/e87c4a6d-23d7-5e7c-e8db-44ed418eb5d1@lab.ntt.co.jp
2017-06-22 13:52:50 -04:00
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Tom Lane e3860ffa4d Initial pgindent run with pg_bsd_indent version 2.0.
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:

* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
  sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
  well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
  with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
  than the expected column 33.

On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list.  This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.

There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses.  I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:39:04 -04:00
Robert Haas b08df9cab7 Teach predtest.c about CHECK clauses to fix partitioning bugs.
In a CHECK clause, a null result means true, whereas in a WHERE clause
it means false.  predtest.c provided different functions depending on
which set of semantics applied to the predicate being proved, but had
no option to control what a null meant in the clauses provided as
axioms.  Add one.

Use that in the partitioning code when figuring out whether the
validation scan on a new partition can be skipped.  Rip out the
old logic that attempted (not very successfully) to compensate
for the absence of the necessary support in predtest.c.

Ashutosh Bapat and Robert Haas, reviewed by Amit Langote and
incorporating feedback from Tom Lane.

Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com
2017-06-14 13:13:11 -04:00
Robert Haas b522759508 Copy partitioned_rels lists to avoid shared substructure.
Otherwise, set_plan_refs() can get applied to the same list
multiple times through different references, leading to chaos.

Amit Langote, Dilip Kumar, and Robert Haas, reviewed by Ashutosh
Bapat.  Original report by Sveinn Sveinsson.

Discussion: http://postgr.es/m/20170517141151.1435.79890@wrigleys.postgresql.org
2017-05-19 15:26:05 -04:00
Bruce Momjian a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Tom Lane f04c9a6146 Standardize terminology for pg_statistic_ext entries.
Consistently refer to such an entry as a "statistics object", not just
"statistics" or "extended statistics".  Previously we had a mismash of
terms, accompanied by utter confusion as to whether the term was
singular or plural.  That's not only grating (at least to the ear of
a native English speaker) but could be outright misleading, eg in error
messages that seemed to be referring to multiple objects where only one
could be meant.

This commit fixes the code and a lot of comments (though I may have
missed a few).  I also renamed two new SQL functions,
pg_get_statisticsextdef -> pg_get_statisticsobjdef
pg_statistic_ext_is_visible -> pg_statistics_obj_is_visible
to conform better with this terminology.

I have not touched the SGML docs other than fixing those function
names; the docs certainly need work but it seems like a separable task.

Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-14 10:55:01 -04:00
Tom Lane 39151781c8 Fix testing of parallel-safety of SubPlans.
is_parallel_safe() supposed that the only relevant property of a SubPlan
was the parallel safety of the referenced subplan tree.  This is wrong:
the testexpr or args subtrees might contain parallel-unsafe stuff, as
demonstrated by the test case added here.  However, just recursing into the
subtrees fails in a different way: we'll typically find PARAM_EXEC Params
representing the subplan's output columns in the testexpr.  The previous
coding supposed that any Param must be treated as parallel-restricted, so
that a naive attempt at fixing this disabled parallel pushdown of SubPlans
altogether.  We must instead determine, for any visited Param, whether it
is one that would be computed by a surrounding SubPlan node; if so, it's
safe to push down along with the SubPlan node.

We might later be able to extend this logic to cope with Params used for
correlated subplans and other cases; but that's a task for v11 or beyond.

Tom Lane and Amit Kapila

Discussion: https://postgr.es/m/7064.1492022469@sss.pgh.pa.us
2017-04-18 15:43:56 -04:00
Alvaro Herrera ee6922112e Rename columns in new pg_statistic_ext catalog
The new catalog reused a column prefix "sta" from pg_statistic, but this
is undesirable, so change the catalog to use prefix "stx" instead.
Also, rename the column that lists enabled statistic kinds as "stxkind"
rather than "enabled".

Discussion: https://postgr.es/m/CAKJS1f_2t5jhSN7huYRFH3w3rrHfG2QU7hiUHsu-Vdjd1rYT3w@mail.gmail.com
2017-04-17 18:34:29 -03:00
Tom Lane 8f0530f580 Improve castNode notation by introducing list-extraction-specific variants.
This extends the castNode() notation introduced by commit 5bcab1114 to
provide, in one step, extraction of a list cell's pointer and coercion to
a concrete node type.  For example, "lfirst_node(Foo, lc)" is the same
as "castNode(Foo, lfirst(lc))".  Almost half of the uses of castNode
that have appeared so far include a list extraction call, so this is
pretty widely useful, and it saves a few more keystrokes compared to the
old way.

As with the previous patch, back-patch the addition of these macros to
pg_list.h, so that the notation will be available when back-patching.

Patch by me, after an idea of Andrew Gierth's.

Discussion: https://postgr.es/m/14197.1491841216@sss.pgh.pa.us
2017-04-10 13:51:53 -04:00
Tom Lane 9c7f5229ad Optimize joins when the inner relation can be proven unique.
If there can certainly be no more than one matching inner row for a given
outer row, then the executor can move on to the next outer row as soon as
it's found one match; there's no need to continue scanning the inner
relation for this outer row.  This saves useless scanning in nestloop
and hash joins.  In merge joins, it offers the opportunity to skip
mark/restore processing, because we know we have not advanced past the
first possible match for the next outer row.

Of course, the devil is in the details: the proof of uniqueness must
depend only on joinquals (not otherquals), and if we want to skip
mergejoin mark/restore then it must depend only on merge clauses.
To avoid adding more planning overhead than absolutely necessary,
the present patch errs in the conservative direction: there are cases
where inner_unique or skip_mark_restore processing could be used, but
it will not do so because it's not sure that the uniqueness proof
depended only on "safe" clauses.  This could be improved later.

David Rowley, reviewed and rather heavily editorialized on by me

Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com
2017-04-07 22:20:13 -04:00
Simon Riggs ac2b095088 Reset API of clause_selectivity()
Discussion: https://postgr.es/m/CAKJS1f9yurJQW9pdnzL+rmOtsp2vOytkpXKGnMFJEO-qz5O5eA@mail.gmail.com
2017-04-06 19:10:51 -04:00
Alvaro Herrera b1fc51a36e Comment fixes for extended statistics
Clean up some code comments in new extended statistics code, from
7b504eb282.
2017-04-06 12:28:50 -03:00
Simon Riggs 2686ee1b7c Collect and use multi-column dependency stats
Follow on patch in the multi-variate statistics patch series.

CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t;
ANALYZE;
will collect dependency stats on (a, b) and then use the measured
dependency in subsequent query planning.

Commit 7b504eb282 added
CREATE STATISTICS with n-distinct coefficients. These are now
specified using the mutually exclusive option WITH (ndistinct).

Author: Tomas Vondra, David Rowley
Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas
and many other comments and contributions
Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com
2017-04-05 18:00:42 -04:00
Robert Haas 7a39b5e4d1 Abstract logic to allow for multiple kinds of child rels.
Currently, the only type of child relation is an "other member rel",
which is the child of a baserel, but in the future joins and even
upper relations may have child rels.  To facilitate that, introduce
macros that test to test for particular RelOptKind values, and use
them in various places where they help to clarify the sense of a test.
(For example, a test may allow RELOPT_OTHER_MEMBER_REL either because
it intends to allow child rels, or because it intends to allow simple
rels.)

Also, remove find_childrel_top_parent, which will not work for a
child rel that is not a baserel.  Instead, add a new RelOptInfo
member top_parent_relids to track the same kind of information in a
more generic manner.

Ashutosh Bapat, slightly tweaked by me.  Review and testing of the
patch set from which this was taken by Rajkumar Raghuwanshi and Rafia
Sabih.

Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
2017-04-03 22:41:31 -04:00
Kevin Grittner 18ce3a4ab2 Add infrastructure to support EphemeralNamedRelation references.
A QueryEnvironment concept is added, which allows new types of
objects to be passed into queries from parsing on through
execution.  At this point, the only thing implemented is a
collection of EphemeralNamedRelation objects -- relations which
can be referenced by name in queries, but do not exist in the
catalogs.  The only type of ENR implemented is NamedTuplestore, but
provision is made to add more types fairly easily.

An ENR can carry its own TupleDesc or reference a relation in the
catalogs by relid.

Although these features can be used without SPI, convenience
functions are added to SPI so that ENRs can easily be used by code
run through SPI.

The initial use of all this is going to be transition tables in
AFTER triggers, but that will be added to each PL as a separate
commit.

An incidental effect of this patch is to produce a more informative
error message if an attempt is made to modify the contents of a CTE
from a referencing DML statement.  No tests previously covered that
possibility, so one is added.

Kevin Grittner and Thomas Munro
Reviewed by Heikki Linnakangas, David Fetter, and Thomas Munro
with valuable comments and suggestions from many others
2017-03-31 23:17:18 -05:00
Robert Haas 7d8f6986b8 Fix parallel query so it doesn't spoil row estimates above Gather.
Commit 45be99f8cd removed GatherPath's
num_workers field, but this is entirely bogus.  Normally, a path's
parallel_workers flag is supposed to indicate the number of workers
that it wants, and should be 0 for a non-partial path.  In that
commit, I mistakenly thought that GatherPath could also use that field
to indicate the number of workers that it would try to start, but
that's disastrous, because then it can propagate up to higher nodes in
the plan tree, which will then get incorrect rowcounts because the
parallel_workers flag is involved in computing those values.  Repair
by putting the separate field back.

Report by Tomas Vondra.  Patch by me, reviewed by Amit Kapila.

Discussion: http://postgr.es/m/f91b4a44-f739-04bd-c4b6-f135bd643669@2ndquadrant.com
2017-03-31 21:01:20 -04:00
Peter Eisentraut 4cb824699e Cast result of copyObject() to correct type
copyObject() is declared to return void *, which allows easily assigning
the result independent of the input, but it loses all type checking.

If the compiler supports typeof or something similar, cast the result to
the input type.  This creates a greater amount of type safety.  In some
cases, where the result is assigned to a generic type such as Node * or
Expr *, new casts are now necessary, but in general casts are now
unnecessary in the normal case and indicate that something unusual is
happening.

Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
2017-03-28 21:59:23 -04:00
Andrew Gierth b5635948ab Support hashed aggregation with grouping sets.
This extends the Aggregate node with two new features: HashAggregate
can now run multiple hashtables concurrently, and a new strategy
MixedAggregate populates hashtables while doing sorted grouping.

The planner will now attempt to save as many sorts as possible when
planning grouping sets queries, while not exceeding work_mem for the
estimated combined sizes of all hashtables used.  No SQL-level changes
are required.  There should be no user-visible impact other than the
new EXPLAIN output and possible changes to result ordering when ORDER
BY was not used (which affected a few regression tests).  The
enable_hashagg option is respected.

Author: Andrew Gierth
Reviewers: Mark Dilger, Andres Freund
Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk
2017-03-27 04:20:54 +01:00
Tom Lane 244dd95ce9 Update some obsolete comments.
Fix a few stray references to expression eval functions that don't
exist anymore or don't take the same input representation they used to.
2017-03-26 11:36:46 -04:00
Andres Freund b8d7f053c5 Faster expression evaluation and targetlist projection.
This replaces the old, recursive tree-walk based evaluation, with
non-recursive, opcode dispatch based, expression evaluation.
Projection is now implemented as part of expression evaluation.

This both leads to significant performance improvements, and makes
future just-in-time compilation of expressions easier.

The speed gains primarily come from:
- non-recursive implementation reduces stack usage / overhead
- simple sub-expressions are implemented with a single jump, without
  function calls
- sharing some state between different sub-expressions
- reduced amount of indirect/hard to predict memory accesses by laying
  out operation metadata sequentially; including the avoidance of
  nearly all of the previously used linked lists
- more code has been moved to expression initialization, avoiding
  constant re-checks at evaluation time

Future just-in-time compilation (JIT) has become easier, as
demonstrated by released patches intended to be merged in a later
release, for primarily two reasons: Firstly, due to a stricter split
between expression initialization and evaluation, less code has to be
handled by the JIT. Secondly, due to the non-recursive nature of the
generated "instructions", less performance-critical code-paths can
easily be shared between interpreted and compiled evaluation.

The new framework allows for significant future optimizations. E.g.:
- basic infrastructure for to later reduce the per executor-startup
  overhead of expression evaluation, by caching state in prepared
  statements.  That'd be helpful in OLTPish scenarios where
  initialization overhead is measurable.
- optimizing the generated "code". A number of proposals for potential
  work has already been made.
- optimizing the interpreter. Similarly a number of proposals have
  been made here too.

The move of logic into the expression initialization step leads to some
backward-incompatible changes:
- Function permission checks are now done during expression
  initialization, whereas previously they were done during
  execution. In edge cases this can lead to errors being raised that
  previously wouldn't have been, e.g. a NULL array being coerced to a
  different array type previously didn't perform checks.
- The set of domain constraints to be checked, is now evaluated once
  during expression initialization, previously it was re-built
  every time a domain check was evaluated. For normal queries this
  doesn't change much, but e.g. for plpgsql functions, which caches
  ExprStates, the old set could stick around longer.  The behavior
  around might still change.

Author: Andres Freund, with significant changes by Tom Lane,
	changes by Heikki Linnakangas
Reviewed-By: Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
2017-03-25 14:52:06 -07:00
Alvaro Herrera 7b504eb282 Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns.  Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too.  All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.

This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table.  This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve.  A new
special pseudo-type pg_ndistinct is used.

(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)

Author: Tomas Vondra.  Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
    Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
    https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 14:06:10 -03:00
Robert Haas d3cc37f1d8 Don't scan partitioned tables.
Partitioned tables do not contain any data; only their unpartitioned
descendents need to be scanned.  However, the partitioned tables still
need to be locked, even though they're not scanned.  To make that
work, Append and MergeAppend relations now need to carry a list of
(unscanned) partitioned relations that must be locked, and InitPlan
must lock all partitioned result relations.

Aside from the obvious advantage of avoiding some work at execution
time, this has two other advantages.  First, it may improve the
planner's decision-making in some cases since the empty relation
might throw things off.  Second, it paves the way to getting rid of
the storage for partitioned tables altogether.

Amit Langote, reviewed by me.

Discussion: http://postgr.es/m/6837c359-45c4-8044-34d1-736756335a15@lab.ntt.co.jp
2017-03-21 09:48:04 -04:00
Robert Haas c44c47a773 Some preliminary refactoring towards partitionwise join.
Partitionwise join proposes add a concept of child join relations,
which will have the same relationship with join relations as "other
member" relations do with base relations.  These relations will need
some but not all of the handling that we currently have for join
relations, and some but not all of the handling that we currently have
for appendrels, since they are a mix of the two.  Refactor a little
bit so that the necessary bits of logic are exposed as separate
functions.

Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi and
by me.

Discussion: http://postgr.es/m/CAFjFpRfqotRR6cM3sooBHMHEVdkFfAZ6PyYg4GRZsoMuW08HjQ@mail.gmail.com
2017-03-14 19:25:47 -04:00
Robert Haas 355d3993c5 Add a Gather Merge executor node.
Like Gather, we spawn multiple workers and run the same plan in each
one; however, Gather Merge is used when each worker produces the same
output ordering and we want to preserve that output ordering while
merging together the streams of tuples from various workers.  (In a
way, Gather Merge is like a hybrid of Gather and MergeAppend.)

This works out to a win if it saves us from having to perform an
expensive Sort.  In cases where only a small amount of data would need
to be sorted, it may actually be faster to use a regular Gather node
and then sort the results afterward, because Gather Merge sometimes
needs to wait synchronously for tuples whereas a pure Gather generally
doesn't.  But if this avoids an expensive sort then it's a win.

Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro,
and Neha Sharma, and reviewed and revised by me.

Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com
2017-03-09 07:49:29 -05:00
Robert Haas f35742ccb7 Support parallel bitmap heap scans.
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.

Dilip Kumar, with some corrections and cosmetic changes by me.  The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.

Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
2017-03-08 12:05:43 -05:00
Alvaro Herrera fcec6caafa Support XMLTABLE query expression
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.

This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE.  (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).

The implemented syntax deviates slightly from what the standard
requires.  First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only.  Second, we don't
currently support a default namespace to be specified.

This implementation relies on a new executor node based on a hardcoded
method table.  (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)

Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
2017-03-08 12:40:26 -03:00
Peter Eisentraut 38d103763d Make more use of castNode() 2017-02-21 11:59:09 -05:00
Robert Haas 5262f7a4fc Add optimizer and executor support for parallel index scans.
In combination with 569174f1be, which
taught the btree AM how to perform parallel index scans, this allows
parallel index scan plans on btree indexes.  This infrastructure
should be general enough to support parallel index scans for other
index AMs as well, if someone updates them to support parallel
scans.

Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar
Ahuja, and Haribabu Kommi, and me.
2017-02-15 13:53:24 -05:00
Robert Haas 5e6d8d2bbb Allow parallel workers to execute subplans.
This doesn't do anything to make Param nodes anything other than
parallel-restricted, so this only helps with uncorrelated subplans,
and it's not necessarily very cheap because each worker will run the
subplan separately (just as a Hash Join will build a separate copy of
the hash table in each participating process), but it's a first step
toward supporting cases that are more likely to help in practice, and
is occasionally useful on its own.

Amit Kapila, reviewed and tested by Rafia Sabih, Dilip Kumar, and
me.

Discussion: http://postgr.es/m/CAA4eK1+e8Z45D2n+rnDMDYsVEb5iW7jqaCH_tvPMYau=1Rru9w@mail.gmail.com
2017-02-14 18:16:03 -05:00
Heikki Linnakangas 181bdb90ba Fix typos in comments.
Backpatch to all supported versions, where applicable, to make backpatching
of future fixes go more smoothly.

Josh Soref

Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
2017-02-06 11:33:58 +02:00
Tom Lane c82d4e658e Fix mishandling of tSRFs at different nesting levels.
Given a targetlist like "srf(x), f(srf(x))", split_pathtarget_at_srfs()
decided that it needed two levels of ProjectSet nodes, failing to notice
that the two SRF calls are textually equal().  Because of that, setrefs.c
would convert the upper ProjectSet's tlist to "Var1, f(Var1)" (where Var1
represents a reference to the srf(x) output of the lower ProjectSet).
This triggered an assertion in nodeProjectSet.c complaining that it found
no SRFs to evaluate, as reported by Erik Rijkers.

What we want in such a case is to evaluate srf(x) only once and use a plain
Result node to compute "Var1, f(Var1)"; that gives results similar to what
previous versions produced, whereas allowing srf(x) to be evaluated again
in an upper ProjectSet would square the number of rows emitted.

Furthermore, even if the SRF calls aren't textually identical, we want them
to be evaluated in lockstep, because that's what happened in the old
implementation.  But split_pathtarget_at_srfs() got this completely wrong,
using two levels of ProjectSet for a case like "srf(x), f(srf(y))".

Hence, rewrite split_pathtarget_at_srfs() from the ground up so that it
groups SRFs according to the depth of nesting of SRFs in their arguments.
This is pretty much how we envisioned that working originally, but I blew
it when it came to implementation.

In passing, optimize the case of target == input_target, which I noticed
is not only possible but quite common.

Discussion: https://postgr.es/m/dcbd2853c05d22088766553d60dc78c6@xs4all.nl
2017-02-02 16:38:18 -05:00
Andres Freund ea15e18677 Remove obsoleted code relating to targetlist SRF evaluation.
Since 69f4b9c plain expression evaluation (and thus normal projection)
can't return sets of tuples anymore. Thus remove code dealing with
that possibility.

This will require adjustments in external code using
ExecEvalExpr()/ExecProject() - that should neither be hard nor very
common.

Author: Andres Freund and Tom Lane
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-19 14:40:41 -08:00
Andres Freund 69f4b9c85f Move targetlist SRF handling from expression evaluation to new executor node.
Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT
generate_series(1,5)) so far was done in the expression evaluation (i.e.
ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code.

This meant that most executor nodes performing projection, and most
expression evaluation functions, had to deal with the possibility that an
evaluated expression could return a set of return values.

That's bad because it leads to repeated code in a lot of places. It also,
and that's my (Andres's) motivation, made it a lot harder to implement a
more efficient way of doing expression evaluation.

To fix this, introduce a new executor node (ProjectSet) that can evaluate
targetlists containing one or more SRFs. To avoid the complexity of the old
way of handling nested expressions returning sets (e.g. having to pass up
ExprDoneCond, and dealing with arguments to functions returning sets etc.),
those SRFs can only be at the top level of the node's targetlist.  The
planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is
only necessary in ProjectSet nodes and that SRFs are only present at the
top level of the node's targetlist. If there are nested SRFs the planner
creates multiple stacked ProjectSet nodes.  The ProjectSet nodes always get
input from an underlying node.

We also discussed and prototyped evaluating targetlist SRFs using ROWS
FROM(), but that turned out to be more complicated than we'd hoped.

While moving SRF evaluation to ProjectSet would allow to retain the old
"least common multiple" behavior when multiple SRFs are present in one
targetlist (i.e.  continue returning rows until all SRFs are at the end of
their input at the same time), we decided to instead only return rows till
all SRFs are exhausted, returning NULL for already exhausted ones.  We
deemed the previous behavior to be too confusing, unexpected and actually
not particularly useful.

As a side effect, the previously prohibited case of multiple set returning
arguments to a function, is now allowed. Not because it's particularly
desirable, but because it ends up working and there seems to be no argument
for adding code to prohibit it.

Currently the behavior for COALESCE and CASE containing SRFs has changed,
returning multiple rows from the expression, even when the SRF containing
"arm" of the expression is not evaluated. That's because the SRFs are
evaluated in a separate ProjectSet node.  As that's quite confusing, we're
likely to instead prohibit SRFs in those places.  But that's still being
discussed, and the code would reside in places not touched here, so that's
a task for later.

There's a lot of, now superfluous, code dealing with set return expressions
around. But as the changes to get rid of those are verbose largely boring,
it seems better for readability to keep the cleanup as a separate commit.

Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-18 13:40:27 -08:00
Tom Lane 215b43cdc8 Improve RLS planning by marking individual quals with security levels.
In an RLS query, we must ensure that security filter quals are evaluated
before ordinary query quals, in case the latter contain "leaky" functions
that could expose the contents of sensitive rows.  The original
implementation of RLS planning ensured this by pushing the scan of a
secured table into a sub-query that it marked as a security-barrier view.
Unfortunately this results in very inefficient plans in many cases, because
the sub-query cannot be flattened and gets planned independently of the
rest of the query.

To fix, drop the use of sub-queries to enforce RLS qual order, and instead
mark each qual (RestrictInfo) with a security_level field establishing its
priority for evaluation.  Quals must be evaluated in security_level order,
except that "leakproof" quals can be allowed to go ahead of quals of lower
security_level, if it's helpful to do so.  This has to be enforced within
the ordering of any one list of quals to be evaluated at a table scan node,
and we also have to ensure that quals are not chosen for early evaluation
(i.e., use as an index qual or TID scan qual) if they're not allowed to go
ahead of other quals at the scan node.

This is sufficient to fix the problem for RLS quals, since we only support
RLS policies on simple tables and thus RLS quals will always exist at the
table scan level only.  Eventually these qual ordering rules should be
enforced for join quals as well, which would permit improving planning for
explicit security-barrier views; but that's a task for another patch.

Note that FDWs would need to be aware of these rules --- and not, for
example, send an insecure qual for remote execution --- but since we do
not yet allow RLS policies on foreign tables, the case doesn't arise.
This will need to be addressed before we can allow such policies.

Patch by me, reviewed by Stephen Frost and Dean Rasheed.

Discussion: https://postgr.es/m/8185.1477432701@sss.pgh.pa.us
2017-01-18 12:58:20 -05:00
Tom Lane ab1f0c8225 Change representation of statement lists, and add statement location info.
This patch makes several changes that improve the consistency of
representation of lists of statements.  It's always been the case
that the output of parse analysis is a list of Query nodes, whatever
the types of the individual statements in the list.  This patch brings
similar consistency to the outputs of raw parsing and planning steps:

* The output of raw parsing is now always a list of RawStmt nodes;
the statement-type-dependent nodes are one level down from that.

* The output of pg_plan_queries() is now always a list of PlannedStmt
nodes, even for utility statements.  In the case of a utility statement,
"planning" just consists of wrapping a CMD_UTILITY PlannedStmt around
the utility node.  This list representation is now used in Portal and
CachedPlan plan lists, replacing the former convention of intermixing
PlannedStmts with bare utility-statement nodes.

Now, every list of statements has a consistent head-node type depending
on how far along it is in processing.  This allows changing many places
that formerly used generic "Node *" pointers to use a more specific
pointer type, thus reducing the number of IsA() tests and casts needed,
as well as improving code clarity.

Also, the post-parse-analysis representation of DECLARE CURSOR is changed
so that it looks more like EXPLAIN, PREPARE, etc.  That is, the contained
SELECT remains a child of the DeclareCursorStmt rather than getting flipped
around to be the other way.  It's now true for both Query and PlannedStmt
that utilityStmt is non-null if and only if commandType is CMD_UTILITY.
That allows simplifying a lot of places that were testing both fields.
(I think some of those were just defensive programming, but in many places,
it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.)

Because PlannedStmt carries a canSetTag field, we're also able to get rid
of some ad-hoc rules about how to reconstruct canSetTag for a bare utility
statement; specifically, the assumption that a utility is canSetTag if and
only if it's the only one in its list.  While I see no near-term need for
relaxing that restriction, it's nice to get rid of the ad-hocery.

The API of ProcessUtility() is changed so that what it's passed is the
wrapper PlannedStmt not just the bare utility statement.  This will affect
all users of ProcessUtility_hook, but the changes are pretty trivial; see
the affected contrib modules for examples of the minimum change needed.
(Most compilers should give pointer-type-mismatch warnings for uncorrected
code.)

There's also a change in the API of ExplainOneQuery_hook, to pass through
cursorOptions instead of expecting hook functions to know what to pick.
This is needed because of the DECLARE CURSOR changes, but really should
have been done in 9.6; it's unlikely that any extant hook functions
know about using CURSOR_OPT_PARALLEL_OK.

Finally, teach gram.y to save statement boundary locations in RawStmt
nodes, and pass those through to Query and PlannedStmt nodes.  This allows
more intelligent handling of cases where a source query string contains
multiple statements.  This patch doesn't actually do anything with the
information, but a follow-on patch will.  (Passing this information through
cleanly is the true motivation for these changes; while I think this is all
good cleanup, it's unlikely we'd have bothered without this end goal.)

catversion bump because addition of location fields to struct Query
affects stored rules.

This patch is by me, but it owes a good deal to Fabien Coelho who did
a lot of preliminary work on the problem, and also reviewed the patch.

Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre
2017-01-14 16:02:35 -05:00
Robert Haas 18fc5192a6 Remove unnecessary arguments from partitioning functions.
RelationGetPartitionQual() and generate_partition_qual() are always
called with recurse = true, so we don't need an argument for that.

Extracted by me from a larger patch by Amit Langote.
2017-01-04 14:56:37 -05:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Robert Haas f0e44751d7 Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own.  The children are called
partitions and contain all of the actual data.  Each partition has an
implicit partitioning constraint.  Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed.  Partitions
can't have extra columns and may not allow nulls unless the parent
does.  Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.

Currently, tables can be range-partitioned or list-partitioned.  List
partitioning is limited to a single column, but range partitioning can
involve multiple columns.  A partitioning "column" can be an
expression.

Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations.  The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.

Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others.  Minor revisions by me.
2016-12-07 13:17:55 -05:00
Tom Lane 4e20511d5b Fix estimate_expression_value to constant-fold SQLValueFunction nodes.
Oversight in my commit 0bb51aa96.  Noted while poking at a recent
bug report --- HEAD's estimates for a query using CURRENT_DATE
were unexpectedly much worse than 9.6's.
2016-11-28 19:08:45 -05:00
Alvaro Herrera eb68141688 Fix get_relation_info name typo'ed in a comment
Plus add a missing comment about this in get_relation_info itself.

Author: Amit Langote
Discussion: https://postgr.es/m/e46c0569-0449-afa0-e2fe-f3776e4b3fd5@lab.ntt.co.jp
2016-11-28 15:56:00 -03:00
Tom Lane 4324ade9a6 Fix optimization for skipping searches for parallel-query hazards.
Fix thinko in commit da1c91631: even if the original query was free of
parallel hazards, we might introduce such a hazard by adding PARAM_EXEC
Param nodes.  Adjust is_parallel_safe() so that it will scan the given
expression whenever any such nodes have been created.  Per report from
Andreas Seltenreich.

Discussion: <878tse6yvf.fsf@credativ.de>
2016-11-21 13:19:23 -05:00
Tom Lane 0832f2db68 Fix latent costing error in create_merge_append_path.
create_merge_append_path should use the path rowcount it just computed,
not rel->tuples, for costing purposes.  Those numbers should always be
the same at present, but if we ever support parameterized MergeAppend
paths (a case this function is otherwise prepared for), the former would
be right and the latter wrong.

No need for back-patch since the problem is only latent.

Ashutosh Bapat

Discussion: <CAFjFpRek+cLCnTo24youuGtsq4zRphEB8EUUPjDxZjnL4n4HYQ@mail.gmail.com>
2016-11-19 15:06:45 -05:00
Tom Lane 770671062f Don't make FK-based selectivity estimates in inheritance situations.
The foreign-key-aware logic for estimation of join sizes (added in commit
100340e2d) blindly tried to apply the concept to rels that are actually
parents of inheritance trees.  This is just plain wrong so far as the
referenced relation is concerned, since the inheritance scan may well
produce lots of rows that are not participating in the constraint.  It's
wrong for the referencing relation too, for the same reason; although on
that end we could conceivably detect whether all members of the inheritance
tree have equivalent FK constraints pointing to the same referenced rel,
and then proceed more or less as we do now.  But pending somebody writing
code to do that, we must disable this, because it's producing completely
silly estimates when there's an FK linking the heads of inheritance trees.

Per bug #14404 from Clinton Adams.  Back-patch to 9.6 where the new
estimation logic came in.

Report: <20161028200412.15987.96482@wrigleys.postgresql.org>
2016-11-02 15:50:15 -04:00
Tom Lane 9a00f03e47 Improve speed of aggregates that use array_append as transition function.
In the previous coding, if an aggregate's transition function returned an
expanded array, nodeAgg.c and nodeWindowAgg.c would always copy it and thus
force it into the flat representation.  This led to ping-ponging between
flat and expanded formats, which costs a lot.  For an aggregate using
array_append as transition function, I measured about a 15X slowdown
compared to the pre-9.5 code, when working on simple int[] arrays.
Of course, the old code was already O(N^2) in this usage due to copying
flat arrays all the time, but it wasn't quite this inefficient.

To fix, teach nodeAgg.c and nodeWindowAgg.c to allow expanded transition
values without copying, so long as the transition function takes care to
return the transition value already properly parented under the aggcontext.
That puts a bit of extra responsibility on the transition function, but
doing it this way allows us to not need any extra logic in the fast path
of advance_transition_function (ie, with a pass-by-value transition value,
or with a modified-in-place pass-by-reference value).  We already know
that that's a hot spot so I'm loath to add any cycles at all there.  Also,
while only array_append currently knows how to follow this convention,
this solution allows other transition functions to opt-in without needing
to have a whitelist in the core aggregation code.

(The reason we would need a whitelist is that currently, if you pass a
R/W expanded-object pointer to an arbitrary function, it's allowed to do
anything with it including deleting it; that breaks the core agg code's
assumption that it should free discarded values.  Returning a value under
aggcontext is the transition function's signal that it knows it is an
aggregate transition function and will play nice.  Possibly the API rules
for expanded objects should be refined, but that would not be a
back-patchable change.)

With this fix, an aggregate using array_append is no longer O(N^2), so it's
much faster than pre-9.5 code rather than much slower.  It's still a bit
slower than the bespoke infrastructure for array_agg, but the differential
seems to be only about 10%-20% rather than orders of magnitude.

Discussion: <6315.1477677885@sss.pgh.pa.us>
2016-10-30 12:27:41 -04:00
Tom Lane a4c35ea1c2 Improve parser's and planner's handling of set-returning functions.
Teach the parser to reject misplaced set-returning functions during parse
analysis using p_expr_kind, in much the same way as we do for aggregates
and window functions (cf commit eaccfded9).  While this isn't complete
(it misses nesting-based restrictions), it's much better than the previous
error reporting for such cases, and it allows elimination of assorted
ad-hoc expression_returns_set() error checks.  We could add nesting checks
later if it seems important to catch all cases at parse time.

There is one case the parser will now throw error for although previous
versions allowed it, which is SRFs in the tlist of an UPDATE.  That never
behaved sensibly (since it's ill-defined which generated row should be
used to perform the update) and it's hard to see why it should not be
treated as an error.  It's a release-note-worthy change though.

Also, add a new Query field hasTargetSRFs reporting whether there are
any SRFs in the targetlist (including GROUP BY/ORDER BY expressions).
The parser can now set that basically for free during parse analysis,
and we can use it in a number of places to avoid expression_returns_set
searches.  (There will be more such checks soon.)  In some places, this
allows decontorting the logic since it's no longer expensive to check for
SRFs in the tlist --- so I made the checks parallel to the handling of
hasAggs/hasWindowFuncs wherever it seemed appropriate.

catversion bump because adding a Query field changes stored rules.

Andres Freund and Tom Lane

Discussion: <24639.1473782855@sss.pgh.pa.us>
2016-09-13 13:54:24 -04:00
Tom Lane ea268cdc9a Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters.  While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea.  Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.

While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.

In passing, change TopMemoryContext to use the default allocation
parameters.  Formerly it could only be extended 8K at a time.  That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there.  There seems no good reason
not to let it use growing blocks like most other contexts.

Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain.  The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.

Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
Tom Lane da1c91631e Speed up planner's scanning for parallel-query hazards.
We need to scan the whole parse tree for parallel-unsafe functions.
If there are none, we'll later need to determine whether particular
subtrees contain any parallel-restricted functions.  The previous coding
retained no knowledge from the first scan, even though this is very
wasteful in the common case where the query contains only parallel-safe
functions.  We can bypass all of the later scans by remembering that fact.
This provides a small but measurable speed improvement when the case
applies, and shouldn't cost anything when it doesn't.

Patch by me, reviewed by Robert Haas

Discussion: <3740.1471538387@sss.pgh.pa.us>
2016-08-19 14:03:13 -04:00
Tom Lane 0bb51aa967 Improve parsetree representation of special functions such as CURRENT_DATE.
We implement a dozen or so parameterless functions that the SQL standard
defines special syntax for.  Up to now, that was done by converting them
into more or less ad-hoc constructs such as "'now'::text::date".  That's
messy for multiple reasons: it exposes what should be implementation
details to users, and performance is worse than it needs to be in several
cases.  To improve matters, invent a new expression node type
SQLValueFunction that can represent any of these parameterless functions.

Bump catversion because this changes stored parsetrees for rules.

Discussion: <30058.1463091294@sss.pgh.pa.us>
2016-08-16 20:33:01 -04:00
Tom Lane f0c7b789ab Fix two errors with nested CASE/WHEN constructs.
ExecEvalCase() tried to save a cycle or two by passing
&econtext->caseValue_isNull as the isNull argument to its sub-evaluation of
the CASE value expression.  If that subexpression itself contained a CASE,
then *isNull was an alias for econtext->caseValue_isNull within the
recursive call of ExecEvalCase(), leading to confusion about whether the
inner call's caseValue was null or not.  In the worst case this could lead
to a core dump due to dereferencing a null pointer.  Fix by not assigning
to the global variable until control comes back from the subexpression.
Also, avoid using the passed-in isNull pointer transiently for evaluation
of WHEN expressions.  (Either one of these changes would have been
sufficient to fix the known misbehavior, but it's clear now that each of
these choices was in itself dangerous coding practice and best avoided.
There do not seem to be any similar hazards elsewhere in execQual.c.)

Also, it was possible for inlining of a SQL function that implements the
equality operator used for a CASE comparison to result in one CASE
expression's CaseTestExpr node being inserted inside another CASE
expression.  This would certainly result in wrong answers since the
improperly nested CaseTestExpr would be caused to return the inner CASE's
comparison value not the outer's.  If the CASE values were of different
data types, a crash might result; moreover such situations could be abused
to allow disclosure of portions of server memory.  To fix, teach
inline_function to check for "bare" CaseTestExpr nodes in the arguments of
a function to be inlined, and avoid inlining if there are any.

Heikki Linnakangas, Michael Paquier, Tom Lane

Report: https://github.com/greenplum-db/gpdb/pull/327
Report: <4DDCEEB8.50602@enterprisedb.com>
Security: CVE-2016-5423
2016-08-08 10:33:46 -04:00
Tom Lane 9492cf86e4 Fix assorted fallout from IS [NOT] NULL patch.
Commits 4452000f3 et al established semantics for NullTest.argisrow that
are a bit different from its initial conception: rather than being merely
a cache of whether we've determined the input to have composite type,
the flag now has the further meaning that we should apply field-by-field
testing as per the standard's definition of IS [NOT] NULL.  If argisrow
is false and yet the input has composite type, the construct instead has
the semantics of IS [NOT] DISTINCT FROM NULL.  Update the comments in
primnodes.h to clarify this, and fix ruleutils.c and deparse.c to print
such cases correctly.  In the case of ruleutils.c, this merely results in
cosmetic changes in EXPLAIN output, since the case can't currently arise
in stored rules.  However, it represents a live bug for deparse.c, which
would formerly have sent a remote query that had semantics different
from the local behavior.  (From the user's standpoint, this means that
testing a remote nested-composite column for null-ness could have had
unexpected recursive behavior much like that fixed in 4452000f3.)

In a related but somewhat independent fix, make plancat.c set argisrow
to false in all NullTest expressions constructed to represent "attnotnull"
constructs.  Since attnotnull is actually enforced as a simple null-value
check, this is a more accurate representation of the semantics; we were
previously overpromising what it meant for composite columns, which might
possibly lead to incorrect planner optimizations.  (It seems that what the
SQL spec expects a NOT NULL constraint to mean is an IS NOT NULL test, so
arguably we are violating the spec and should fix attnotnull to do the
other thing.  If we ever do, this part should get reverted.)

Back-patch, same as the previous commit.

Discussion: <10682.1469566308@sss.pgh.pa.us>
2016-07-28 16:09:15 -04:00
Tom Lane 4452000f31 Fix constant-folding of ROW(...) IS [NOT] NULL with composite fields.
The SQL standard appears to specify that IS [NOT] NULL's tests of field
nullness are non-recursive, ie, we shouldn't consider that a composite
field with value ROW(NULL,NULL) is null for this purpose.
ExecEvalNullTest got this right, but eval_const_expressions did not,
leading to weird inconsistencies depending on whether the expression
was such that the planner could apply constant folding.

Also, adjust the docs to mention that IS [NOT] DISTINCT FROM NULL can be
used as a substitute test if a simple null check is wanted for a rowtype
argument.  That motivated reordering things so that IS [NOT] DISTINCT FROM
is described before IS [NOT] NULL.  In HEAD, I went a bit further and added
a table showing all the comparison-related predicates.

Per bug #14235.  Back-patch to all supported branches, since it's certainly
undesirable that constant-folding should change the semantics.

Report and patch by Andrew Gierth; assorted wordsmithing and revised
regression test cases by me.

Report: <20160708024746.1410.57282@wrigleys.postgresql.org>
2016-07-26 15:25:02 -04:00
Tom Lane 45639a0525 Avoid invalidating all foreign-join cached plans when user mappings change.
We must not push down a foreign join when the foreign tables involved
should be accessed under different user mappings.  Previously we tried
to enforce that rule literally during planning, but that meant that the
resulting plans were dependent on the current contents of the
pg_user_mapping catalog, and we had to blow away all cached plans
containing any remote join when anything at all changed in pg_user_mapping.
This could have been improved somewhat, but the fact that a syscache inval
callback has very limited info about what changed made it hard to do better
within that design.  Instead, let's change the planner to not consider user
mappings per se, but to allow a foreign join if both RTEs have the same
checkAsUser value.  If they do, then they necessarily will use the same
user mapping at runtime, and we don't need to know specifically which one
that is.  Post-plan-time changes in pg_user_mapping no longer require any
plan invalidation.

This rule does give up some optimization ability, to wit where two foreign
table references come from views with different owners or one's from a view
and one's directly in the query, but nonetheless the same user mapping
would have applied.  We'll sacrifice the first case, but to not regress
more than we have to in the second case, allow a foreign join involving
both zero and nonzero checkAsUser values if the nonzero one is the same as
the prevailing effective userID.  In that case, mark the plan as only
runnable by that userID.

The plancache code already had a notion of plans being userID-specific,
in order to support RLS.  It was a little confused though, in particular
lacking clarity of thought as to whether it was the rewritten query or just
the finished plan that's dependent on the userID.  Rearrange that code so
that it's clearer what depends on which, and so that the same logic applies
to both RLS-injected role dependency and foreign-join-injected role
dependency.

Note that this patch doesn't remove the other issue mentioned in the
original complaint, which is that while we'll reliably stop using a foreign
join if it's disallowed in a new context, we might fail to start using a
foreign join if it's now allowed, but we previously created a generic
cached plan that didn't use one.  It was agreed that the chance of winning
that way was not high enough to justify the much larger number of plan
invalidations that would have to occur if we tried to cause it to happen.

In passing, clean up randomly-varying spelling of EXPLAIN commands in
postgres_fdw.sql, and fix a COSTS ON example that had been allowed to
leak into the committed tests.

This reverts most of commits fbe5a3fb7 and 5d4171d1c, which were the
previous attempt at ensuring we wouldn't push down foreign joins that
span permissions contexts.

Etsuro Fujita and Tom Lane

Discussion: <d49c1e5b-f059-20f4-c132-e9752ee0113e@lab.ntt.co.jp>
2016-07-15 17:23:02 -04:00
Peter Eisentraut 63cfdb8dde Adjust spellings of forms of "cancel" 2016-07-14 22:48:26 -04:00
Robert Haas 5ce5e4a12e Set consider_parallel correctly for upper planner rels.
Commit 3fc6e2d7f5 introduced new "upper"
RelOptInfo structures but didn't set consider_parallel for them
correctly, a point I completely missed when reviewing it.  Later,
commit e06a38965b made the situation
worse by doing it incorrectly for the grouping relation.  Try to
straighten all of that out.  Along the way, get rid of the annoying
wholePlanParallelSafe flag, which was only necessarily because of
the fact that upper planning stages didn't use paths at the time
that code was written.

The most important immediate impact of these changes is that
force_parallel_mode will provide useful test coverage in quite a few
more scenarios than it did previously, but it's also necessary
preparation for fixing some problems related to subqueries.

Patch by me, reviewed by Tom Lane.
2016-07-01 11:52:56 -04:00
Tom Lane f1993038a4 Avoid making a separate pass over the query to check for partializability.
It's rather silly to make a separate pass over the tlist + HAVING qual,
and a separate set of visits to the syscache, when get_agg_clause_costs
already has all the required information in hand.  This nets out as less
code as well as fewer cycles.
2016-06-26 15:55:01 -04:00
Tom Lane 19e972d558 Rethink node-level representation of partial-aggregation modes.
The original coding had three separate booleans representing partial
aggregation behavior, which was confusing, unreadable, and error-prone,
not least because the booleans weren't always listed in the same order.
It was also inadequate for the allegedly-desirable future extension to
support intermediate partial aggregation, because we'd need separate
markers for serialization and deserialization in such a case.

Merge these bools into an enum "AggSplit" to provide symbolic names for
the supported operating modes (and document what those are).  By assigning
the values of the enum constants carefully, we can treat AggSplit values
as options bitmasks so that tests of what to do aren't noticeably more
expensive than before.

While at it, get rid of Aggref.aggoutputtype.  That's not needed since
commit 59a3795c2 got rid of setrefs.c's special-purpose Aggref comparison
code, and it likewise seemed more confusing than helpful.

Assorted comment cleanup as well (there's still more that I want to do
in that line).

catversion bump for change in Aggref node contents.  Should be the last
one for partial-aggregation changes.

Discussion: <29309.1466699160@sss.pgh.pa.us>
2016-06-26 14:33:38 -04:00
Tom Lane 59a3795c25 Simplify planner's final setup of Aggrefs for partial aggregation.
Commit e06a38965's original coding for constructing the execution-time
expression tree for a combining aggregate was rather messy, involving
duplicating quite a lot of code in setrefs.c so that it could inject
a nonstandard matching rule for Aggrefs.  Get rid of that in favor of
explicitly constructing a combining Aggref with a partial Aggref as input,
then allowing setref's normal matching logic to match the partial Aggref
to the output of the lower plan node and hence replace it with a Var.

In passing, rename and redocument make_partialgroup_input_target to have
some connection to what it actually does.
2016-06-26 12:08:12 -04:00
Tom Lane f8ace5477e Fix type-safety problem with parallel aggregate serial/deserialization.
The original specification for this called for the deserialization function
to have signature "deserialize(serialtype) returns transtype", which is a
security violation if transtype is INTERNAL (which it always would be in
practice) and serialtype is not (which ditto).  The patch blithely overrode
the opr_sanity check for that, which was sloppy-enough work in itself,
but the indisputable reason this cannot be allowed to stand is that CREATE
FUNCTION will reject such a signature and thus it'd be impossible for
extensions to create parallelizable aggregates.

The minimum fix to make the signature type-safe is to add a second, dummy
argument of type INTERNAL.  But to lock it down a bit more and make misuse
of INTERNAL-accepting functions less likely, let's get rid of the ability
to specify a "serialtype" for an aggregate and just say that the only
useful serialtype is BYTEA --- which, in practice, is the only interesting
value anyway, due to the usefulness of the send/recv infrastructure for
this purpose.  That means we only have to allow "serialize(internal)
returns bytea" and "deserialize(bytea, internal) returns internal" as
the signatures for these support functions.

In passing fix bogus signature of int4_avg_combine, which I found thanks
to adding an opr_sanity check on combinefunc signatures.

catversion bump due to removing pg_aggregate.aggserialtype and adjusting
signatures of assorted built-in functions.

David Rowley and Tom Lane

Discussion: <27247.1466185504@sss.pgh.pa.us>
2016-06-22 16:52:41 -04:00
Tom Lane 8b9d323cb9 Refactor planning of projection steps that don't need a Result plan node.
The original upper-planner-pathification design (commit 3fc6e2d7f5)
assumed that we could always determine during Path formation whether or not
we would need a Result plan node to perform projection of a targetlist.
That turns out not to work very well, though, because createplan.c still
has some responsibilities for choosing the specific target list associated
with sorting/grouping nodes (in particular it might choose to add resjunk
columns for sorting).  We might not ever refactor that --- doing so would
push more work into Path formation, which isn't attractive --- and we
certainly won't do so for 9.6.  So, while create_projection_path and
apply_projection_to_path can tell for sure what will happen if the subpath
is projection-capable, they can't tell for sure when it isn't.  This is at
least a latent bug in apply_projection_to_path, which might think it can
apply a target to a non-projecting node when the node will end up computing
something different.

Also, I'd tied the creation of a ProjectionPath node to whether or not a
Result is needed, but it turns out that we sometimes need a ProjectionPath
node anyway to avoid modifying a possibly-shared subpath node.  Callers had
to use create_projection_path for such cases, and we added code to them
that knew about the potential omission of a Result node and attempted to
adjust the cost estimates for that.  That was uncertainly correct and
definitely ugly/unmaintainable.

To fix, have create_projection_path explicitly check whether a Result
is needed and adjust its cost estimate accordingly, though it creates
a ProjectionPath in either case.  apply_projection_to_path is now mostly
just an optimized version that can avoid creating an extra Path node when
the input is known to not be shared with any other live path.  (There
is one case that create_projection_path doesn't handle, which is pushing
parallel-safe expressions below a Gather node.  We could make it do that
by duplicating the GatherPath, but there seems no need as yet.)

create_projection_plan still has to recheck the tlist-match condition,
which means that if the matching situation does get changed by createplan.c
then we'll have made a slightly incorrect cost estimate.  But there seems
no help for that in the near term, and I doubt it occurs often enough,
let alone would change planning decisions often enough, to be worth
stressing about.

I added a "dummypp" field to ProjectionPath to track whether
create_projection_path thinks a Result is needed.  This is not really
necessary as-committed because create_projection_plan doesn't look at the
flag; but it seems like a good idea to remember what we thought when
forming the cost estimate, if only for debugging purposes.

In passing, get rid of the target_parallel parameter added to
apply_projection_to_path by commit 54f5c5150.  I don't think that's a good
idea because it involves callers in what should be an internal decision,
and opens us up to missing optimization opportunities if callers think they
don't need to provide a valid flag, as most don't.  For the moment, this
just costs us an extra has_parallel_hazard call when planning a Gather.
If that starts to look expensive, I think a better solution would be to
teach PathTarget to carry/cache knowledge of parallel-safety of its
contents.
2016-06-21 18:38:20 -04:00
Tom Lane 100340e2dc Restore foreign-key-aware estimation of join relation sizes.
This patch provides a new implementation of the logic added by commit
137805f89 and later removed by 77ba61080.  It differs from the original
primarily in expending much less effort per joinrel in large queries,
which it accomplishes by doing most of the matching work once per query not
once per joinrel.  Hopefully, it's also less buggy and better commented.
The never-documented enable_fkey_estimates GUC remains gone.

There remains work to be done to make the selectivity estimates account
for nulls in FK referencing columns; but that was true of the original
patch as well.  We may be able to address this point later in beta.
In the meantime, any error should be in the direction of overestimating
rather than underestimating joinrel sizes, which seems like the direction
we want to err in.

Tomas Vondra and Tom Lane

Discussion: <31041.1465069446@sss.pgh.pa.us>
2016-06-18 15:22:34 -04:00
Tom Lane 915b703e16 Fix handling of argument and result datatypes for partial aggregation.
When doing partial aggregation, the args list of the upper (combining)
Aggref node is replaced by a Var representing the output of the partial
aggregation steps, which has either the aggregate's transition data type
or a serialized representation of that.  However, nodeAgg.c blindly
continued to use the args list as an indication of the user-level argument
types.  This broke resolution of polymorphic transition datatypes at
executor startup (though it accidentally failed to fail for the ANYARRAY
case, which is likely the only one anyone had tested).  Moreover, the
constructed FuncExpr passed to the finalfunc contained completely wrong
information, which would have led to bogus answers or crashes for any case
where the finalfunc examined that information (which is only likely to be
with polymorphic aggregates using a non-polymorphic transition type).

As an independent bug, apply_partialaggref_adjustment neglected to resolve
a polymorphic transition datatype before assigning it as the output type
of the lower-level Aggref node.  This again accidentally failed to fail
for ANYARRAY but would be unlikely to work in other cases.

To fix the first problem, record the user-level argument types in a
separate OID-list field of Aggref, and look to that rather than the args
list when asking what the argument types were.  (It turns out to be
convenient to include any "direct" arguments in this list too, although
those are not currently subject to being overwritten.)

Rather than adding yet another resolve_aggregate_transtype() call to fix
the second problem, add an aggtranstype field to Aggref, and store the
resolved transition type OID there when the planner first computes it.
(By doing this in the planner and not the parser, we can allow the
aggregate's transition type to change from time to time, although no DDL
support yet exists for that.)  This saves nothing of consequence for
simple non-polymorphic aggregates, but for polymorphic transition types
we save a catalog lookup during executor startup as well as several
planner lookups that are new in 9.6 due to parallel query planning.

In passing, fix an error that was introduced into count_agg_clauses_walker
some time ago: it was applying exprTypmod() to something that wasn't an
expression node at all, but a TargetEntry.  exprTypmod silently returned
-1 so that there was not an obvious failure, but this broke the intended
sensitivity of aggregate space consumption estimates to the typmod of
varchar and similar data types.  This part needs to be back-patched.

Catversion bump due to change of stored Aggref nodes.

Discussion: <8229.1466109074@sss.pgh.pa.us>
2016-06-17 21:44:37 -04:00
Robert Haas 54f5c5150f Try again to fix the way the scanjoin_target is used with partial paths.
Commit 04ae11f62e removed some broken
code to apply the scan/join target to partial paths, but its theory
that this processing step is totally unnecessary turns out to be wrong.
Put similar code back again, but this time, check for parallel-safety
and avoid in-place modifications to paths that may already have been
used as part of some other path.

(This is not an entirely elegant solution to this problem; it might
be better, for example, to postpone generate_gather_paths for the
topmost scan/join rel until after the scan/join target has been
applied.  But this is not the time for such redesign work.)

Amit Kapila and Robert Haas
2016-06-17 16:29:07 -04:00
Tom Lane 3303ea1a32 Remove reltarget_has_non_vars flag.
Commit b12fd41c6 added a "reltarget_has_non_vars" field to RelOptInfo,
but failed to maintain it accurately.  Since its only purpose was to skip
calls to has_parallel_hazard() in the simple case where a rel's targetlist
is all Vars, and that call is really pretty cheap in that case anyway, it
seems like this is just a case of premature optimization.  Let's drop the
flag and do the calls unconditionally until it's proven that we need more
smarts here.
2016-06-10 16:20:03 -04:00
Tom Lane 2f153ddfdd Refactor to reduce code duplication for function property checking.
As noted by Andres Freund, we'd accumulated quite a few similar functions
in clauses.c that examine all functions in an expression tree to see if
they satisfy some boolean test.  Reduce the duplication by inventing a
function check_functions_in_node() that applies a simple callback function
to each SQL function OID appearing in a given expression node.  This also
fixes some arguable oversights; for example, contain_mutable_functions()
did not check aggregate or window functions for mutability.  I doubt that
that represents a live bug at the moment, because we don't really consider
mutability for aggregates; but it might someday be one.

I chose to put check_functions_in_node() in nodeFuncs.c because it seemed
like other modules might wish to use it in future.  That in turn forced
moving set_opfuncid() et al into nodeFuncs.c, as the alternative was for
nodeFuncs.c to depend on optimizer/setrefs.c which didn't seem very clean.

In passing, teach contain_leaked_vars_walker() about a few more expression
node types it can safely look through, and improve the rather messy and
undercommented code in has_parallel_hazard_walker().

Discussion: <20160527185853.ziol2os2zskahl7v@alap3.anarazel.de>
2016-06-10 16:03:46 -04:00
Tom Lane cae1c788b9 Improve the situation for parallel query versus temp relations.
Transmit the leader's temp-namespace state to workers.  This is important
because without it, the workers do not really have the same search path
as the leader.  For example, there is no good reason (and no extant code
either) to prevent a worker from executing a temp function that the
leader created previously; but as things stood it would fail to find the
temp function, and then either fail or execute the wrong function entirely.

We still prohibit a worker from creating a temp namespace on its own.
In effect, a worker can only see the session's temp namespace if the leader
had created it before starting the worker, which seems like the right
semantics.

Also, transmit the leader's BackendId to workers, and arrange for workers
to use that when determining the physical file path of a temp relation
belonging to their session.  While the original intent was to prevent such
accesses entirely, there were a number of holes in that, notably in places
like dbsize.c which assume they can safely access temp rels of other
sessions anyway.  We might as well get this right, as a small down payment
on someday allowing workers to access the leader's temp tables.  (With
this change, directly using "MyBackendId" as a relation or buffer backend
ID is deprecated; you should use BackendIdForTempRelations() instead.
I left a couple of such uses alone though, as they're not going to be
reachable in parallel workers until we do something about localbuf.c.)

Move the thou-shalt-not-access-thy-leader's-temp-tables prohibition down
into localbuf.c, which is where it actually matters, instead of having it
in relation_open().  This amounts to recognizing that access to temp
tables' catalog entries is perfectly safe in a worker, it's only the data
in local buffers that is problematic.

Having done all that, we can get rid of the test in has_parallel_hazard()
that says that use of a temp table's rowtype is unsafe in parallel workers.
That test was unduly expensive, and if we really did need such a
prohibition, that was not even close to being a bulletproof guard for it.
(For example, any user-defined function executed in a parallel worker
might have attempted such access.)
2016-06-09 20:16:11 -04:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Robert Haas b12fd41c69 Don't generate parallel paths for rels with parallel-restricted outputs.
Such paths are unsafe.  To make it cheaper to detect when this case
applies, track whether a relation's default PathTarget contains any
non-Vars.  In most cases, the answer will be no, which enables us to
determine cheaply that the target list for a proposed path is
parallel-safe.  However, subquery pull-up can create cases that
require us to inspect the target list more carefully.

Amit Kapila, reviewed by me.
2016-06-09 12:43:36 -04:00
Tom Lane e4158319f3 Mop-up for parallel degree-ectomy.
Fix a couple of overlooked uses of "degree" terminology.  Make the parallel
worker count selection logic in create_plain_partial_paths more robust (in
particular, it failed with max_parallel_workers_per_gather set to zero).
2016-06-09 11:16:26 -04:00
Robert Haas c9ce4a1c61 Eliminate "parallel degree" terminology.
This terminology provoked widespread complaints.  So, instead, rename
the GUC max_parallel_degree to max_parallel_workers_per_gather
(leaving room for a possible future GUC max_parallel_workers that acts
as a system-wide limit), and rename the parallel_degree reloption to
parallel_workers.  Rename structure members to match.

These changes create a dump/restore hazard for users of PostgreSQL
9.6beta1 who have set the reloption (or applied the GUC using ALTER
USER or ALTER DATABASE).
2016-06-09 10:00:26 -04:00
Tom Lane 77ba610805 Revert "Use Foreign Key relationships to infer multi-column join selectivity".
This commit reverts 137805f89 as well as the associated commits 015e88942,
5306df283, and 68d704edb.  We found multiple bugs in this feature, and
there was concern about possible planner slowdown (though to be fair,
exhibiting a very large slowdown proved difficult).  The way forward
requires a considerable rewrite, which may or may not be possible to
accomplish in time for beta2.  In my judgment reviewing the rewrite will
be easier to accomplish starting from a clean slate, so let's temporarily
revert what's there now.  This also leaves us in a safe state if it turns
out to be necessary to postpone the rewrite to the next development cycle.

Discussion: <20160429102531.GA13701@huehner.biz>
2016-06-07 17:21:17 -04:00
Greg Stark e1623c3959 Fix various common mispellings.
Mostly these are just comments but there are a few in documentation
and a handful in code and tests. Hopefully this doesn't cause too much
unnecessary pain for backpatching. I relented from some of the most
common like "thru" for that reason. The rest don't seem numerous
enough to cause problems.

Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings
2016-06-03 16:08:45 +01:00
Tom Lane aeb9ae6457 Disable physical tlist if any Var would need multiple sortgroupref labels.
As part of upper planner pathification (commit 3fc6e2d7f5) I redid
createplan.c's approach to the physical-tlist optimization, in which scan
nodes are allowed to return exactly the underlying table's columns so as
to save doing a projection step at runtime.  The logic was intentionally
more aggressive than before about applying the optimization, which is
generally a good thing, but Andres Freund found a case in which it got
too aggressive.  Namely, if any column is referenced more than once in
the parent plan node's sorting or grouping column list, we can't optimize
because then that column would need to have more than one ressortgroupref
label, and we only have space for one.

Add logic to detect this situation in use_physical_tlist(), and also add
some error checking in apply_pathtarget_labeling_to_tlist(), which this
example proves was being overly cavalier about whether what it was doing
made any sense.

The added test case exposes the problem only because we do not eliminate
duplicate grouping keys.  That might be something to fix someday, but it
doesn't seem like appropriate post-beta work.

Report: <20160526021235.w4nq7k3gnheg7vit@alap3.anarazel.de>
2016-05-26 14:52:30 -04:00
Tom Lane 8a13d5e6d1 Fix infer_arbiter_indexes() to not barf on system columns.
While it could be argued that rejecting system column mentions in the
ON CONFLICT list is an unsupported feature, falling over altogether
just because the table has a unique index on OID is indubitably a bug.

As far as I can tell, fixing infer_arbiter_indexes() is sufficient to
make ON CONFLICT (oid) actually work, though making a regression test
for that case is problematic because of the impossibility of setting
the OID counter to a known value.

Minor cosmetic cleanups along with the bug fix.
2016-05-11 17:06:53 -04:00
Tom Lane 26e66184d6 Fix assorted missing infrastructure for ON CONFLICT.
subquery_planner() failed to apply expression preprocessing to the
arbiterElems and arbiterWhere fields of an OnConflictExpr.  No doubt the
theory was that this wasn't necessary because we don't actually try to
execute those expressions; but that's wrong, because it results in failure
to match to index expressions or index predicates that are changed at all
by preprocessing.  Per bug #14132 from Reynold Smith.

Also add pullup_replace_vars processing for onConflictWhere.  Perhaps
it's impossible to have a subquery reference there, but I'm not exactly
convinced; and even if true today it's a failure waiting to happen.

Also add some comments to other places where one or another field of
OnConflictExpr is intentionally ignored, with explanation as to why it's
okay to do so.

Also, catalog/dependency.c failed to record any dependency on the named
constraint in ON CONFLICT ON CONSTRAINT, allowing such a constraint to
be dropped while rules exist that depend on it, and allowing pg_dump to
dump such a rule before the constraint it refers to.  The normal execution
path managed to error out reasonably for a dangling constraint reference,
but ruleutils.c dumped core; so in addition to fixing the omission, add
a protective check in ruleutils.c, since we can't retroactively add a
dependency in existing databases.

Back-patch to 9.5 where this code was introduced.

Report: <20160510190350.2608.48667@wrigleys.postgresql.org>
2016-05-11 16:20:23 -04:00
Tom Lane c45bf5751b Fix planner crash from pfree'ing a partial path that a GatherPath uses.
We mustn't run generate_gather_paths() during add_paths_to_joinrel(),
because that function can be invoked multiple times for the same target
joinrel.  Not only is it wasteful to build GatherPaths repeatedly, but
a later add_partial_path() could delete the partial path that a previously
created GatherPath depends on.  Instead establish the convention that we
do generate_gather_paths() for a rel only just before set_cheapest().

The code was accidentally not broken for baserels, because as of today there
never is more than one partial path for a baserel.  But that assumption
obviously has a pretty short half-life, so move the generate_gather_paths()
calls for those cases as well.

Also add some generic comments explaining how and why this all works.

Per fuzz testing by Andreas Seltenreich.

Report: <871t5pgwdt.fsf@credativ.de>
2016-04-30 12:29:21 -04:00
Tom Lane 207d5a656e Fix mishandling of equivalence-class tests in parameterized plans.
Given a three-or-more-way equivalence class, such as X.Y = Y.Y = Z.Z,
it was possible for the planner to omit one of the quals needed to
enforce that all members of the equivalence class are actually equal.
This only happened in the case of a parameterized join node for two
of the relations, that is a plan tree like

	Nested Loop
	  ->  Scan X
	  ->  Nested Loop
	    ->  Scan Y
	    ->  Scan Z
	          Filter: Z.Z = X.X

The eclass machinery normally expects to apply X.X = Y.Y when those
two relations are joined, but in this shape of plan tree they aren't
joined until the top node --- and, if the lower nested loop is marked
as parameterized by X, the top node will assume that the relevant eclass
condition(s) got pushed down into the lower node.  On the other hand,
the scan of Z assumes that it's only responsible for constraining Z.Z
to match any one of the other eclass members.  So one or another of
the required quals sometimes fell between the cracks, depending on
whether consideration of the eclass in get_joinrel_parampathinfo()
for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z
as the appropriate constraint there.  If it generated the latter,
it'd erroneously suppose that the Z scan would take care of matters.
To fix, force X.X = Y.Y to be generated and applied at that join node
when this case occurs.

This is *extremely* hard to hit in practice, because various planner
behaviors conspire to mask the problem; starting with the fact that the
planner doesn't really like to generate a parameterized plan of the
above shape.  (It might have been impossible to hit it before we
tweaked things to allow this plan shape for star-schema cases.)  Many
thanks to Alexander Kirkouski for submitting a reproducible test case.

The bug can be demonstrated in all branches back to 9.2 where parameterized
paths were introduced, so back-patch that far.
2016-04-29 20:19:38 -04:00
Robert Haas 59eb551279 Fix EXPLAIN VERBOSE output for parallel aggregate.
The way that PartialAggregate and FinalizeAggregate plan nodes were
displaying output columns before was bogus.  Now, FinalizeAggregate
produces the same outputs as an Aggregate would have produced, while
PartialAggregate produces each of those outputs prefixed by the word
PARTIAL.

Discussion: 12585.1460737650@sss.pgh.pa.us

Patch by me, reviewed by David Rowley.
2016-04-27 07:37:40 -04:00
Robert Haas deb71fa971 Fix costing for parallel aggregation.
The original patch kind of ignored the fact that we were doing something
different from a costing point of view, but nobody noticed.  This patch
fixes that oversight.

David Rowley
2016-04-12 16:25:55 -04:00
Tom Lane 5306df2831 Clean up foreign-key caching code in planner.
Coverity complained that the code added by 015e88942a lacked an
error check for SearchSysCache1 failures, which it should have.  But
the code was pretty duff in other ways too, including failure to think
about whether it could really cope with arrays of different lengths.
2016-04-10 23:47:30 -04:00
Teodor Sigaev 8b99edefca Revert CREATE INDEX ... INCLUDING ...
It's not ready yet, revert two commits
690c543550 - unstable test output
386e3d7609 - patch itself
2016-04-08 21:52:13 +03:00
Teodor Sigaev 386e3d7609 CREATE INDEX ... INCLUDING (column[, ...])
Now indexes (but only B-tree for now) can contain "extra" column(s) which
doesn't participate in index structure, they are just stored in leaf
tuples. It allows to use index only scan by using single index instead
of two or more indexes.

Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
2016-04-08 19:45:59 +03:00
Robert Haas 25fe8b5f1a Add a 'parallel_degree' reloption.
The code that estimates what parallel degree should be uesd for the
scan of a relation is currently rather stupid, so add a parallel_degree
reloption that can be used to override the planner's rather limited
judgement.

Julien Rouhaud, reviewed by David Rowley, James Sewell, Amit Kapila,
and me.  Some further hacking by me.
2016-04-08 11:14:56 -04:00
Simon Riggs 015e88942a Load FK defs into relcache for use by planner
Fastpath ignores this if no triggers defined.

Author: Tomas Vondra, with fastpath and comments added by me
Reviewers: David Rowley, Simon Riggs
2016-04-07 12:08:33 +01:00
Tom Lane de94e2af18 Run pgindent on a batch of (mostly-planner-related) source files.
Getting annoyed at the amount of unrelated chatter I get from pgindent'ing
Rowley's unique-joins patch.  Re-indent all the files it touches.
2016-04-06 11:34:02 -04:00
Robert Haas 41ea0c2376 Fix parallel-safety code for parallel aggregation.
has_parallel_hazard() was ignoring the proparallel markings for
aggregates, which is no good.  Fix that.  There was no way to mark
an aggregate as actually being parallel-safe, either, so add a
PARALLEL option to CREATE AGGREGATE.

Patch by me, reviewed by David Rowley.
2016-04-05 16:06:15 -04:00
Tom Lane f9aefcb91f Support using index-only scans with partial indexes in more cases.
Previously, the planner would reject an index-only scan if any restriction
clause for its table used a column not available from the index, even
if that restriction clause would later be dropped from the plan entirely
because it's implied by the index's predicate.  This is a fairly common
situation for partial indexes because predicates using columns not included
in the index are often the most useful kind of predicate, and we have to
duplicate (or at least imply) the predicate in the WHERE clause in order
to get the index to be considered at all.  So index-only scans were
essentially unavailable with such partial indexes.

To fix, we have to do detection of implied-by-predicate clauses much
earlier in the planner.  This patch puts it in check_index_predicates
(nee check_partial_indexes), meaning it gets done for every partial index,
whereas we previously only considered this issue at createplan time,
so that the work was only done for an index actually selected for use.
That could result in a noticeable planning slowdown for queries against
tables with many partial indexes.  However, testing suggested that there
isn't really a significant cost, especially not with reasonable numbers
of partial indexes.  We do get a small additional benefit, which is that
cost_index is more accurate since it correctly discounts the evaluation
cost of clauses that will be removed.  We can also avoid considering such
clauses as potential indexquals, which saves useless matching cycles in
the case where the predicate columns aren't in the index, and prevents
generating bogus plans that double-count the clause's selectivity when
the columns are in the index.

Tomas Vondra and Kyotaro Horiguchi, reviewed by Kevin Grittner and
Konstantin Knizhnik, and whacked around a little by me
2016-03-31 14:49:10 -04:00
Robert Haas 5fe5a2cee9 Allow aggregate transition states to be serialized and deserialized.
This is necessary infrastructure for supporting parallel aggregation
for aggregates whose transition type is "internal".  Such values
can't be passed between cooperating processes, because they are
just pointers.

David Rowley, reviewed by Tomas Vondra and by me.
2016-03-29 15:04:05 -04:00
Robert Haas 5d4171d1c7 Don't require a user mapping for FDWs to work.
Commit fbe5a3fb73 accidentally changed
this behavior; put things back the way they were, and add some
regression tests.

Report by Andres Freund; patch by Ashutosh Bapat, with a bit of
kibitzing by me.
2016-03-28 21:50:28 -04:00
Robert Haas e06a38965b Support parallel aggregation.
Parallel workers can now partially aggregate the data and pass the
transition values back to the leader, which can combine the partial
results to produce the final answer.

David Rowley, based on earlier work by Haribabu Kommi.  Reviewed by
Álvaro Herrera, Tomas Vondra, Amit Kapila, James Sewell, and me.
2016-03-21 09:30:18 -04:00
Robert Haas 0bf3ae88af Directly modify foreign tables.
postgres_fdw can now sent an UPDATE or DELETE statement directly to
the foreign server in simple cases, rather than sending a SELECT FOR
UPDATE statement and then updating or deleting rows one-by-one.

Etsuro Fujita, reviewed by Rushabh Lathia, Shigeru Hanada, Kyotaro
Horiguchi, Albe Laurenz, Thom Brown, and me.
2016-03-18 13:55:52 -04:00
Robert Haas 992b5ba30d Push scan/join target list beneath Gather when possible.
This means that, for example, "SELECT expensive_func(a) FROM bigtab
WHERE something" can compute expensive_func(a) in the workers rather
than the leader if it happens to be parallel-safe, which figures to be
a big win in some practical cases.

Currently, we can only do this if the entire target list is
parallel-safe.  If we worked harder, we might be able to evaluate
parallel-safe targets in the worker and any parallel-restricted
targets in the leader, but that would be more complicated, and there
aren't that many parallel-restricted functions that people are likely
to use in queries anyway.  I think.  So just do the simple thing for
the moment.

Robert Haas, Amit Kapila, and Tom Lane
2016-03-18 09:50:05 -04:00
Robert Haas 3aff33aa68 Fix typos.
Oskari Saarenmaa
2016-03-15 18:06:11 -04:00
Tom Lane 101fd9349e Add a GetForeignUpperPaths callback function for FDWs.
This is basically like the just-added create_upper_paths_hook, but
control is funneled only to the FDW responsible for all the baserels
of the current query; so providing such a callback is much less likely
to add useless overhead than using the hook function is.

The documentation is a bit sketchy.  We'll likely want to improve it,
and/or adjust the call conventions, when we get some experience with
actually using this callback.  Hopefully somebody will find time to
experiment with it before 9.6 feature freeze.
2016-03-14 20:04:48 -04:00
Tom Lane 28048cbaa2 Allow callers of create_foreignscan_path to specify nondefault PathTarget.
Although the default choice of rel->reltarget should typically be
sufficient for scan or join paths, it's not at all sufficient for the
purposes PathTargets were invented for; in particular not for
upper-relation Paths.  So break API compatibility by adding a PathTarget
argument to create_foreignscan_path().  To ease updating of existing
code, accept a NULL value of the argument as selecting rel->reltarget.
2016-03-14 17:31:28 -04:00
Tom Lane 307c78852f Rethink representation of PathTargets.
In commit 19a541143a I did not make PathTarget a subtype of Node,
and embedded a RelOptInfo's reltarget directly into it rather than having
a separately-allocated Node.  In hindsight that was misguided
micro-optimization, enabled by the fact that at that point we didn't have
any Paths with custom PathTargets.  Now that PathTarget processing has
been fleshed out some more, it's easier to see that it's better to have
PathTarget as an indepedent Node type, even if it does cost us one more
palloc to create a RelOptInfo.  So change it while we still can.

This commit just changes the representation, without doing anything more
interesting than that.
2016-03-14 16:59:59 -04:00
Tom Lane 49635d7b3e Minor additional refactoring of planner.c's PathTarget handling.
Teach make_group_input_target() and make_window_input_target() to work
entirely with the PathTarget representation of tlists, rather than
constructing a tlist and immediately deconstructing it into PathTarget
format.  In itself this only saves a few palloc's; the bigger picture is
that it opens the door for sharing cost_qual_eval work across all of
planner.c's constructions of PathTargets.  I'll come back to that later.

In support of this, flesh out tlist.c's infrastructure for PathTargets
a bit more.
2016-03-11 10:24:55 -05:00
Tom Lane c82c92b111 Give pull_var_clause() reject/recurse/return behavior for WindowFuncs too.
All along, this function should have treated WindowFuncs in a manner
similar to Aggrefs, ie with an option whether or not to recurse into them.
By not considering the case, it was always recursing, which is OK for most
callers (although I suspect that the case in prepare_sort_from_pathkeys
might represent a bug).  But now we need return-without-recursing behavior
as well.  There are also more than a few callers that should never see a
WindowFunc, and now we'll get some error checking on that.
2016-03-10 16:23:52 -05:00
Tom Lane 364a9f47ab Refactor pull_var_clause's API to make it less tedious to extend.
In commit 1d97c19a0f and later c1d9579dd8, we extended
pull_var_clause's API by adding enum-type arguments.  That's sort of a pain
to maintain, though, because it means every time we add a new behavior we
must touch every last one of the call sites, even if there's a reasonable
default behavior that most of them could use.  Let's switch over to using a
bitmask of flags, instead; that seems more maintainable and might save a
nanosecond or two as well.  This commit changes no behavior in itself,
though I'm going to follow it up with one that does add a new behavior.

In passing, remove flatten_tlist(), which has not been used since 9.1
and would otherwise need the same API changes.

Removing these enums means that optimizer/tlist.h no longer needs to
depend on optimizer/var.h.  Changing that caused a number of C files to
need addition of #include "optimizer/var.h" (probably we can thank old
runs of pgrminclude for that); but on balance it seems like a good change
anyway.
2016-03-10 15:53:07 -05:00
Tom Lane 51c0f63e4d Improve handling of pathtargets in planner.c.
Refactor so that the internal APIs in planner.c deal in PathTargets not
targetlists, and establish a more regular structure for deriving the
targets needed for successive steps.

There is more that could be done here; calculating the eval costs of each
successive target independently is both inefficient and wrong in detail,
since we won't actually recompute values available from the input node's
tlist.  But it's no worse than what happened before the pathification
rewrite.  In any case this seems like a good starting point for considering
how to handle Konstantin Knizhnik's function-evaluation-postponement patch.
2016-03-09 01:12:16 -05:00
Tom Lane 9e8b99420f Improve handling of group-column indexes in GroupingSetsPath.
Instead of having planner.c compute a groupColIdx array and store it in
GroupingSetsPaths, make create_groupingsets_plan() find the grouping
columns by searching in the child plan node's tlist.  Although that's
probably a bit slower for create_groupingsets_plan(), it's more like
the way every other plan node type does this, and it provides positive
confirmation that we know which child output columns we're supposed to be
grouping on.  (Indeed, looking at this now, I'm not at all sure that it
wasn't broken before, because create_groupingsets_plan() isn't demanding
an exact tlist match from its child node.)  Also, this allows substantial
simplification in planner.c, because it no longer needs to compute the
groupColIdx array at all; no other cases were using it.

I'd intended to put off this refactoring until later (like 9.7), but
in view of the likely bug fix and the need to rationalize planner.c's
tlist handling so we can do something sane with Konstantin Knizhnik's
function-evaluation-postponement patch, I think it can't wait.
2016-03-08 22:32:11 -05:00
Tom Lane 8c314b9853 Finish refactoring make_foo() functions in createplan.c.
This patch removes some redundant cost calculations that I left for later
cleanup in commit 3fc6e2d7f5.  There's now a uniform policy that the
make_foo() convenience functions don't do any cost calculations.  Most of
their callers copy costs from the source Path node, and for those that
don't, the calculation in the make_foo() function wasn't necessarily right
anyhow.  (make_result() was particularly a mess, as it was serving multiple
callers using cost calcs designed for only the first one or two that had
ever existed.)  Aside from saving a few cycles, this ensures that what
EXPLAIN prints matches the costs we used for planning purposes.  It does
not change any planner decisions, since the decisions are already made.
2016-03-08 16:28:34 -05:00
Tom Lane 3fc6e2d7f5 Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is.  This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps.  Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step.  We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.

In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan.  It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation.  (A couple of regression test outputs change in consequence of
that.  However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)

There is a great deal left to do here.  This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations.  (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.)  I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.

Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 15:58:22 -05:00
Tom Lane 19a541143a Add an explicit representation of the output targetlist to Paths.
Up to now, there's been an assumption that all Paths for a given relation
compute the same output column set (targetlist).  However, there are good
reasons to remove that assumption.  For example, an indexscan on an
expression index might be able to return the value of an expensive function
"for free".  While we have the ability to generate such a plan today in
simple cases, we don't have a way to model that it's cheaper than a plan
that computes the function from scratch, nor a way to create such a plan
in join cases (where the function computation would normally happen at
the topmost join node).  Also, we need this so that we can have Paths
representing post-scan/join steps, where the targetlist may well change
from one step to the next.  Therefore, invent a "struct PathTarget"
representing the columns we expect a plan step to emit.  It's convenient
to include the output tuple width and tlist evaluation cost in this struct,
and there will likely be additional fields in future.

While Path nodes that actually do have custom outputs will need their own
PathTargets, it will still be true that most Paths for a given relation
will compute the same tlist.  To reduce the overhead added by this patch,
keep a "default PathTarget" in RelOptInfo, and allow Paths that compute
that column set to just point to their parent RelOptInfo's reltarget.
(In the patch as committed, actually every Path is like that, since we
do not yet have any cases of custom PathTargets.)

I took this opportunity to provide some more-honest costing of
PlaceHolderVar evaluation.  Up to now, the assumption that "scan/join
reltargetlists have cost zero" was applied not only to Vars, where it's
reasonable, but also PlaceHolderVars where it isn't.  Now, we add the eval
cost of a PlaceHolderVar's expression to the first plan level where it can
be computed, by including it in the PathTarget cost field and adding that
to the cost estimates for Paths.  This isn't perfect yet but it's much
better than before, and there is a way forward to improve it more.  This
costing change affects the join order chosen for a couple of the regression
tests, changing expected row ordering.
2016-02-18 20:02:03 -05:00
Robert Haas fbe5a3fb73 Only try to push down foreign joins if the user mapping OIDs match.
Previously, the foreign join pushdown infrastructure left the question
of security entirely up to individual FDWs, but it would be easy for
a foreign data wrapper to inadvertently open up subtle security holes
that way.  So, make it the core code's job to determine which user
mapping OID is relevant, and don't attempt join pushdown unless it's
the same for all relevant relations.

Per a suggestion from Tom Lane.  Shigeru Hanada and Ashutosh Bapat,
reviewed by Etsuro Fujita and KaiGai Kohei, with some further
changes by me.
2016-01-28 14:05:36 -05:00
Tom Lane b99551832e Add defenses against putting expanded objects into Const nodes.
Putting a reference to an expanded-format value into a Const node would be
a bad idea for a couple of reasons.  It'd be possible for the supposedly
immutable Const to change value, if something modified the referenced
variable ... in fact, if the Const's reference were R/W, any function that
has the Const as argument might itself change it at runtime.  Also, because
datumIsEqual() is pretty simplistic, the Const might fail to compare equal
to other Consts that it should compare equal to, notably including copies
of itself.  This could lead to unexpected planner behavior, such as "could
not find pathkey item to sort" errors or inferior plans.

I have not been able to find any way to get an expanded value into a Const
within the existing core code; but Paul Ramsey was able to trigger the
problem by writing a datatype input function that returns an expanded
value.

The best fix seems to be to establish a rule that varlena values being
placed into Const nodes should be passed through pg_detoast_datum().
That will do nothing (and cost little) in normal cases, but it will flatten
expanded values and thereby avoid the above problems.  Also, it will
convert short-header or compressed values into canonical format, which will
avoid possible unexpected lack-of-equality issues for those cases too.
And it provides a last-ditch defense against putting a toasted value into
a Const, which we already knew was dangerous, cf commit 2b0c86b665.
(In the light of this discussion, I'm no longer sure that that commit
provided 100% protection against such cases, but this fix should do it.)

The test added in commit 65c3d05e18 to catch datatype input functions
with unstable results would fail for functions that returned expanded
values; but it seems a bit uncharitable to deem a result unstable just
because it's expressed in expanded form, so revise the coding so that we
check for bitwise equality only after applying pg_detoast_datum().  That's
a sufficient condition anyway given the new rule about detoasting when
forming a Const.

Back-patch to 9.5 where the expanded-object facility was added.  It's
possible that this should go back further; but in the absence of clear
evidence that there's any live bug in older branches, I'll refrain for now.
2016-01-21 12:56:08 -05:00
Robert Haas 45be99f8cd Support parallel joins, and make related improvements.
The core innovation of this patch is the introduction of the concept
of a partial path; that is, a path which if executed in parallel will
generate a subset of the output rows in each process.  Gathering a
partial path produces an ordinary (complete) path.  This allows us to
generate paths for parallel joins by joining a partial path for one
side (which at the baserel level is currently always a Partial Seq
Scan) to an ordinary path on the other side.  This is subject to
various restrictions at present, especially that this strategy seems
unlikely to be sensible for merge joins, so only nested loops and
hash joins paths are generated.

This also allows an Append node to be pushed below a Gather node in
the case of a partitioned table.

Testing revealed that early versions of this patch made poor decisions
in some cases, which turned out to be caused by the fact that the
original cost model for Parallel Seq Scan wasn't very good.  So this
patch tries to make some modest improvements in that area.

There is much more to be done in the area of generating good parallel
plans in all cases, but this seems like a useful step forward.

Patch by me, reviewed by Dilip Kumar and Amit Kapila.
2016-01-20 14:40:26 -05:00
Tom Lane 65c5fcd353 Restructure index access method API to hide most of it at the C level.
This patch reduces pg_am to just two columns, a name and a handler
function.  All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function.  This is similar to
the designs we've adopted for FDWs and tablesample methods.  There
are multiple advantages.  For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.

A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL.  We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.

Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
2016-01-17 19:36:59 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Tom Lane 4fcf48450d Get rid of the planner's LateralJoinInfo data structure.
I originally modeled this data structure on SpecialJoinInfo, but after
commit acfcd45cac that looks like a pretty poor decision.
All we really need is relid sets identifying laterally-referenced rels;
and most of the time, what we want to know about includes indirect lateral
references, a case the LateralJoinInfo data was unsuited to compute with
any efficiency.  The previous commit redefined RelOptInfo.lateral_relids
as the transitive closure of lateral references, so that it easily supports
checking indirect references.  For the places where we really do want just
direct references, add a new RelOptInfo field direct_lateral_relids, which
is easily set up as a copy of lateral_relids before we perform the
transitive closure calculation.  Then we can just drop lateral_info_list
and LateralJoinInfo and the supporting code.  This makes the planner's
handling of lateral references noticeably more efficient, and shorter too.

Such a change can't be back-patched into stable branches for fear of
breaking extensions that might be looking at the planner's data structures;
but it seems not too late to push it into 9.5, so I've done so.
2015-12-11 15:52:38 -05:00