Commit Graph

1343 Commits

Author SHA1 Message Date
Tom Lane 9a3ddeb519 Fix index-only scan plans, take 2.
Commit 4ace45677 failed to fix the problem fully, because the
same issue of attempting to fetch a non-returnable index column
can occur when rechecking the indexqual after using a lossy index
operator.  Moreover, it broke EXPLAIN for such indexquals (which
indicates a gap in our test cases :-().

Revert the code changes of 4ace45677 in favor of adding a new field
to struct IndexOnlyScan, containing a version of the indexqual that
can be executed against the index-returned tuple without using any
non-returnable columns.  (The restrictions imposed by check_index_only
guarantee this is possible, although we may have to recompute indexed
expressions.)  Support construction of that during setrefs.c
processing by marking IndexOnlyScan.indextlist entries as resjunk
if they can't be returned, rather than removing them entirely.
(We could alternatively require setrefs.c to look up the IndexOptInfo
again, but abusing resjunk this way seems like a reasonably safe way
to avoid needing to do that.)

This solution isn't great from an API-stability standpoint: if there
are any extensions out there that build IndexOnlyScan structs directly,
they'll be broken in the next minor releases.  However, only a very
invasive extension would be likely to do such a thing.  There's no
change in the Path representation, so typical planner extensions
shouldn't have a problem.

As before, back-patch to all supported branches.

Discussion: https://postgr.es/m/3179992.1641150853@sss.pgh.pa.us
Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org
2022-01-03 15:42:27 -05:00
Peter Eisentraut 4965f75484 Remove unused include
"utils/builtins.h" was used for pg_strtouint64(), added by
cff440d368, removed by
3c6f8c011f.
2021-12-22 15:06:02 +01:00
Peter Eisentraut 2f4fd1a73b Remove unused include
"fmgr.h" was used for load_external_function(), added by
a05dc4d7fd, removed by
f9143d102f.
2021-12-22 15:05:58 +01:00
Peter Eisentraut 3c6f8c011f Simplify the general-purpose 64-bit integer parsing APIs
pg_strtouint64() is a wrapper around strtoull/strtoul/_strtoui64, but
it seems no longer necessary to have this indirection.
msvc/Solution.pm claims HAVE_STRTOULL, so the "MSVC only" part seems
unnecessary.  Also, we have code in c.h to substitute alternatives for
strtoull() if not found, and that would appear to cover all currently
supported platforms, so having a further fallback in pg_strtouint64()
seems unnecessary.

Therefore, we could remove pg_strtouint64(), and use strtoull()
directly in all call sites.  However, it seems useful to keep a
separate notation for parsing exactly 64-bit integers, matching the
type definition int64/uint64.  For that, add new macros strtoi64() and
strtou64() in c.h as thin wrappers around strtol()/strtoul() or
strtoll()/stroull().  This makes these functions available everywhere
instead of just in the server code, and it makes the function naming
notably different from the pg_strtointNN() functions in numutils.c,
which have a different API.

Discussion: https://www.postgresql.org/message-id/flat/a3df47c9-b1b4-29f2-7e91-427baf8b75a3%40enterprisedb.com
2021-12-17 06:32:07 +01:00
Peter Eisentraut d6f96ed94e Allow specifying column list for foreign key ON DELETE SET actions
Extend the foreign key ON DELETE actions SET NULL and SET DEFAULT by
allowing the specification of a column list, like

    CREATE TABLE posts (
        ...
        FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL (author_id)
    );

If a column list is specified, only those columns are set to
null/default, instead of all the columns in the foreign-key
constraint.

This is useful for multitenant or sharded schemas, where the tenant or
shard ID is included in the primary key of all tables but shouldn't be
set to null.

Author: Paul Martinez <paulmtz@google.com>
Discussion: https://www.postgresql.org/message-id/flat/CACqFVBZQyMYJV=njbSMxf+rbDHpx=W=B7AEaMKn8dWn9OZJY7w@mail.gmail.com
2021-12-08 11:13:57 +01:00
David Rowley 411137a429 Flush Memoize cache when non-key parameters change, take 2
It's possible that a subplan below a Memoize node contains a parameter
from above the Memoize node.  If this parameter changes then cache entries
may become out-dated due to the new parameter value.

Previously Memoize was mistakenly not aware of this.  We fix this here by
flushing the cache whenever a parameter that's not part of the cache
key changes.

Bug: #17213
Reported by: Elvis Pranskevichus
Author: David Rowley
Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org
Backpatch-through: 14, where Memoize was added
2021-11-24 23:29:14 +13:00
David Rowley dad20ad470 Revert "Flush Memoize cache when non-key parameters change"
This reverts commit 1050048a31.
2021-11-24 15:27:43 +13:00
David Rowley 1050048a31 Flush Memoize cache when non-key parameters change
It's possible that a subplan below a Memoize node contains a parameter
from above the Memoize node.  If this parameter changes then cache entries
may become out-dated due to the new parameter value.

Previously Memoize was mistakenly not aware of this.  We fix this here by
flushing the cache whenever a parameter that's not part of the cache
key changes.

Bug: #17213
Reported by: Elvis Pranskevichus
Author: David Rowley
Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org
Backpatch-through: 14, where Memoize was added
2021-11-24 14:56:18 +13:00
David Rowley e502150f7d Allow Memoize to operate in binary comparison mode
Memoize would always use the hash equality operator for the cache key
types to determine if the current set of parameters were the same as some
previously cached set.  Certain types such as floating points where -0.0
and +0.0 differ in their binary representation but are classed as equal by
the hash equality operator may cause problems as unless the join uses the
same operator it's possible that whichever join operator is being used
would be able to distinguish the two values.  In which case we may
accidentally return in the incorrect rows out of the cache.

To fix this here we add a binary mode to Memoize to allow it to the
current set of parameters to previously cached values by comparing
bit-by-bit rather than logically using the hash equality operator.  This
binary mode is always used for LATERAL joins and it's used for normal
joins when any of the join operators are not hashable.

Reported-by: Tom Lane
Author: David Rowley
Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us
Backpatch-through: 14, where Memoize was added
2021-11-24 10:06:59 +13:00
David Rowley 39a3105678 Fix incorrect hash equality operator bug in Memoize
In v14, because we don't have a field in RestrictInfo to cache both the
left and right type's hash equality operator, we just restrict the scope
of Memoize to only when the left and right types of a RestrictInfo are the
same.

In master we add another field to RestrictInfo and cache both hash
equality operators.

Reported-by: Jaime Casanova
Author: David Rowley
Discussion: https://postgr.es/m/20210929185544.GB24346%40ahch-to
Backpatch-through: 14
2021-11-08 14:40:33 +13:00
Tom Lane 27ef132a80 Doc: add some notes about performance of the List functions.
Per suggestion from Andres Freund.

Discussion: https://postgr.es/m/20211104221248.pgo4h6wvnjl6uvkb@alap3.anarazel.de
2021-11-06 19:12:51 -04:00
Tom Lane 65c6cab136 Avoid O(N^2) behavior in SyncPostCheckpoint().
As in commits 6301c3ada and e9d9ba2a4, avoid doing repetitive
list_delete_first() operations, since that would be expensive when
there are many files waiting to be unlinked.  This is a slightly
larger change than in those cases.  We have to keep the list state
valid for calls to AbsorbSyncRequests(), so it's necessary to invent a
"canceled" field instead of immediately deleting PendingUnlinkEntry
entries.  Also, because we might not be able to process all the
entries, we need a new list primitive list_delete_first_n().

list_delete_first_n() is almost list_copy_tail(), but it modifies the
input List instead of making a new copy.  I found a couple of existing
uses of the latter that could profitably use the new function.  (There
might be more, but the other callers look like they probably shouldn't
overwrite the input List.)

As before, back-patch to v13.

Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
2021-11-02 11:31:54 -04:00
Amit Kapila 5a2832465f Allow publishing the tables of schema.
A new option "FOR ALL TABLES IN SCHEMA" in Create/Alter Publication allows
one or more schemas to be specified, whose tables are selected by the
publisher for sending the data to the subscriber.

The new syntax allows specifying both the tables and schemas. For example:
CREATE PUBLICATION pub1 FOR TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;

A new system table "pg_publication_namespace" has been added, to maintain
the schemas that the user wants to publish through the publication.
Modified the output plugin (pgoutput) to publish the changes if the
relation is part of schema publication.

Updates pg_dump to identify and dump schema publications. Updates the \d
family of commands to display schema publications and \dRp+ variant will
now display associated schemas if any.

Author: Vignesh C, Hou Zhijie, Amit Kapila
Syntax-Suggested-by: Tom Lane, Alvaro Herrera
Reviewed-by: Greg Nancarrow, Masahiko Sawada, Hou Zhijie, Amit Kapila, Haiying Tang, Ajin Cherian, Rahila Syed, Bharath Rupireddy, Mark Dilger
Tested-by: Haiying Tang
Discussion: https://www.postgresql.org/message-id/CALDaNm0OANxuJ6RXqwZsM1MSY4s19nuH3734j4a72etDwvBETQ@mail.gmail.com
2021-10-27 07:44:52 +05:30
Peter Eisentraut d942887039 Improve order in file
Move support functions for new PublicationTable node to more sensible
locations in the files.
2021-10-07 08:20:55 +02:00
Tom Lane e3ec3c00d8 Remove arbitrary 64K-or-so limit on rangetable size.
Up to now the size of a query's rangetable has been limited by the
constants INNER_VAR et al, which mustn't be equal to any real
rangetable index.  65000 doubtless seemed like enough for anybody,
and it still is orders of magnitude larger than the number of joins
we can realistically handle.  However, we need a rangetable entry
for each child partition that is (or might be) processed by a query.
Queries with a few thousand partitions are getting more realistic,
so that the day when that limit becomes a problem is in sight,
even if it's not here yet.  Hence, let's raise the limit.

Rather than just increase the values of INNER_VAR et al, this patch
adopts the approach of making them small negative values, so that
rangetables could theoretically become as long as INT_MAX.

The bulk of the patch is concerned with changing Var.varno and some
related variables from "Index" (unsigned int) to plain "int".  This
is basically cosmetic, with little actual effect other than to help
debuggers print their values nicely.  As such, I've only bothered
with changing places that could actually see INNER_VAR et al, which
the parser and most of the planner don't.  We do have to be careful
in places that are performing less/greater comparisons on varnos,
but there are very few such places, other than the IS_SPECIAL_VARNO
macro itself.

A notable side effect of this patch is that while it used to be
possible to add INNER_VAR et al to a Bitmapset, that will now
draw an error.  I don't see any likelihood that it wouldn't be a
bug to include these fake varnos in a bitmapset of real varnos,
so I think this is all to the good.

Although this touches outfuncs/readfuncs, I don't think a catversion
bump is required, since stored rules would never contain Vars
with these fake varnos.

Andrey Lepikhov and Tom Lane, after a suggestion by Peter Eisentraut

Discussion: https://postgr.es/m/43c7f2f5-1e27-27aa-8c65-c91859d15190@postgrespro.ru
2021-09-15 14:11:21 -04:00
Peter Eisentraut e581360696 Make node output prefix match node structure name
In most cases, the prefix string in a node output is the upper case of
the node structure name, e.g., MergeAppend -> MERGEAPPEND.  There were
a few exceptions that for either no apparent reason or perhaps minor
aesthetic reasons deviated from this.  In order to simplify this and
perhaps allow automatic generation without having to deal with
exception cases, make them all match.

Discussion: https://www.postgresql.org/message-id/c091e5cd-45f8-69ee-6a9b-de86912cc7e7@enterprisedb.com
2021-09-15 16:35:41 +02:00
Peter Eisentraut bdeb2c4ec2 Add WRITE_INDEX_ARRAY
We have a few WRITE_{name of type}_ARRAY macros, but the one case
using the Index type was hand-coded.  Wrap it into a macro as well.

This also changes the behavior slightly: Before, the field name was
skipped if the length was zero.  Now it prints the field name even in
that case.  This is more consistent with how other array fields are
handled.

Reviewed-by: Jacob Champion <pchampion@vmware.com>
Discussion: https://www.postgresql.org/message-id/c091e5cd-45f8-69ee-6a9b-de86912cc7e7@enterprisedb.com
2021-09-14 10:27:38 +02:00
Peter Eisentraut 308da179e7 Add COPY_ARRAY_FIELD and COMPARE_ARRAY_FIELD
These handle node fields that are inline arrays (as opposed to
dynamically allocated arrays handled by COPY_POINTER_FIELD and
COMPARE_POINTER_FIELD).  These cases were hand-coded until now.

Reviewed-by: Jacob Champion <pchampion@vmware.com>
Discussion: https://www.postgresql.org/message-id/c091e5cd-45f8-69ee-6a9b-de86912cc7e7@enterprisedb.com
2021-09-14 10:27:34 +02:00
Peter Eisentraut 0ffbe900ce Fix _equalA_Const
639a86e36a neglected to make the
necessary adjustments to _equalA_Const.  Found only via
COPY_PARSE_PLAN_TREES.
2021-09-09 10:23:29 +02:00
Peter Eisentraut 639a86e36a Remove Value node struct
The Value node struct is a weird construct.  It is its own node type,
but most of the time, it actually has a node type of Integer, Float,
String, or BitString.  As a consequence, the struct name and the node
type don't match most of the time, and so it has to be treated
specially a lot.  There doesn't seem to be any value in the special
construct.  There is very little code that wants to accept all Value
variants but nothing else (and even if it did, this doesn't provide
any convenient way to check it), and most code wants either just one
particular node type (usually String), or it accepts a broader set of
node types besides just Value.

This change removes the Value struct and node type and replaces them
by separate Integer, Float, String, and BitString node types that are
proper node types and structs of their own and behave mostly like
normal node types.

Also, this removes the T_Null node tag, which was previously also a
possible variant of Value but wasn't actually used outside of the
Value contained in A_Const.  Replace that by an isnull field in
A_Const.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5ba6bc5b-3f95-04f2-2419-f8ddb4c046fb@enterprisedb.com
2021-09-09 08:36:53 +02:00
Alvaro Herrera 0c6828fa98
Add PublicationTable and PublicationRelInfo structs
These encapsulate a relation when referred from replication DDL.
Currently they don't do anything useful (they're just wrappers around
RangeVar and Relation respectively) but in the future they'll be used to
carry column lists.

Extracted from a larger patch by Rahila Syed.

Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB=_ektNRH8NJ1jf95g@mail.gmail.com
2021-09-06 14:24:50 -03:00
Peter Eisentraut c1132aae33 Check the size in COPY_POINTER_FIELD
instead of making each caller do it.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-08-08 18:46:34 +02:00
Peter Eisentraut 2226b4189b Change SeqScan node to contain Scan node
This makes the structure of all Scan-derived nodes the same,
independent of whether they have additional fields.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-08-08 18:46:34 +02:00
David Rowley 475dbd0b71 Track a Bitmapset of non-pruned partitions in RelOptInfo
For partitioned tables with large numbers of partitions where queries are
able to prune all but a very small number of partitions, the time spent in
the planner looping over RelOptInfo.part_rels checking for non-NULL
RelOptInfos could become a large portion of the overall planning time.

Here we add a Bitmapset that records the non-pruned partitions.  This
allows us to more efficiently skip the pruned partitions by looping over
the Bitmapset.

This will cause a very slight slow down in cases where no or not many
partitions could be pruned, however, those cases are already slow to plan
anyway and the overhead of looping over the Bitmapset would be
unmeasurable when compared with the other tasks such as path creation for
a large number of partitions.

Reviewed-by: Amit Langote, Zhihong Yu
Discussion: https://postgr.es/m/CAApHDvqnPx6JnUuPwaf5ao38zczrAb9mxt9gj4U1EKFfd4AqLA@mail.gmail.com
2021-08-03 11:47:24 +12:00
Peter Eisentraut 31360381f0 Rename some node support functions for consistency
Some node function names didn't match their node type names exactly.
Fix those for consistency.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-07-21 10:24:06 +02:00
Peter Eisentraut 3d25b4ea6e Rename argument of _outValue()
Rename from value to node, for consistency with similar functions.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
2021-07-21 09:18:32 +02:00
David Rowley 83f4fcc655 Change the name of the Result Cache node to Memoize
"Result Cache" was never a great name for this node, but nobody managed
to come up with another name that anyone liked enough.  That was until
David Johnston mentioned "Node Memoization", which Tom Lane revised to
just "Memoize".  People seem to like "Memoize", so let's do the rename.

Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/20210708165145.GG1176@momjian.us
Backpatch-through: 14, where Result Cache was introduced
2021-07-14 12:43:58 +12:00
David Rowley 29f45e299e Use a hash table to speed up NOT IN(values)
Similar to 50e17ad28, which allowed hash tables to be used for IN clauses
with a set of constants, here we add the same feature for NOT IN clauses.

NOT IN evaluates the same as: WHERE a <> v1 AND a <> v2 AND a <> v3.
Obviously, if we're using a hash table we must be exactly equivalent to
that and return the same result taking into account that either side of
the condition could contain a NULL.  This requires a little bit of
special handling to make work with the hash table version.

When processing NOT IN, the ScalarArrayOpExpr's operator will be the <>
operator.  To be able to build and lookup a hash table we must use the
<>'s negator operator.  The planner checks if that exists and is hashable
and sets the relevant fields in ScalarArrayOpExpr to instruct the executor
to use hashing.

Author: David Rowley, James Coleman
Reviewed-by: James Coleman, Zhihong Yu
Discussion: https://postgr.es/m/CAApHDvoF1mum_FRk6D621edcB6KSHBi2+GAgWmioj5AhOu2vwQ@mail.gmail.com
2021-07-07 16:29:17 +12:00
Tom Lane e56bce5d43 Reconsider the handling of procedure OUT parameters.
Commit 2453ea142 redefined pg_proc.proargtypes to include the types of
OUT parameters, for procedures only.  While that had some advantages
for implementing the SQL-spec behavior of DROP PROCEDURE, it was pretty
disastrous from a number of other perspectives.  Notably, since the
primary key of pg_proc is name + proargtypes, this made it possible to
have multiple procedures with identical names + input arguments and
differing output argument types.  That would make it impossible to call
any one of the procedures by writing just NULL (or "?", or any other
data-type-free notation) for the output argument(s).  The change also
seems likely to cause grave confusion for client applications that
examine pg_proc and expect the traditional definition of proargtypes.

Hence, revert the definition of proargtypes to what it was, and
undo a number of complications that had been added to support that.

To support the SQL-spec behavior of DROP PROCEDURE, when there are
no argmode markers in the command's parameter list, we perform the
lookup both ways (that is, matching against both proargtypes and
proallargtypes), succeeding if we get just one unique match.
In principle this could result in ambiguous-function failures
that would not happen when using only one of the two rules.
However, overloading of procedure names is thought to be a pretty
rare usage, so this shouldn't cause many problems in practice.
Postgres-specific code such as pg_dump can defend against any
possibility of such failures by being careful to specify argmodes
for all procedure arguments.

This also fixes a few other bugs in the area of CALL statements
with named parameters, and improves the documentation a little.

catversion bump forced because the representation of procedures
with OUT arguments changes.

Discussion: https://postgr.es/m/3742981.1621533210@sss.pgh.pa.us
2021-06-10 17:11:36 -04:00
Peter Eisentraut 3bb309be75 Add _outTidRangePath()
We have outNode() coverage for all path nodes, but this one was
missed when it was added.
2021-06-07 21:32:53 +02:00
Tom Lane a65e9f3f14 Fix inconsistent equalfuncs.c behavior for FuncCall.funcformat.
Other equalfuncs.c checks on CoercionForm fields use
COMPARE_COERCIONFORM_FIELD (which makes them no-ops),
but commit 40c24bfef neglected to make _equalFuncCall
do likewise.  Fix that.

This is only strictly correct if FuncCall.funcformat has
no semantic effect, instead just determining ruleutils.c
display formatting.  40c24bfef added a couple of checks
in parse analysis that could break that rule; but on closer
inspection, they're redundant, so just take them out again.

Per report from Noah Misch.

Discussion: https://postgr.es/m/20210606063331.GC297923@rfd.leadboat.com
2021-06-06 15:46:58 -04:00
Tomas Vondra d57ecebd12 Add transformed flag to nodes/*funcs.c for CREATE STATISTICS
Commit a4d75c86bf added a new flag, tracking if the statement was
processed by transformStatsStmt(), but failed to add this flag to
nodes/*funcs.c.

Catversion bump, due to adding a flag to copy/equal/out functions.

Reported-by: Noah Misch
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
2021-06-06 20:52:58 +02:00
Noah Misch a2dee328bb Standardize nodes/*funcs.c cosmetics for ForeignScan.resultRelation.
catversion bump due to readfuncs.c field order change.
2021-06-06 00:08:21 -07:00
Tom Lane 049e1e2edb Fix mishandling of resjunk columns in ON CONFLICT ... UPDATE tlists.
It's unusual to have any resjunk columns in an ON CONFLICT ... UPDATE
list, but it can happen when MULTIEXPR_SUBLINK SubPlans are present.
If it happens, the ON CONFLICT UPDATE code path would end up storing
tuples that include the values of the extra resjunk columns.  That's
fairly harmless in the short run, but if new columns are added to
the table then the values would become accessible, possibly leading
to malfunctions if they don't match the datatypes of the new columns.

This had escaped notice through a confluence of missing sanity checks,
including

* There's no cross-check that a tuple presented to heap_insert or
heap_update matches the table rowtype.  While it's difficult to
check that fully at reasonable cost, we can easily add assertions
that there aren't too many columns.

* The output-column-assignment cases in execExprInterp.c lacked
any sanity checks on the output column numbers, which seems like
an oversight considering there are plenty of assertion checks on
input column numbers.  Add assertions there too.

* We failed to apply nodeModifyTable's ExecCheckPlanOutput() to
the ON CONFLICT UPDATE tlist.  That wouldn't have caught this
specific error, since that function is chartered to ignore resjunk
columns; but it sure seems like a bad omission now that we've seen
this bug.

In HEAD, the right way to fix this is to make the processing of
ON CONFLICT UPDATE tlists work the same as regular UPDATE tlists
now do, that is don't add "SET x = x" entries, and use
ExecBuildUpdateProjection to evaluate the tlist and combine it with
old values of the not-set columns.  This adds a little complication
to ExecBuildUpdateProjection, but allows removal of a comparable
amount of now-dead code from the planner.

In the back branches, the most expedient solution seems to be to
(a) use an output slot for the ON CONFLICT UPDATE projection that
actually matches the target table, and then (b) invent a variant of
ExecBuildProjectionInfo that can be told to not store values resulting
from resjunk columns, so it doesn't try to store into nonexistent
columns of the output slot.  (We can't simply ignore the resjunk columns
altogether; they have to be evaluated for MULTIEXPR_SUBLINK to work.)
This works back to v10.  In 9.6, projections work much differently and
we can't cheaply give them such an option.  The 9.6 version of this
patch works by inserting a JunkFilter when it's necessary to get rid
of resjunk columns.

In addition, v11 and up have the reverse problem when trying to
perform ON CONFLICT UPDATE on a partitioned table.  Through a
further oversight, adjust_partition_tlist() discarded resjunk columns
when re-ordering the ON CONFLICT UPDATE tlist to match a partition.
This accidentally prevented the storing-bogus-tuples problem, but
at the cost that MULTIEXPR_SUBLINK cases didn't work, typically
crashing if more than one row has to be updated.  Fix by preserving
resjunk columns in that routine.  (I failed to resist the temptation
to add more assertions there too, and to do some minor code
beautification.)

Per report from Andres Freund.  Back-patch to all supported branches.

Security: CVE-2021-32028
2021-05-10 11:02:29 -04:00
Thomas Munro ec48314708 Revert per-index collation version tracking feature.
Design problems were discovered in the handling of composite types and
record types that would cause some relevant versions not to be recorded.
Misgivings were also expressed about the use of the pg_depend catalog
for this purpose.  We're out of time for this release so we'll revert
and try again.

Commits reverted:

1bf946bd: Doc: Document known problem with Windows collation versions.
cf002008: Remove no-longer-relevant test case.
ef387bed: Fix bogus collation-version-recording logic.
0fb0a050: Hide internal error for pg_collation_actual_version(<bad OID>).
ff942057: Suppress "warning: variable 'collcollate' set but not used".
d50e3b1f: Fix assertion in collation version lookup.
f24b1569: Rethink extraction of collation dependencies.
257836a7: Track collation versions for indexes.
cd6f479e: Add pg_depend.refobjversion.
7d1297df: Remove pg_collation.collversion.

Discussion: https://postgr.es/m/CA%2BhUKGLhj5t1fcjqAu8iD9B3ixJtsTNqyCCD4V0aTO9kAKAjjA%40mail.gmail.com
2021-05-07 21:10:11 +12:00
David Rowley 152d33bcce Improve slightly misleading comments in nodeFuncs.c
There were some comments in nodeFuncs.c that, depending on your
interpretation of the word "result", could lead you to believe that the
comments were badly copied and pasted from somewhere else.  If you thought
of "result" as the return value of the function that the comment is
written in, then you'd be misled.  However, if you'd correctly
interpreted "result" to mean the result type of the given node type,
you'd not have seen any issues.

Here we do a small cleanup to try to prevent any future
misinterpretations.  Per wording suggestion from Tom Lane.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAApHDvp+Bw=2Qiu5=uXMKfC7gd0+B=4JvexVgGJU=am2g9a1CA@mail.gmail.com
2021-04-10 19:19:45 +12:00
David Rowley 50e17ad281 Speedup ScalarArrayOpExpr evaluation
ScalarArrayOpExprs with "useOr=true" and a set of Consts on the righthand
side have traditionally been evaluated by using a linear search over the
array.  When these arrays contain large numbers of elements then this
linear search could become a significant part of execution time.

Here we add a new method of evaluating ScalarArrayOpExpr expressions to
allow them to be evaluated by first building a hash table containing each
element, then on subsequent evaluations, we just probe that hash table to
determine if there is a match.

The planner is in charge of determining when this optimization is possible
and it enables it by setting hashfuncid in the ScalarArrayOpExpr.  The
executor will only perform the hash table evaluation when the hashfuncid
is set.

This means that not all cases are optimized. For example CHECK constraints
containing an IN clause won't go through the planner, so won't get the
hashfuncid set.  We could maybe do something about that at some later
date.  The reason we're not doing it now is from fear that we may slow
down cases where the expression is evaluated only once.  Those cases can
be common, for example, a single row INSERT to a table with a CHECK
constraint containing an IN clause.

In the planner, we enable this when there are suitable hash functions for
the ScalarArrayOpExpr's operator and only when there is at least
MIN_ARRAY_SIZE_FOR_HASHED_SAOP elements in the array.  The threshold is
currently set to 9.

Author: James Coleman, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAaqYe8x62+=wn0zvNKCj55tPpg-JBHzhZFFc6ANovdqFw7-dA@mail.gmail.com
2021-04-08 23:51:22 +12:00
Peter Eisentraut e717a9a18b SQL-standard function body
This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
statements for language SQL with a function body that conforms to the
SQL standard and is portable to other implementations.

Instead of the PostgreSQL-specific AS $$ string literal $$ syntax,
this allows writing out the SQL statements making up the body
unquoted, either as a single statement:

    CREATE FUNCTION add(a integer, b integer) RETURNS integer
        LANGUAGE SQL
        RETURN a + b;

or as a block

    CREATE PROCEDURE insert_data(a integer, b integer)
    LANGUAGE SQL
    BEGIN ATOMIC
      INSERT INTO tbl VALUES (a);
      INSERT INTO tbl VALUES (b);
    END;

The function body is parsed at function definition time and stored as
expression nodes in a new pg_proc column prosqlbody.  So at run time,
no further parsing is required.

However, this form does not support polymorphic arguments, because
there is no more parse analysis done at call time.

Dependencies between the function and the objects it uses are fully
tracked.

A new RETURN statement is introduced.  This can only be used inside
function bodies.  Internally, it is treated much like a SELECT
statement.

psql needs some new intelligence to keep track of function body
boundaries so that it doesn't send off statements when it sees
semicolons that are inside a function body.

Tested-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/1c11f1eb-f00c-43b7-799d-2d44132c02d7@2ndquadrant.com
2021-04-07 21:47:55 +02:00
David Rowley 9eacee2e62 Add Result Cache executor node (take 2)
Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join.   However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.

For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.  Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
2021-04-02 14:10:56 +13:00
David Rowley 28b3e3905c Revert b6002a796
This removes "Add Result Cache executor node".  It seems that something
weird is going on with the tracking of cache hits and misses as
highlighted by many buildfarm animals.  It's not yet clear what the
problem is as other parts of the plan indicate that the cache did work
correctly, it's just the hits and misses that were being reported as 0.

This is especially a bad time to have the buildfarm so broken, so
reverting before too many more animals go red.

Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com
2021-04-01 13:33:23 +13:00
David Rowley b6002a796d Add Result Cache executor node
Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join.   However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.

For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.  Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
2021-04-01 12:32:22 +13:00
Tom Lane 86dc90056d Rework planning and execution of UPDATE and DELETE.
This patch makes two closely related sets of changes:

1. For UPDATE, the subplan of the ModifyTable node now only delivers
the new values of the changed columns (i.e., the expressions computed
in the query's SET clause) plus row identity information such as CTID.
ModifyTable must re-fetch the original tuple to merge in the old
values of any unchanged columns.  The core advantage of this is that
the changed columns are uniform across all tables of an inherited or
partitioned target relation, whereas the other columns might not be.
A secondary advantage, when the UPDATE involves joins, is that less
data needs to pass through the plan tree.  The disadvantage of course
is an extra fetch of each tuple to be updated.  However, that seems to
be very nearly free in context; even worst-case tests don't show it to
add more than a couple percent to the total query cost.  At some point
it might be interesting to combine the re-fetch with the tuple access
that ModifyTable must do anyway to mark the old tuple dead; but that
would require a good deal of refactoring and it seems it wouldn't buy
all that much, so this patch doesn't attempt it.

2. For inherited UPDATE/DELETE, instead of generating a separate
subplan for each target relation, we now generate a single subplan
that is just exactly like a SELECT's plan, then stick ModifyTable
on top of that.  To let ModifyTable know which target relation a
given incoming row refers to, a tableoid junk column is added to
the row identity information.  This gets rid of the horrid hack
that was inheritance_planner(), eliminating O(N^2) planning cost
and memory consumption in cases where there were many unprunable
target relations.

Point 2 of course requires point 1, so that there is a uniform
definition of the non-junk columns to be returned by the subplan.
We can't insist on uniform definition of the row identity junk
columns however, if we want to keep the ability to have both
plain and foreign tables in a partitioning hierarchy.  Since
it wouldn't scale very far to have every child table have its
own row identity column, this patch includes provisions to merge
similar row identity columns into one column of the subplan result.
In particular, we can merge the whole-row Vars typically used as
row identity by FDWs into one column by pretending they are type
RECORD.  (It's still okay for the actual composite Datums to be
labeled with the table's rowtype OID, though.)

There is more that can be done to file down residual inefficiencies
in this patch, but it seems to be committable now.

FDW authors should note several API changes:

* The argument list for AddForeignUpdateTargets() has changed, and so
has the method it must use for adding junk columns to the query.  Call
add_row_identity_var() instead of manipulating the parse tree directly.
You might want to reconsider exactly what you're adding, too.

* PlanDirectModify() must now work a little harder to find the
ForeignScan plan node; if the foreign table is part of a partitioning
hierarchy then the ForeignScan might not be the direct child of
ModifyTable.  See postgres_fdw for sample code.

* To check whether a relation is a target relation, it's no
longer sufficient to compare its relid to root->parse->resultRelation.
Instead, check it against all_result_relids or leaf_result_relids,
as appropriate.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
2021-03-31 11:52:37 -04:00
Peter Eisentraut 055fee7eb4 Allow an alias to be attached to a JOIN ... USING
This allows something like

    SELECT ... FROM t1 JOIN t2 USING (a, b, c) AS x

where x has the columns a, b, c and unlike a regular alias it does not
hide the range variables of the tables being joined t1 and t2.

Per SQL:2016 feature F404 "Range variable for common column names".

Reviewed-by: Vik Fearing <vik.fearing@2ndquadrant.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/454638cf-d563-ab76-a585-2564428062af@2ndquadrant.com
2021-03-31 17:10:50 +02:00
Etsuro Fujita 27e1f14563 Add support for asynchronous execution.
This implements asynchronous execution, which runs multiple parts of a
non-parallel-aware Append concurrently rather than serially to improve
performance when possible.  Currently, the only node type that can be
run concurrently is a ForeignScan that is an immediate child of such an
Append.  In the case where such ForeignScans access data on different
remote servers, this would run those ForeignScans concurrently, and
overlap the remote operations to be performed simultaneously, so it'll
improve the performance especially when the operations involve
time-consuming ones such as remote join and remote aggregation.

We may extend this to other node types such as joins or aggregates over
ForeignScans in the future.

This also adds the support for postgres_fdw, which is enabled by the
table-level/server-level option "async_capable".  The default is false.

Robert Haas, Kyotaro Horiguchi, Thomas Munro, and myself.  This commit
is mostly based on the patch proposed by Robert Haas, but also uses
stuff from the patch proposed by Kyotaro Horiguchi and from the patch
proposed by Thomas Munro.  Reviewed by Kyotaro Horiguchi, Konstantin
Knizhnik, Andrey Lepikhov, Movead Li, Thomas Munro, Justin Pryzby, and
others.

Discussion: https://postgr.es/m/CA%2BTgmoaXQEt4tZ03FtQhnzeDEMzBck%2BLrni0UWHVVgOTnA6C1w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2BhUKGLBRyu0rHrDCMC4%3DRn3252gogyp1SjOgG8SEKKZv%3DFwfQ%40mail.gmail.com
Discussion: https://postgr.es/m/20200228.170650.667613673625155850.horikyota.ntt%40gmail.com
2021-03-31 18:45:00 +09:00
David Rowley f58b230ed0 Cache if PathTarget and RestrictInfos contain volatile functions
Here we aim to reduce duplicate work done by contain_volatile_functions()
by caching whether PathTargets and RestrictInfos contain any volatile
functions the first time contain_volatile_functions() is called for them.
Any future calls for these nodes just use the cached value rather than
going to the trouble of recursively checking the sub-node all over again.
Thanks to Tom Lane for the idea.

Any locations in the code which make changes to a PathTarget or
RestrictInfo which could change the outcome of the volatility check must
change the cached value back to VOLATILITY_UNKNOWN again.
contain_volatile_functions() is the only code in charge of setting the
cache value to either VOLATILITY_VOLATILE or VOLATILITY_NOVOLATILE.

Some existing code does benefit from this additional caching, however,
this change is mainly aimed at an upcoming patch that must check for
volatility during the join search.  Repeated volatility checks in that
case can become very expensive when the join search contains more than a
few relations.

Author: David Rowley
Discussion: https://postgr.es/m/3795226.1614059027@sss.pgh.pa.us
2021-03-29 14:55:26 +13:00
Tomas Vondra a4d75c86bf Extended statistics on expressions
Allow defining extended statistics on expressions, not just just on
simple column references.  With this commit, expressions are supported
by all existing extended statistics kinds, improving the same types of
estimates. A simple example may look like this:

  CREATE TABLE t (a int);
  CREATE STATISTICS s ON mod(a,10), mod(a,20) FROM t;
  ANALYZE t;

The collected statistics are useful e.g. to estimate queries with those
expressions in WHERE or GROUP BY clauses:

  SELECT * FROM t WHERE mod(a,10) = 0 AND mod(a,20) = 0;

  SELECT 1 FROM t GROUP BY mod(a,10), mod(a,20);

This introduces new internal statistics kind 'e' (expressions) which is
built automatically when the statistics object definition includes any
expressions. This represents single-expression statistics, as if there
was an expression index (but without the index maintenance overhead).
The statistics is stored in pg_statistics_ext_data as an array of
composite types, which is possible thanks to 79f6a942bd.

CREATE STATISTICS allows building statistics on a single expression, in
which case in which case it's not possible to specify statistics kinds.

A new system view pg_stats_ext_exprs can be used to display expression
statistics, similarly to pg_stats and pg_stats_ext views.

ALTER TABLE ... ALTER COLUMN ... TYPE now treats indexes the same way it
treats indexes, i.e. it drops and recreates the statistics. This means
all statistics are reset, and we no longer try to preserve at least the
functional dependencies. This should not be a major issue in practice,
as the functional dependencies actually rely on per-column statistics,
which were always reset anyway.

Author: Tomas Vondra
Reviewed-by: Justin Pryzby, Dean Rasheed, Zhihong Yu
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
2021-03-27 00:01:11 +01:00
Alvaro Herrera 71f4c8c6f7
ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY
Allow a partition be detached from its partitioned table without
blocking concurrent queries, by running in two transactions and only
requiring ShareUpdateExclusive in the partitioned table.

Because it runs in two transactions, it cannot be used in a transaction
block.  This is the main reason to use dedicated syntax: so that users
can choose to use the original mode if they need it.  But also, it
doesn't work when a default partition exists (because an exclusive lock
would still need to be obtained on it, in order to change its partition
constraint.)

In case the second transaction is cancelled or a crash occurs, there's
ALTER TABLE .. DETACH PARTITION .. FINALIZE, which executes the final
steps.

The main trick to make this work is the addition of column
pg_inherits.inhdetachpending, initially false; can only be set true in
the first part of this command.  Once that is committed, concurrent
transactions that use a PartitionDirectory will include or ignore
partitions so marked: in optimizer they are ignored if the row is marked
committed for the snapshot; in executor they are always included.  As a
result, and because of the way PartitionDirectory caches partition
descriptors, queries that were planned before the detach will see the
rows in the detached partition and queries that are planned after the
detach, won't.

A CHECK constraint is created that duplicates the partition constraint.
This is probably not strictly necessary, and some users will prefer to
remove it afterwards, but if the partition is re-attached to a
partitioned table, the constraint needn't be rechecked.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
2021-03-25 18:00:28 -03:00
Amit Kapila 26acb54a13 Revert "Enable parallel SELECT for "INSERT INTO ... SELECT ..."."
To allow inserts in parallel-mode this feature has to ensure that all the
constraints, triggers, etc. are parallel-safe for the partition hierarchy
which is costly and we need to find a better way to do that. Additionally,
we could have used existing cached information in some cases like indexes,
domains, etc. to determine the parallel-safety.

List of commits reverted, in reverse chronological order:

ed62d3737c Doc: Update description for parallel insert reloption.
c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
e2cda3c20a Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.
e4e87a32cc Fix valgrind issue in commit 05c8482f7f.
05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".

Discussion: https://postgr.es/m/E1lMiB9-0001c3-SY@gemulon.postgresql.org
2021-03-24 11:29:15 +05:30
Robert Haas bbe0a81db6 Allow configurable LZ4 TOAST compression.
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option in up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.

In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.

Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compessed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compresison types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.

It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT.  It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data.  These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project.  However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.

Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those were patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allow a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored.  More recently, Justin Przyby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.

Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
2021-03-19 15:10:38 -04:00
Tomas Vondra be45be9c33 Implement GROUP BY DISTINCT
With grouping sets, it's possible that some of the grouping sets are
duplicate.  This is especially common with CUBE and ROLLUP clauses. For
example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to

  GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b, c),
    (a, b, c),
    (a, b),
    (a, b),
    (a, b),
    (a),
    (a),
    (a),
    (c, a),
    (c, a),
    (c, a),
    (c),
    (b, c),
    (b),
    ()
  )

Some of the grouping sets are calculated multiple times, which is mostly
unnecessary.  This commit implements a new GROUP BY DISTINCT feature, as
defined in the SQL standard, which eliminates the duplicate sets.

Author: Vik Fearing
Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra
Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
2021-03-18 18:22:18 +01:00
Amit Kapila 05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".
Parallel SELECT can't be utilized for INSERT in the following cases:
- INSERT statement uses the ON CONFLICT DO UPDATE clause
- Target table has a parallel-unsafe: trigger, index expression or
  predicate, column default expression or check constraint
- Target table has a parallel-unsafe domain constraint on any column
- Target table is a partitioned table with a parallel-unsafe partition key
  expression or support function

The planner is updated to perform additional parallel-safety checks for
the cases listed above, for determining whether it is safe to run INSERT
in parallel-mode with an underlying parallel SELECT. The planner will
consider using parallel SELECT for "INSERT INTO ... SELECT ...", provided
nothing unsafe is found from the additional parallel-safety checks, or
from the existing parallel-safety checks for SELECT.

While checking parallel-safety, we need to check it for all the partitions
on the table which can be costly especially when we decide not to use a
parallel plan. So, in a separate patch, we will introduce a GUC and or a
reloption to enable/disable parallelism for Insert statements.

Prior to entering parallel-mode for the execution of INSERT with parallel
SELECT, a TransactionId is acquired and assigned to the current
transaction state. This is necessary to prevent the INSERT from attempting
to assign the TransactionId whilst in parallel-mode, which is not allowed.
This approach has a disadvantage in that if the underlying SELECT does not
return any rows, then the TransactionId is not used, however that
shouldn't happen in practice in many cases.

Author: Greg Nancarrow, Amit Langote, Amit Kapila
Reviewed-by: Amit Langote, Hou Zhijie, Takayuki Tsunakawa, Antonin Houska, Bharath Rupireddy, Dilip Kumar, Vignesh C, Zhihong Yu, Amit Kapila
Tested-by: Tang, Haiying
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
2021-03-10 07:38:58 +05:30
David Rowley 977b2c0853 Add missing TidRangeScan readfunc
Mistakenly forgotten in bb437f995
2021-02-27 23:21:21 +13:00
David Rowley bb437f995d Add TID Range Scans to support efficient scanning ranges of TIDs
This adds a new executor node named TID Range Scan.  The query planner
will generate paths for TID Range scans when quals are discovered on base
relations which search for ranges on the table's ctid column.  These
ranges may be open at either end. For example, WHERE ctid >= '(10,0)';
will return all tuples on page 10 and over.

To support this, two new optional callback functions have been added to
table AM.  scan_set_tidrange is used to set the scan range to just the
given range of TIDs.  scan_getnextslot_tidrange fetches the next tuple
in the given range.

For AMs were scanning ranges of TIDs would not make sense, these functions
can be set to NULL in the TableAmRoutine.  The query planner won't
generate TID Range Scan Paths in that case.

Author: Edmund Horner, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Tom Lane, Andres Freund, Zhihong Yu
Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
2021-02-27 22:59:36 +13:00
Tom Lane f003a7522b Remove [Merge]AppendPath.partitioned_rels.
It turns out that the calculation of [Merge]AppendPath.partitioned_rels
in allpaths.c is faulty and sometimes omits relevant non-leaf partitions,
allowing an assertion added by commit a929e17e5a to trigger.  Rather
than fix that, it seems better to get rid of those fields altogether.
We don't really need the info until create_plan time, and calculating
it once for the selected plan should be cheaper than calculating it
for each append path we consider.

The preceding two commits did away with all use of the partitioned_rels
values; this commit just mechanically removes the fields and the code
that calculated them.

Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu
Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01 14:43:54 -05:00
Peter Eisentraut 3696a600e2 SEARCH and CYCLE clauses
This adds the SQL standard feature that adds the SEARCH and CYCLE
clauses to recursive queries to be able to do produce breadth- or
depth-first search orders and detect cycles.  These clauses can be
rewritten into queries using existing syntax, and that is what this
patch does in the rewriter.

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
2021-02-01 14:32:51 +01:00
Peter Eisentraut 6aaaa76bb4 Allow GRANTED BY clause in normal GRANT and REVOKE statements
The SQL standard allows a GRANTED BY clause on GRANT and
REVOKE (privilege) statements that can specify CURRENT_USER or
CURRENT_ROLE.  In PostgreSQL, both of these are the default behavior.
Since we already have all the parsing support for this for the
GRANT (role) statement, we might as well add basic support for this
for the privilege variant as well.  This allows us to check off SQL
feature T332.  In the future, perhaps more interesting things could be
done with this, too.

Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/f2feac44-b4c5-f38f-3699-2851d6a76dc9@2ndquadrant.com
2021-01-30 09:45:11 +01:00
Tomas Vondra b663a41363 Implement support for bulk inserts in postgres_fdw
Extends the FDW API to allow batching inserts into foreign tables. That
is usually much more efficient than inserting individual rows, due to
high latency for each round-trip to the foreign server.

It was possible to implement something similar in the regular FDW API,
but it was inconvenient and there were issues with reporting the number
of actually inserted rows etc. This extends the FDW API with two new
functions:

* GetForeignModifyBatchSize - allows the FDW picking optimal batch size

* ExecForeignBatchInsert - inserts a batch of rows at once

Currently, only INSERT queries support batching. Support for DELETE and
UPDATE may be added in the future.

This also implements batching for postgres_fdw. The batch size may be
specified using "batch_size" option both at the server and table level.

The initial patch version was written by me, but it was rewritten and
improved in many ways by Takayuki Tsunakawa.

Author: Takayuki Tsunakawa
Reviewed-by: Tomas Vondra, Amit Langote
Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-01-20 23:57:27 +01:00
Tom Lane c9d5298485 Re-implement pl/pgsql's expression and assignment parsing.
Invent new RawParseModes that allow the core grammar to handle
pl/pgsql expressions and assignments directly, and thereby get rid
of a lot of hackery in pl/pgsql's parser.  This moves a good deal
of knowledge about pl/pgsql into the core code: notably, we have to
invent a CoercionContext that matches pl/pgsql's (rather dubious)
historical behavior for assignment coercions.  That's getting away
from the original idea of pl/pgsql as an arm's-length extension of
the core, but really we crossed that bridge a long time ago.

The main advantage of doing this is that we can now use the core
parser to generate FieldStore and/or SubscriptingRef nodes to handle
assignments to pl/pgsql variables that are records or arrays.  That
fixes a number of cases that had never been implemented in pl/pgsql
assignment, such as nested records and array slicing, and it allows
pl/pgsql assignment to support the datatype-specific subscripting
behaviors introduced in commit c7aba7c14.

There are cosmetic benefits too: when a syntax error occurs in a
pl/pgsql expression, the error report no longer includes the confusing
"SELECT" keyword that used to get prefixed to the expression text.
Also, there seem to be some small speed gains.

Discussion: https://postgr.es/m/4165684.1607707277@sss.pgh.pa.us
2021-01-04 11:52:00 -05:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Tom Lane b3817f5f77 Improve hash_create()'s API for some added robustness.
Invent a new flag bit HASH_STRINGS to specify C-string hashing, which
was formerly the default; and add assertions insisting that exactly
one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set.
This is in hopes of preventing recurrences of the type of oversight
fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS).

Also, when HASH_STRINGS is specified, insist that the keysize be
more than 8 bytes.  This is a heuristic, but it should catch
accidental use of HASH_STRINGS for integer or pointer keys.
(Nearly all existing use-cases set the keysize to NAMEDATALEN or
more, so there's little reason to think this restriction should
be problematic.)

Tweak hash_create() to insist that the HASH_ELEM flag be set, and
remove the defaults it had for keysize and entrysize.  Since those
defaults were undocumented and basically useless, no callers
omitted HASH_ELEM anyway.

Also, remove memset's zeroing the HASHCTL parameter struct from
those callers that had one.  This has never been really necessary,
and while it wasn't a bad coding convention it was confusing that
some callers did it and some did not.  We might as well save a few
cycles by standardizing on "not".

Also improve the documentation for hash_create().

In passing, improve reinit.c's usage of a hash table by storing
the key as a binary Oid rather than a string; and, since that's
a temporary hash table, allocate it in CurrentMemoryContext for
neatness.

Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us
2020-12-15 11:38:53 -05:00
Tom Lane c7aba7c14e Support subscripting of arbitrary types, not only arrays.
This patch generalizes the subscripting infrastructure so that any
data type can be subscripted, if it provides a handler function to
define what that means.  Traditional variable-length (varlena) arrays
all use array_subscript_handler(), while the existing fixed-length
types that support subscripting use raw_array_subscript_handler().
It's expected that other types that want to use subscripting notation
will define their own handlers.  (This patch provides no such new
features, though; it only lays the foundation for them.)

To do this, move the parser's semantic processing of subscripts
(including coercion to whatever data type is required) into a
method callback supplied by the handler.  On the execution side,
replace the ExecEvalSubscriptingRef* layer of functions with direct
calls to callback-supplied execution routines.  (Thus, essentially
no new run-time overhead should be caused by this patch.  Indeed,
there is room to remove some overhead by supplying specialized
execution routines.  This patch does a little bit in that line,
but more could be done.)

Additional work is required here and there to remove formerly
hard-wired assumptions about the result type, collation, etc
of a SubscriptingRef expression node; and to remove assumptions
that the subscript values must be integers.

One useful side-effect of this is that we now have a less squishy
mechanism for identifying whether a data type is a "true" array:
instead of wiring in weird rules about typlen, we can look to see
if pg_type.typsubscript == F_ARRAY_SUBSCRIPT_HANDLER.  For this
to be bulletproof, we have to forbid user-defined types from using
that handler directly; but there seems no good reason for them to
do so.

This patch also removes assumptions that the number of subscripts
is limited to MAXDIM (6), or indeed has any hard-wired limit.
That limit still applies to types handled by array_subscript_handler
or raw_array_subscript_handler, but to discourage other dependencies
on this constant, I've moved it from c.h to utils/array.h.

Dmitry Dolgov, reviewed at various times by Tom Lane, Arthur Zakirov,
Peter Eisentraut, Pavel Stehule

Discussion: https://postgr.es/m/CA+q6zcVDuGBv=M0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w@mail.gmail.com
Discussion: https://postgr.es/m/CA+q6zcVovR+XY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA@mail.gmail.com
2020-12-09 12:40:37 -05:00
Tom Lane a676386b58 Remove operator_precedence_warning.
This GUC was always intended as a temporary solution to help with
finding 9.4-to-9.5 migration issues.  Now that all pre-9.5 branches
are out of support, and 9.5 will be too before v14 is released,
it seems like it's okay to drop it.  Doing so allows removal of
several hundred lines of poorly-tested code in parse_expr.c,
which have been a fertile source of bugs when people did use this.

Discussion: https://postgr.es/m/2234320.1607117945@sss.pgh.pa.us
2020-12-08 16:29:52 -05:00
Michael Paquier b5913f6120 Refactor CLUSTER and REINDEX grammar to use DefElem for option lists
This changes CLUSTER and REINDEX so as a parenthesized grammar becomes
possible for options, while unifying the grammar parsing rules for
option lists with the existing ones.

This is a follow-up of the work done in 873ea9e for VACUUM, ANALYZE and
EXPLAIN.  This benefits REINDEX for a potential backend-side filtering
for collatable-sensitive indexes and TABLESPACE, while CLUSTER would
benefit from the latter.

Author: Alexey Kondratov, Justin Pryzby
Discussion: https://postgr.es/m/8a8f5f73-00d3-55f8-7583-1375ca8f6a91@postgrespro.ru
2020-12-03 10:13:21 +09:00
Tom Lane f7f83a55bf Ensure that expandTableLikeClause() re-examines the same table.
As it stood, expandTableLikeClause() re-did the same relation_openrv
call that transformTableLikeClause() had done.  However there are
scenarios where this would not find the same table as expected.
We hold lock on the LIKE source table, so it can't be renamed or
dropped, but another table could appear before it in the search path.
This explains the odd behavior reported in bug #16758 when cloning a
table as a temp table of the same name.  This case worked as expected
before commit 502898192 introduced the need to open the source table
twice, so we should fix it.

To make really sure we get the same table, let's re-open it by OID not
name.  That requires adding an OID field to struct TableLikeClause,
which is a little nervous-making from an ABI standpoint, but as long
as it's at the end I don't think there's any serious risk.

Per bug #16758 from Marc Boeren.  Like the previous patch,
back-patch to all supported branches.

Discussion: https://postgr.es/m/16758-840e84a6cfab276d@postgresql.org
2020-12-01 14:02:27 -05:00
Tom Lane 8286223f3d Fix missing outfuncs.c support for IncrementalSortPath.
For debugging purposes, Path nodes are supposed to have outfuncs
support, but this was overlooked in the original incremental sort patch.

While at it, clean up a couple other minor oversights, as well as
bizarre choice of return type for create_incremental_sort_path().
(All the existing callers just cast it to "Path *" immediately, so
they don't care, but some future caller might care.)

outfuncs.c fix by Zhijie Hou, the rest by me

Discussion: https://postgr.es/m/324c4d81d8134117972a5b1f6cdf9560@G08CNEXMBPEKD05.g08.fujitsu.local
2020-11-30 16:33:09 -05:00
Heikki Linnakangas 0a2bc5d61e Move per-agg and per-trans duplicate finding to the planner.
This has the advantage that the cost estimates for aggregates can count
the number of calls to transition and final functions correctly.

Bump catalog version, because views can contain Aggrefs.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/b2e3536b-1dbc-8303-c97e-89cb0b4a9a48%40iki.fi
2020-11-24 10:45:00 +02:00
Tom Lane 926fa801ac Remove undocumented IS [NOT] OF syntax.
This feature was added a long time ago, in 7c1e67bd5 and eb121ba2c,
but never documented in any user-facing way.  (Documentation added
in 6126d3e70 was commented out almost immediately, in 8272fc3f7.)
That's because, while this syntax is defined by SQL:99, our
implementation is only vaguely related to the standard's semantics.
The standard appears to intend a run-time not parse-time test, and
it definitely intends that the test should understand subtype
relationships.

No one has stepped up to fix that in the intervening years, but
people keep coming across the code and asking why it's not documented.
Let's just get rid of it: if anyone ever wants to make it work per
spec, they can easily recover whatever parts of this code are still
of value from our git history.

If there's anyone out there who's actually using this despite its
undocumented status, they can switch to using pg_typeof() instead,
eg. "pg_typeof(something) = 'mytype'::regtype".  That gives
essentially the same semantics as what our IS OF code did.
(We didn't have that function last time this was discussed, or
we would have ripped out IS OF then.)

Discussion: https://postgr.es/m/CAKFQuwZ2pTc-DSkOiTfjauqLYkNREeNZvWmeg12Q-_69D+sYZA@mail.gmail.com
Discussion: https://postgr.es/m/BAY20-F23E9F2B4DAB3E4E88D3623F99B0@phx.gbl
Discussion: https://postgr.es/m/3E7CF81D.1000203@joeconway.com
2020-11-19 17:39:39 -05:00
Tom Lane 92bf7e2d02 Provide the OR REPLACE option for CREATE TRIGGER.
This is mostly straightforward.  However, we disallow replacing
constraint triggers or changing the is-constraint property; perhaps
that can be added later, but the complexity versus benefit tradeoff
doesn't look very good.

Also, no special thought is taken here for whether replacing an
existing trigger should result in changes to queued-but-not-fired
trigger actions.  We just document that if you're surprised by the
results, too bad, don't do that.  (Note that any such pending trigger
activity would have to be within the current session.)

Takamichi Osumi, reviewed at various times by Surafel Temesgen,
Peter Smith, and myself

Discussion: https://postgr.es/m/0DDF369B45A1B44B8A687ED43F06557C010BC362@G01JPEXMBYT03
2020-11-14 17:05:34 -05:00
Tom Lane 40c24bfef9 Improve our ability to regurgitate SQL-syntax function calls.
The SQL spec calls out nonstandard syntax for certain function calls,
for example substring() with numeric position info is supposed to be
spelled "SUBSTRING(string FROM start FOR count)".  We accept many
of these things, but up to now would not print them in the same format,
instead simplifying down to "substring"(string, start, count).
That's long annoyed me because it creates an interoperability
problem: we're gratuitously injecting Postgres-specific syntax into
what might otherwise be a perfectly spec-compliant view definition.
However, the real reason for addressing it right now is to support
a planned change in the semantics of EXTRACT() a/k/a date_part().
When we switch that to returning numeric, we'll have the parser
translate EXTRACT() to some new function name (might as well be
"extract" if you ask me) and then teach ruleutils.c to reverse-list
that per SQL spec.  In this way existing calls to date_part() will
continue to have the old semantics.

To implement this, invent a new CoercionForm value COERCE_SQL_SYNTAX,
and make the parser insert that rather than COERCE_EXPLICIT_CALL when
the input has SQL-spec decoration.  (But if the input has the form of
a plain function call, continue to mark it COERCE_EXPLICIT_CALL, even
if it's calling one of these functions.)  Then ruleutils.c recognizes
COERCE_SQL_SYNTAX as a cue to emit SQL call syntax.  It can know
which decoration to emit using hard-wired knowledge about the
functions that could be called this way.  (While this solution isn't
extensible without manual additions, neither is the grammar, so this
doesn't seem unmaintainable.)  Notice that this solution will
reverse-list a function call with SQL decoration only if it was
entered that way; so dump-and-reload will not by itself produce any
changes in the appearance of views.

This requires adding a CoercionForm field to struct FuncCall.
(I couldn't resist the temptation to rearrange that struct's
field order a tad while I was at it.)  FuncCall doesn't appear
in stored rules, so that change isn't a reason for a catversion
bump, but I did one anyway because the new enum value for
CoercionForm fields could confuse old backend code.

Possible future work:

* Perhaps CoercionForm should now be renamed to DisplayForm,
or something like that, to reflect its more general meaning.
This'd require touching a couple hundred places, so it's not
clear it's worth the code churn.

* The SQLValueFunction node type, which was invented partly for
the same goal of improving SQL-compatibility of view output,
could perhaps be replaced with regular function calls marked
with COERCE_SQL_SYNTAX.  It's unclear if this would be a net
code savings, however.

Discussion: https://postgr.es/m/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
2020-11-04 12:34:50 -05:00
Thomas Munro 257836a755 Track collation versions for indexes.
Record the current version of dependent collations in pg_depend when
creating or rebuilding an index.  When accessing the index later, warn
that the index may be corrupted if the current version doesn't match.

Thanks to Douglas Doole, Peter Eisentraut, Christoph Berg, Laurenz Albe,
Michael Paquier, Robert Haas, Tom Lane and others for very helpful
discussion.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> (earlier versions)
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-03 01:19:50 +13:00
Thomas Munro 7d1297df08 Remove pg_collation.collversion.
This model couldn't be extended to cover the default collation, and
didn't have any information about the affected database objects when the
version changed.  Remove, in preparation for a follow-up commit that
will add a new mechanism.

Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-03 00:44:59 +13:00
David Rowley a929e17e5a Allow run-time pruning on nested Append/MergeAppend nodes
Previously we only tagged on the required information to allow the
executor to perform run-time partition pruning for Append/MergeAppend
nodes belonging to base relations.  It was thought that nested
Append/MergeAppend nodes were just about always pulled up into the
top-level Append/MergeAppend and that making the run-time pruning info for
any sub Append/MergeAppend nodes was a waste of time.  However, that was
likely badly thought through.

Some examples of cases we're unable to pullup nested Append/MergeAppends
are: 1) Parallel Append nodes with a mix of parallel and non-parallel
paths into a Parallel Append.  2) When planning an ordered Append scan a
sub-partition which is unordered may require a nested MergeAppend path to
ensure sub-partitions don't mix up the order of tuples being fed into the
top-level Append.

Unfortunately, it was not just as simple as removing the lines in
createplan.c which were purposefully not building the run-time pruning
info for anything but RELOPT_BASEREL relations.  The code in
add_paths_to_append_rel() was far too sloppy about which partitioned_rels
it included for the Append/MergeAppend paths.  The original code there
would always assume accumulate_append_subpath() would pull each sub-Append
and sub-MergeAppend path into the top-level path.  While it does not
appear that there were any actual bugs caused by having the additional
partitioned table RT indexes recorded, what it did mean is that later in
planning, when we built the run-time pruning info that we wasted effort
and built PartitionedRelPruneInfos for partitioned tables that we had no
subpaths for the executor to run-time prune.

Here we tighten that up so that partitioned_rels only ever contains the RT
index for partitioned tables which actually have subpaths in the given
Append/MergeAppend.  We can now Assert that every PartitionedRelPruneInfo
has a non-empty present_parts.  That should allow us to catch any weird
corner cases that have been missed.

In passing, it seems there is no longer a good reason to have the
AppendPath and MergeAppendPath's partitioned_rel fields a List of IntList.
We can simply have a List of Relids instead.  This is more compact in
memory and faster to add new members to.  We still know which is the root
level partition as these always have a lower relid than their children.
Previously this field was used for more things, but run-time partition
pruning now remains the only user of it and it has no need for a List of
IntLists.

Here we also get rid of the RelOptInfo partitioned_child_rels field. This
is what was previously used to (sometimes incorrectly) set the
Append/MergeAppend path's partitioned_rels field.  That was the only usage
of that field, so we can happily just remove it.

I also couldn't resist changing some nearby code to make use of the newly
added for_each_from macro so we can skip the first element in the list
without checking if the current item was the first one on each
iteration.

A bug report from Andreas Kretschmer prompted all this work, however,
after some consideration, I'm not personally classing this as a bug fix.
So no backpatch.  In Andreas' test case, it just wasn't that clear that
there was a nested Append since the top-level Append just had a single
sub-path which was pulled up a level, per 8edd0e794.

Author: David Rowley
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/flat/CAApHDvqSchs%2BubdybcfFaSPB%2B%2BEA7kqMaoqajtP0GtZvzOOR3g%40mail.gmail.com
2020-11-02 13:46:56 +13:00
Tom Lane ad1c36b070 Fix foreign-key selectivity estimation in the presence of constants.
get_foreign_key_join_selectivity() looks for join clauses that equate
the two sides of the FK constraint.  However, if we have a query like
"WHERE fktab.a = pktab.a and fktab.a = 1", it won't find any such join
clause, because equivclass.c replaces the given clauses with "fktab.a
= 1 and pktab.a = 1", which can be enforced at the scan level, leaving
nothing to be done for column "a" at the join level.

We can fix that expectation without much trouble, but then a new problem
arises: applying the foreign-key-based selectivity rule produces a
rowcount underestimate, because we're effectively double-counting the
selectivity of the "fktab.a = 1" clause.  So we have to cancel that
selectivity out of the estimate.

To fix, refactor process_implied_equality() so that it can pass back the
new RestrictInfo to its callers in equivclass.c, allowing the generated
"fktab.a = 1" clause to be saved in the EquivalenceClass's ec_derives
list.  Then it's not much trouble to dig out the relevant RestrictInfo
when we need to adjust an FK selectivity estimate.  (While at it, we
can also remove the expensive use of initialize_mergeclause_eclasses()
to set up the new RestrictInfo's left_ec and right_ec pointers.
The equivclass.c code can set those basically for free.)

This seems like clearly a bug fix, but I'm hesitant to back-patch it,
first because there's some API/ABI risk for extensions and second because
we're usually loath to destabilize plan choices in stable branches.

Per report from Sigrid Ehrenreich.

Discussion: https://postgr.es/m/1019549.1603770457@sss.pgh.pa.us
Discussion: https://postgr.es/m/AM6PR02MB5287A0ADD936C1FA80973E72AB190@AM6PR02MB5287.eurprd02.prod.outlook.com
2020-10-28 11:15:47 -04:00
Heikki Linnakangas 178f2d560d Include result relation info in direct modify ForeignScan nodes.
FDWs that can perform an UPDATE/DELETE remotely using the "direct modify"
set of APIs need to access the ResultRelInfo of the target table. That's
currently available in EState.es_result_relation_info, but the next
commit will remove that field.

This commit adds a new resultRelation field in ForeignScan, to store the
target relation's RT index, and the corresponding ResultRelInfo in
ForeignScanState. The FDW's PlanDirectModify callback is expected to set
'resultRelation' along with 'operation'. The core code doesn't need them
for anything, they are for the convenience of FDW's Begin- and
IterateDirectModify callbacks.

Authors: Amit Langote, Etsuro Fujita
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
2020-10-14 10:58:38 +03:00
Heikki Linnakangas 1375422c78 Create ResultRelInfos later in InitPlan, index them by RT index.
Instead of allocating all the ResultRelInfos upfront in one big array,
allocate them in ExecInitModifyTable(). es_result_relations is now an
array of ResultRelInfo pointers, rather than an array of structs, and it
is indexed by the RT index.

This simplifies things: we get rid of the separate concept of a "result
rel index", and don't need to set it in setrefs.c anymore. This also
allows follow-up optimizations (not included in this commit yet) to skip
initializing ResultRelInfos for target relations that were not needed at
runtime, and removal of the es_result_relation_info pointer.

The EState arrays of regular result rels and root result rels are merged
into one array. Similarly, the resultRelations and rootResultRelations
lists in PlannedStmt are merged into one. It's not actually clear to me
why they were kept separate in the first place, but now that the
es_result_relations array is indexed by RT index, it certainly seems
pointless.

The PlannedStmt->resultRelations list is now only needed for
ExecRelationIsTargetRelation(). One visible effect of this change is that
ExecRelationIsTargetRelation() will now return 'true' also for the
partition root, if a partitioned table is updated. That seems like a good
thing, although the function isn't used in core code, and I don't see any
reason for an FDW to call it on a partition root.

Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
2020-10-13 12:57:02 +03:00
Tom Lane 56fe008996 Add for_each_from, to simplify loops starting from non-first list cells.
We have a dozen or so places that need to iterate over all but the
first cell of a List.  Prior to v13 this was typically written as
	for_each_cell(lc, lnext(list_head(list)))
Commit 1cff1b95a changed these to
	for_each_cell(lc, list, list_second_cell(list))
This patch introduces a new macro for_each_from() which expresses
the start point as a list index, allowing these to be written as
	for_each_from(lc, list, 1)
This is marginally more efficient, since ForEachState.i can be
initialized directly instead of backing into it from a ListCell
address.  It also seems clearer and less typo-prone.

Some of the remaining uses of for_each_cell() look like they could
profitably be changed to for_each_from(), but here I confined myself
to changing uses of list_second_cell().

Also, fix for_each_cell_setup() and for_both_cell_setup() to
const-ify their arguments; that's a simple oversight in 1cff1b95a.

Back-patch into v13, on the grounds that (1) the const-ification
is a minor bug fix, and (2) it's better for back-patching purposes
if we only have two ways to write these loops rather than three.

In HEAD, also remove list_third_cell() and list_fourth_cell(),
which were also introduced in 1cff1b95a, and are unused as of
cc99baa43.  It seems unlikely that any third-party code would
have started to use them already; anyone who has can be directed
to list_nth_cell instead.

Discussion: https://postgr.es/m/CAApHDvpo1zj9KhEpU2cCRZfSM3Q6XGdhzuAS2v79PH7WJBkYVA@mail.gmail.com
2020-09-28 20:33:13 -04:00
Tom Lane 9d299a4924 Minor mop-up for List improvements.
Fix a few places that were using written-out versions of the
pg_list.h macros that commit cc99baa43 just improved, making them
also use those macros so as to gain whatever performance improvement
is to be had.

Discussion: https://postgr.es/m/CAApHDvpo1zj9KhEpU2cCRZfSM3Q6XGdhzuAS2v79PH7WJBkYVA@mail.gmail.com
2020-09-27 22:30:52 -04:00
Tom Lane 41efb83408 Move resolution of AlternativeSubPlan choices to the planner.
When commit bd3daddaf introduced AlternativeSubPlans, I had some
ambitions towards allowing the choice of subplan to change during
execution.  That has not happened, or even been thought about, in the
ensuing twelve years; so it seems like a failed experiment.  So let's
rip that out and resolve the choice of subplan at the end of planning
(in setrefs.c) rather than during executor startup.  This has a number
of positive benefits:

* Removal of a few hundred lines of executor code, since
AlternativeSubPlans need no longer be supported there.

* Removal of executor-startup overhead (particularly, initialization
of subplans that won't be used).

* Removal of incidental costs of having a larger plan tree, such as
tree-scanning and copying costs in the plancache; not to mention
setrefs.c's own costs of processing the discarded subplans.

* EXPLAIN no longer has to print a weird (and undocumented)
representation of an AlternativeSubPlan choice; it sees only the
subplan actually used.  This should mean less confusion for users.

* Since setrefs.c knows which subexpression of a plan node it's
working on at any instant, it's possible to adjust the estimated
number of executions of the subplan based on that.  For example,
we should usually estimate more executions of a qual expression
than a targetlist expression.  The implementation used here is
pretty simplistic, because we don't want to expend a lot of cycles
on the issue; but it's better than ignoring the point entirely,
as the executor had to.

That last point might possibly result in shifting the choice
between hashed and non-hashed EXISTS subplans in a few cases,
but in general this patch isn't meant to change planner choices.
Since we're doing the resolution so late, it's really impossible
to change any plan choices outside the AlternativeSubPlan itself.

Patch by me; thanks to David Rowley for review.

Discussion: https://postgr.es/m/1992952.1592785225@sss.pgh.pa.us
2020-09-27 12:51:28 -04:00
Tom Lane 1ed6b89563 Remove support for postfix (right-unary) operators.
This feature has been a thorn in our sides for a long time, causing
many grammatical ambiguity problems.  It doesn't seem worth the
pain to continue to support it, so remove it.

There are some follow-on improvements we can make in the grammar,
but this commit only removes the bare minimum number of productions,
plus assorted backend support code.

Note that pg_dump and psql continue to have full support, since
they may be used against older servers.  However, pg_dump warns
about postfix operators.  There is also a check in pg_upgrade.

Documentation-wise, I (tgl) largely removed the "left unary"
terminology in favor of saying "prefix operator", which is
a more standard and IMO less confusing term.

I included a catversion bump, although no initial catalog data
changes here, to mark the boundary at which oprkind = 'r'
stopped being valid in pg_operator.

Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor

Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
2020-09-17 19:38:05 -04:00
Peter Eisentraut 3e0242b24c Message fixes and style improvements 2020-09-14 06:42:30 +02:00
Michael Paquier 844c05abc3 Remove variable "concurrent" from ReindexStmt
This node already handles multiple options using a bitmask, so having a
separate boolean flag is not necessary.  This simplifies the code a bit
with less arguments to give to the reindex routines, by replacing the
boolean with an equivalent bitmask value.

Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/20200902110326.GA14963@paquier.xyz
2020-09-04 10:43:32 +09:00
Tom Lane 2072932407 Suppress unnecessary RelabelType nodes in yet more cases.
Commit a477bfc1d fixed eval_const_expressions() to ensure that it
didn't generate unnecessary RelabelType nodes, but I failed to notice
that some other places in the planner had the same issue.  Really
noplace in the planner should be using plain makeRelabelType(), for
fear of generating expressions that should be equal() to semantically
equivalent trees, but aren't.

An example is that because canonicalize_ec_expression() failed
to be careful about this, we could end up with an equivalence class
containing both a plain Const, and a Const-with-RelabelType
representing exactly the same value.  So far as I can tell this led to
no visible misbehavior, but we did waste a bunch of cycles generating
and evaluating "Const = Const-with-RelabelType" to prove such entries
are redundant.

Hence, move the support function added by a477bfc1d to where it can
be more generally useful, and use it in the places where planner code
previously used makeRelabelType.

Back-patch to v12, like the previous patch.  While I have no concrete
evidence of any real misbehavior here, it's certainly possible that
I overlooked a case where equivalent expressions that aren't equal()
could cause a user-visible problem.  In any case carrying extra
RelabelType nodes through planning to execution isn't very desirable.

Discussion: https://postgr.es/m/1311836.1597781384@sss.pgh.pa.us
2020-08-19 14:07:49 -04:00
Heikki Linnakangas 3941eb6341 Make xact.h usable in frontend.
xact.h included utils/datetime.h, which cannot be used in the frontend
(it includes fmgr.h, which needs Datum). But xact.h only needs the
definition of TimestampTz from it, which is available directly in
datatypes/timestamp.h. Change xact.h to include that instead of
utils/datetime.h, so that it can be used in client programs.
2020-08-17 10:50:13 +03:00
Michael Paquier cc35d8933a Rename field "relkind" to "objtype" for CTAS and ALTER TABLE nodes
"relkind" normally refers to the char field from pg_class.  However, in
the parse nodes AlterTableStmt and CreateTableAsStmt, "relkind" was used
for a field of type enum ObjectType, that could refer to other object
types than those possible for a relkind.  Such fields being usually
named "objtype", switch the name in both structures to make things more
consistent.  Note that this led to some confusion in functions that
also operate on a RangeTableEntry object, which also has a field named
"relkind".

This naming goes back to commit 09d4e96, where only OBJECT_TABLE and
OBJECT_INDEX were used.  This got extended later to use as well
OBJECT_TYPE with e440e12, not really a relation kind.

Author: Mark Dilger
Reviewed-by: Daniel Gustafsson, Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/609181AE-E399-47C7-9221-856E0F96BF93@enterprisedb.com
2020-07-11 13:32:28 +09:00
Tom Lane f3faf35f37 Don't create pg_type entries for sequences or toast tables.
Commit f7f70d5e2 left one inconsistency behind: we're still creating
pg_type entries for the composite types of sequences and toast tables,
but not arrays over those composites.  But there seems precious little
reason to have named composite types for toast tables, and not much more
to have them for sequences (especially given the thought that sequences
may someday not be standalone relations at all).

So, let's close that inconsistency by removing these composite types,
rather than adding arrays for them.  This buys back a little bit of
the initial pg_type bloat added by the previous patch, and could be
a significant savings in a large database with many toast tables.

Aside from a small logic rearrangement in heap_create_with_catalog,
this patch mostly needs to clean up some places that were assuming that
pg_class.reltype always has a valid value.  Those are really pre-existing
bugs, given that it's documented otherwise; notably, the plpgsql changes
fix code that gives "cache lookup failed for type 0" on indexes today.
But none of these seem interesting enough to back-patch.

Also, remove the pg_dump/pg_upgrade infrastructure for propagating
a toast table's pg_type OID into the new database, since we no longer
need that.

Discussion: https://postgr.es/m/761F1389-C6A8-4C15-80CE-950C961F5341@gmail.com
2020-07-07 15:43:22 -04:00
Tom Lane 2f48ede080 Avoid using a cursor in plpgsql's RETURN QUERY statement.
plpgsql has always executed the query given in a RETURN QUERY command
by opening it as a cursor and then fetching a few rows at a time,
which it turns around and dumps into the function's result tuplestore.
The point of this was to keep from blowing out memory with an oversized
SPITupleTable result (note that while a tuplestore can spill tuples
to disk, SPITupleTable cannot).  However, it's rather inefficient, both
because of extra data copying and because of executor entry/exit
overhead.  In recent versions, a new performance problem has emerged:
use of a cursor prevents use of a parallel plan for the executed query.

We can improve matters by skipping use of a cursor and having the
executor push result tuples directly into the function's result
tuplestore.  However, a moderate amount of new infrastructure is needed
to make that idea work:

* We can use the existing tstoreReceiver.c DestReceiver code to funnel
executor output to the tuplestore, but it has to be extended to support
plpgsql's requirement for possibly applying a tuple conversion map.

* SPI needs to be extended to allow use of a caller-supplied
DestReceiver instead of its usual receiver that puts tuples into
a SPITupleTable.  Two new API calls are needed to handle both the
RETURN QUERY and RETURN QUERY EXECUTE cases.

I also felt that I didn't want these new API calls to use the legacy
method of specifying query parameter values with "char" null flags
(the old ' '/'n' convention); rather they should accept ParamListInfo
objects containing the parameter type and value info.  This required
a bit of additional new infrastructure since we didn't yet have any
parse analysis callback that would interpret $N parameter symbols
according to type data supplied in a ParamListInfo.  There seems to be
no harm in letting makeParamList install that callback by default,
rather than leaving a new ParamListInfo's parserSetup hook as NULL.
(Indeed, as of HEAD, I couldn't find anyplace that was using the
parserSetup field at all; plpgsql was using parserSetupArg for its
own purposes, but parserSetup seemed to be write-only.)

We can actually get plpgsql out of the business of using legacy null
flags altogether, and using ParamListInfo instead of its ad-hoc
PreparedParamsData structure; but this requires inventing one more
SPI API call that can replace SPI_cursor_open_with_args.  That seems
worth doing, though.

SPI_execute_with_args and SPI_cursor_open_with_args are now unused
anywhere in the core PG distribution.  Perhaps someday we could
deprecate/remove them.  But cleaning up the crufty bits of the SPI
API is a task for a different patch.

Per bug #16040 from Jeremy Smith.  This is unfortunately too invasive to
consider back-patching.  Patch by me; thanks to Hamid Akhtar for review.

Discussion: https://postgr.es/m/16040-eaacad11fecfb198@postgresql.org
2020-06-12 12:14:32 -04:00
Noah Misch 587322de36 Reconcile nodes/*funcs.c.
The stmt_len changes do not affect behavior.  LimitPath has no other
support functions, so that part changes only debugging output.
2020-05-25 16:23:48 -07:00
Tom Lane fa27dd40d5 Run pgindent with new pg_bsd_indent version 2.1.1.
Thomas Munro fixed a longstanding annoyance in pg_bsd_indent, that
it would misformat lines containing IsA() macros on the assumption
that the IsA() call should be treated like a cast.  This improves
some other cases involving field/variable names that match typedefs,
too.  The only places that get worse are a couple of uses of the
OpenSSL macro STACK_OF(); we'll gladly take that trade-off.

Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
2020-05-16 11:54:51 -04:00
Tom Lane 36ac359d36 Rename assorted LWLock tranches.
Choose names that fit into the conventions for wait event names
(particularly, that multi-word names are in the style MultiWordName)
and hopefully convey more information to non-hacker users than the
previous names did.

Also rename SerializablePredicateLockListLock to
SerializablePredicateListLock; the old name was long enough to cause
table formatting problems, plus the double occurrence of "Lock" seems
confusing/error-prone.

Also change a couple of particularly opaque LWLock field names.

Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
2020-05-15 18:11:07 -04:00
Tom Lane 0da06d9faf Get rid of trailing semicolons in C macro definitions.
Writing a trailing semicolon in a macro is almost never the right thing,
because you almost always want to write a semicolon after each macro
call instead.  (Even if there was some reason to prefer not to, pgindent
would probably make a hash of code formatted that way; so within PG the
rule should basically be "don't do it".)  Thus, if we have a semi inside
the macro, the compiler sees "something;;".  Much of the time the extra
empty statement is harmless, but it could lead to mysterious syntax
errors at call sites.  In perhaps an overabundance of neatnik-ism, let's
run around and get rid of the excess semicolons whereever possible.

The only thing worse than a mysterious syntax error is a mysterious
syntax error that only happens in the back branches; therefore,
backpatch these changes where relevant, which is most of them because
most of these mistakes are old.  (The lack of reported problems shows
that this is largely a hypothetical issue, but still, it could bite
us in some future patch.)

John Naylor and Tom Lane

Discussion: https://postgr.es/m/CACPNZCs0qWTqJ2QUSGJ07B7uvAvzMb-KbG2q+oo+J3tsWN5cqw@mail.gmail.com
2020-05-01 17:28:00 -04:00
Alvaro Herrera 5fc703946b
Add ALTER .. NO DEPENDS ON
Commit f2fcad27d5 (9.6 era) added the ability to mark objects as
dependent an extension, but forgot to add a way for such dependencies to
be removed.  This commit fixes that oversight.

Strictly speaking this should be backpatched to 9.6, but due to lack of
demand we're not doing so at this time.

Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql
Reviewed-by: ahsan hadi <ahsan.hadi@gmail.com>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
2020-04-20 13:42:12 -04:00
Alexander Korotkov 1aac32df89 Revert 0f5ca02f53
0f5ca02f53 introduces 3 new keywords.  It appears to be too much for relatively
small feature.  Given now we past feature freeze, it's already late for
discussion of the new syntax.  So, revert.

Discussion: https://postgr.es/m/28209.1586294824%40sss.pgh.pa.us
2020-04-08 11:37:27 +03:00
David Rowley 02a2e8b442 Modify additional power 2 calculations to use new helper functions
2nd pass of modifying various places which obtain the next power
of 2 of a number and make them use the new functions added in
f0705bb62.

In passing, also modify num_combinations(). This can be implemented
using simple bitshifting rather than looping.

Reviewed-by: John Naylor
Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
2020-04-08 18:29:51 +12:00
Etsuro Fujita c8434d64ce Allow partitionwise joins in more cases.
Previously, the partitionwise join technique only allowed partitionwise
join when input partitioned tables had exactly the same partition
bounds.  This commit extends the technique to some cases when the tables
have different partition bounds, by using an advanced partition-matching
algorithm introduced by this commit.  For both the input partitioned
tables, the algorithm checks whether every partition of one input
partitioned table only matches one partition of the other input
partitioned table at most, and vice versa.  In such a case the join
between the tables can be broken down into joins between the matching
partitions, so the algorithm produces the pairs of the matching
partitions, plus the partition bounds for the join relation, to allow
partitionwise join for computing the join.  Currently, the algorithm
works for list-partitioned and range-partitioned tables, but not
hash-partitioned tables.  See comments in partition_bounds_merge().

Ashutosh Bapat and Etsuro Fujita, most of regression tests by Rajkumar
Raghuwanshi, some of the tests by Mark Dilger and Amul Sul, reviewed by
Dmitry Dolgov and Amul Sul, with additional review at various points by
Ashutosh Bapat, Mark Dilger, Robert Haas, Antonin Houska, Amit Langote,
Justin Pryzby, and Tomas Vondra

Discussion: https://postgr.es/m/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com
2020-04-08 10:25:00 +09:00
Alexander Korotkov 0f5ca02f53 Implement waiting for given lsn at transaction start
This commit adds following optional clause to BEGIN and START TRANSACTION
commands.

  WAIT FOR LSN lsn [ TIMEOUT timeout ]

New clause pospones transaction start till given lsn is applied on standby.
This clause allows user be sure, that changes previously made on primary would
be visible on standby.

New shared memory struct is used to track awaited lsn per backend.  Recovery
process wakes up backend once required lsn is applied.

Author: Ivan Kartyshov, Anna Akenteva
Reviewed-by: Craig Ringer, Thomas Munro, Robert Haas, Kyotaro Horiguchi
Reviewed-by: Masahiko Sawada, Ants Aasma, Dmitry Ivanov, Simon Riggs
Reviewed-by: Amit Kapila, Alexander Korotkov
Discussion: https://postgr.es/m/0240c26c-9f84-30ea-fca9-93ab2df5f305%40postgrespro.ru
2020-04-07 23:51:10 +03:00
Alvaro Herrera 357889eb17
Support FETCH FIRST WITH TIES
WITH TIES is an option to the FETCH FIRST N ROWS clause (the SQL
standard's spelling of LIMIT), where you additionally get rows that
compare equal to the last of those N rows by the columns in the
mandatory ORDER BY clause.

There was a proposal by Andrew Gierth to implement this functionality in
a more powerful way that would yield more features, but the other patch
had not been finished at this time, so we decided to use this one for
now in the spirit of incremental development.

Author: Surafel Temesgen <surafel3000@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Discussion: https://postgr.es/m/CALAY4q9ky7rD_A4vf=FVQvCGngm3LOes-ky0J6euMrg=_Se+ag@mail.gmail.com
Discussion: https://postgr.es/m/87o8wvz253.fsf@news-spur.riddles.org.uk
2020-04-07 16:22:13 -04:00
Tomas Vondra d2d8a229bc Implement Incremental Sort
Incremental Sort is an optimized variant of multikey sort for cases when
the input is already sorted by a prefix of the requested sort keys. For
example when the relation is already sorted by (key1, key2) and we need
to sort it by (key1, key2, key3) we can simply split the input rows into
groups having equal values in (key1, key2), and only sort/compare the
remaining column key3.

This has a number of benefits:

- Reduced memory consumption, because only a single group (determined by
  values in the sorted prefix) needs to be kept in memory. This may also
  eliminate the need to spill to disk.

- Lower startup cost, because Incremental Sort produce results after each
  prefix group, which is beneficial for plans where startup cost matters
  (like for example queries with LIMIT clause).

We consider both Sort and Incremental Sort, and decide based on costing.

The implemented algorithm operates in two different modes:

- Fetching a minimum number of tuples without check of equality on the
  prefix keys, and sorting on all columns when safe.

- Fetching all tuples for a single prefix group and then sorting by
  comparing only the remaining (non-prefix) keys.

We always start in the first mode, and employ a heuristic to switch into
the second mode if we believe it's beneficial - the goal is to minimize
the number of unnecessary comparions while keeping memory consumption
below work_mem.

This is a very old patch series. The idea was originally proposed by
Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the
patch was taken over by James Coleman, who wrote and rewrote most of the
current code.

There were many reviewers/contributors since 2013 - I've done my best to
pick the most active ones, and listed them in this commit message.

Author: James Coleman, Alexander Korotkov
Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov
Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-06 21:35:10 +02:00
Noah Misch c6b92041d3 Skip WAL for new relfilenodes, under wal_level=minimal.
Until now, only selected bulk operations (e.g. COPY) did this.  If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY.  See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules.  Maintainers of table access
methods should examine that section.

To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL.  A new GUC,
wal_skip_threshold, guides that choice.  If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold.  Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.

Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode.  Introduce rd_firstRelfilenodeSubid.  Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node.  Make relcache.c retain entries for certain
dropped relations until end of transaction.

Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN.
Future servers accept older WAL, so this bump is discretionary.

Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas.  Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem.  Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs.  Reported by Martijn van Oosterhout.

Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
2020-04-04 12:25:34 -07:00
Tom Lane 0b34e7d307 Improve user control over truncation of logged bind-parameter values.
This patch replaces the boolean GUC log_parameters_on_error introduced
by commit ba79cb5dc with an integer log_parameter_max_length_on_error,
adding the ability to specify how many bytes to trim each logged
parameter value to.  (The previous coding hard-wired that choice at
64 bytes.)

In addition, add a new parameter log_parameter_max_length that provides
similar control over truncation of query parameters that are logged in
response to statement-logging options, as opposed to errors.  Previous
releases always logged such parameters in full, possibly causing log
bloat.

For backwards compatibility with prior releases,
log_parameter_max_length defaults to -1 (log in full), while
log_parameter_max_length_on_error defaults to 0 (no logging).

Per discussion, log_parameter_max_length is SUSET since the DBA should
control routine logging behavior, but log_parameter_max_length_on_error
is USERSET because it also affects errcontext data sent back to the
client.

Alexey Bashtanov, editorialized a little by me

Discussion: https://postgr.es/m/b10493cc-a399-a03a-67c7-068f2791ee50@imap.cc
2020-04-02 15:04:51 -04:00
Alexander Korotkov 911e702077 Implement operator class parameters
PostgreSQL provides set of template index access methods, where opclasses have
much freedom in the semantics of indexing.  These index AMs are GiST, GIN,
SP-GiST and BRIN.  There opclasses define representation of keys, operations on
them and supported search strategies.  So, it's natural that opclasses may be
faced some tradeoffs, which require user-side decision.  This commit implements
opclass parameters allowing users to set some values, which tell opclass how to
index the particular dataset.

This commit doesn't introduce new storage in system catalog.  Instead it uses
pg_attribute.attoptions, which is used for table column storage options but
unused for index attributes.

In order to evade changing signature of each opclass support function, we
implement unified way to pass options to opclass support functions.  Options
are set to fn_expr as the constant bytea expression.  It's possible due to the
fact that opclass support functions are executed outside of expressions, so
fn_expr is unused for them.

This commit comes with some examples of opclass options usage.  We parametrize
signature length in GiST.  That applies to multiple opclasses: tsvector_ops,
gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and
gist_hstore_ops.  Also we parametrize maximum number of integer ranges for
gist__int_ops.  However, the main future usage of this feature is expected
to be json, where users would be able to specify which way to index particular
json parts.

Catversion is bumped.

Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-03-30 19:17:23 +03:00