Commit Graph

4034 Commits

Author SHA1 Message Date
Alexander Korotkov 911e702077 Implement operator class parameters
PostgreSQL provides set of template index access methods, where opclasses have
much freedom in the semantics of indexing.  These index AMs are GiST, GIN,
SP-GiST and BRIN.  There opclasses define representation of keys, operations on
them and supported search strategies.  So, it's natural that opclasses may be
faced some tradeoffs, which require user-side decision.  This commit implements
opclass parameters allowing users to set some values, which tell opclass how to
index the particular dataset.

This commit doesn't introduce new storage in system catalog.  Instead it uses
pg_attribute.attoptions, which is used for table column storage options but
unused for index attributes.

In order to evade changing signature of each opclass support function, we
implement unified way to pass options to opclass support functions.  Options
are set to fn_expr as the constant bytea expression.  It's possible due to the
fact that opclass support functions are executed outside of expressions, so
fn_expr is unused for them.

This commit comes with some examples of opclass options usage.  We parametrize
signature length in GiST.  That applies to multiple opclasses: tsvector_ops,
gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and
gist_hstore_ops.  Also we parametrize maximum number of integer ranges for
gist__int_ops.  However, the main future usage of this feature is expected
to be json, where users would be able to specify which way to index particular
json parts.

Catversion is bumped.

Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-03-30 19:17:23 +03:00
Fujii Masao 6aba63ef3e Allow the planner-related functions and hook to accept the query string.
This commit adds query_string argument into the planner-related functions
and hook and allows us to pass the query string to them.

Currently there is no user of the query string passed. But the upcoming patch
for the planning counters will add the planning hook function into
pg_stat_statements and the function will need the query string. So this change
will be necessary for that patch.

Also this change is useful for some extensions that want to use the query
string in their planner hook function.

Author: Pascal Legrand, Julien Rouhaud
Reviewed-by: Yoshikazu Imai, Tom Lane, Fujii Masao
Discussion: https://postgr.es/m/CAOBaU_bU1m3_XF5qKYtSj1ua4dxd=FWDyh2SH4rSJAUUfsGmAQ@mail.gmail.com
Discussion: https://postgr.es/m/1583789487074-0.post@n3.nabble.com
2020-03-30 13:51:05 +09:00
Andres Freund cedffbdb8b Report wait event for cost-based vacuum delay.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20200321040750.GD13662@telsasoft.com
2020-03-23 22:53:22 -07:00
Jeff Davis 64fe602279 Fixes for Disk-based Hash Aggregation.
Justin Pryzby raised a couple issues with commit 1f39bce0. Fixed.

Also, tweak the way the size of a hash entry is estimated and the
number of buckets is estimated when calling BuildTupleHashTableExt().

Discussion: https://www.postgresql.org/message-id/20200319064222.GR26184@telsasoft.com
2020-03-23 15:43:07 -07:00
Amit Kapila 33753ac9d7 Add object names to partition integrity violations.
All errors of SQLSTATE class 23 should include the name of an object
associated with the error in separate fields of the error report message.
We do this so that applications need not try to extract them from the
possibly-localized human-readable text of the message.

Reported-by: Chris Bandy
Author: Chris Bandy
Reviewed-by: Amit Kapila and Amit Langote
Discussion: https://postgr.es/m/0aa113a3-3c7f-db48-bcd8-f9290b2269ae@gmail.com
2020-03-23 08:09:15 +05:30
Noah Misch de9396326e Revert "Skip WAL for new relfilenodes, under wal_level=minimal."
This reverts commit cb2fd7eac2.  Per
numerous buildfarm members, it was incompatible with parallel query, and
a test case assumed LP64.  Back-patch to 9.5 (all supported versions).

Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
2020-03-22 09:24:09 -07:00
Noah Misch cb2fd7eac2 Skip WAL for new relfilenodes, under wal_level=minimal.
Until now, only selected bulk operations (e.g. COPY) did this.  If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY.  See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules.  Maintainers of table access
methods should examine that section.

To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL.  A new GUC,
wal_skip_threshold, guides that choice.  If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold.  Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.

Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode.  Introduce rd_firstRelfilenodeSubid.  Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node.  Make relcache.c retain entries for certain
dropped relations until end of transaction.

Back-patch to 9.5 (all supported versions).  This introduces a new WAL
record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC.  As
always, update standby systems before master systems.  This changes
sizeof(RelationData) and sizeof(IndexStmt), breaking binary
compatibility for affected extensions.  (The most recent commit to
affect the same class of extensions was
089e4d405d0f3b94c74a2c6a54357a84a681754b.)

Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas.  Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem.  Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs.  Reported by Martijn van Oosterhout.

Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
2020-03-21 09:38:26 -07:00
Noah Misch e629a01f69 During heap rebuild, lock any TOAST index until end of transaction.
swap_relation_files() calls toast_get_valid_index() to find and lock
this index, just before swapping with the rebuilt TOAST index.  The
latter function releases the lock before returning.  Potential for
mischief is low; a concurrent session can issue ALTER INDEX ... SET
(fillfactor = ...), which is not alarming.  Nonetheless, changing
pg_class.relfilenode without a lock is unconventional.  Back-patch to
9.5 (all supported versions), because another fix needs this.

Discussion: https://postgr.es/m/20191226001521.GA1772687@rfd.leadboat.com
2020-03-21 09:38:26 -07:00
Tom Lane 24e2885ee3 Introduce "anycompatible" family of polymorphic types.
This patch adds the pseudo-types anycompatible, anycompatiblearray,
anycompatiblenonarray, and anycompatiblerange.  They work much like
anyelement, anyarray, anynonarray, and anyrange respectively, except
that the actual input values need not match precisely in type.
Instead, if we can find a common supertype (using the same rules
as for UNION/CASE type resolution), then the parser automatically
promotes the input values to that type.  For example,
"myfunc(anycompatible, anycompatible)" can match a call with one
integer and one bigint argument, with the integer automatically
promoted to bigint.  With anyelement in the definition, the user
would have had to cast the integer explicitly.

The new types also provide a second, independent set of type variables
for function matching; thus with "myfunc(anyelement, anyelement,
anycompatible) returns anycompatible" the first two arguments are
constrained to be the same type, but the third can be some other
type, and the result has the type of the third argument.  The need
for more than one set of type variables was foreseen back when we
first invented the polymorphic types, but we never did anything
about it.

Pavel Stehule, revised a bit by me

Discussion: https://postgr.es/m/CAFj8pRDna7VqNi8gR+Tt2Ktmz0cq5G93guc3Sbn_NVPLdXAkqA@mail.gmail.com
2020-03-19 11:43:11 -04:00
Jeff Davis 1f39bce021 Disk-based Hash Aggregation.
While performing hash aggregation, track memory usage when adding new
groups to a hash table. If the memory usage exceeds work_mem, enter
"spill mode".

In spill mode, new groups are not created in the hash table(s), but
existing groups continue to be advanced if input tuples match. Tuples
that would cause a new group to be created are instead spilled to a
logical tape to be processed later.

The tuples are spilled in a partitioned fashion. When all tuples from
the outer plan are processed (either by advancing the group or
spilling the tuple), finalize and emit the groups from the hash
table. Then, create new batches of work from the spilled partitions,
and select one of the saved batches and process it (possibly spilling
recursively).

Author: Jeff Davis
Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman
Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com
2020-03-18 15:42:02 -07:00
Alvaro Herrera 487e9861d0
Enable BEFORE row-level triggers for partitioned tables
... with the limitation that the tuple must remain in the same
partition.

Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/20200227165158.GA2071@alvherre.pgsql
2020-03-18 18:58:05 -03:00
Michael Paquier fdeeb524b4 Fix typo in indexcmds.c
Introduced by 61d7c7b.

Backpatch-through: 12
2020-03-18 11:13:12 +09:00
Peter Eisentraut 1cc9c2412c Preserve replica identity index across ALTER TABLE rewrite
If an index was explicitly set as replica identity index, this setting
was lost when a table was rewritten by ALTER TABLE.  Because this
setting is part of pg_index but actually controlled by ALTER
TABLE (not part of CREATE INDEX, say), we have to do some extra work
to restore it.

Based-on-patch-by: Quan Zongliang <quanzongliang@gmail.com>
Reviewed-by: Euler Taveira <euler.taveira@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c70fcab2-4866-0d9f-1d01-e75e189db342@gmail.com
2020-03-13 11:57:06 +01:00
Peter Eisentraut bf68b79e50 Refactor ps_status.c API
The init_ps_display() arguments were mostly lies by now, so to match
typical usage, just use one argument and let the caller assemble it
from multiple sources if necessary.  The only user of the additional
arguments is BackendInitialize(), which was already doing string
assembly on the caller side anyway.

Remove the second argument of set_ps_display() ("force") and just
handle that in init_ps_display() internally.

BackendInitialize() also used to set the initial status as
"authentication", but that was very far from where authentication
actually happened.  So now it's set to "initializing" and then
"authentication" just before the actual call to
ClientAuthentication().

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
2020-03-11 16:38:31 +01:00
Alvaro Herrera 899a04f5ed
Avoid duplicates in ALTER ... DEPENDS ON EXTENSION
If the command is attempted for an extension that the object already
depends on, silently do nothing.

In particular, this means that if a database containing multiple such
entries is dumped, the restore will silently do the right thing and
record just the first one.  (At least, in a world where pg_dump does
dump such entries -- which it doesn't currently, but it will.)

Backpatch to 9.6, where this kind of dependency was introduced.

Reviewed-by: Ibrar Ahmed, Tom Lane (offlist)
Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql
2020-03-11 11:04:59 -03:00
Tom Lane d01f03a495 Preserve integer and float values accurately in (de)serialize_deflist.
Previously, this code just smashed all types of DefElem values to
strings, cavalierly reasoning that nobody would care.  But in point of
fact, most of the defGetFoo functions do distinguish among different
input syntaxes; for instance defGetBoolean will accept 1 as an integer
but not "1" as a string.  This led to CREATE/ALTER TEXT SEARCH
DICTIONARY accepting 0 and 1 as values for boolean dictionary
properties, only to have the dictionary fail at runtime.

We can upgrade this behavior by teaching serialize_deflist that it
does not need to quote T_Integer or T_Float nodes' values on output,
and then teaching deserialize_deflist to restore unquoted integer or
float values as the appropriate node type.  This should not break
anything using pg_ts_dict.dictinitoption, since that field is just
defined as being something valid to include in CREATE TEXT SEARCH
DICTIONARY.

deserialize_deflist is also used to parse the options arguments
for the ts_headline family of functions, but so far as I can see
this won't cause any problems there either: the only consumer of
that output is prsd_headline which always uses defGetString.
(Really that's a bad idea, but I won't risk changing it here.)

This is surely a bug fix, but given the lack of field complaints
I don't think it's necessary to back-patch.

Discussion: https://postgr.es/m/CAMkU=1xRcs_BUPzR0+V3WndaCAv0E_m3h6aUEJ8NF-sY1nnHsw@mail.gmail.com
2020-03-10 12:30:02 -04:00
Alvaro Herrera 40b3e2c201
Split out CreateCast into src/backend/catalog/pg_cast.c
This catalog-handling code was previously together with the rest of
CastCreate() in src/backend/commands/functioncmds.c.  A future patch
will need a way to add casts internally, so this will be useful to have
separate.

Also, move the nearby get_cast_oid() function from functioncmds.c to
lsyscache.c, which seems a more natural place for it.

Author: Paul Jungwirth, minor edits by Álvaro
Discussion: https://postgr.es/m/20200309210003.GA19992@alvherre.pgsql
2020-03-10 11:28:23 -03:00
Peter Eisentraut 3c173a53a8 Remove utils/acl.h from catalog/objectaddress.h
The need for this was removed by
8b9e9644dc.

A number of files now need to include utils/acl.h or
parser/parse_node.h explicitly where they previously got it indirectly
somehow.

Since parser/parse_node.h already includes nodes/parsenodes.h, the
latter is then removed where the former was added.  Also, remove
nodes/pg_list.h from objectaddress.h, since that's included via
nodes/parsenodes.h.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/7601e258-26b2-8481-36d0-dc9dca6f28f1%402ndquadrant.com
2020-03-10 10:27:00 +01:00
Peter Eisentraut 17b9e7f9fe Support adding partitioned tables to publication
When a partitioned table is added to a publication, changes of all of
its partitions (current or future) are published via that publication.

This change only affects which tables a publication considers as its
members.  The receiving side still sees the data coming from the
individual leaf partitions.  So existing restrictions that partition
hierarchies can only be replicated one-to-one are not changed by this.

Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
2020-03-10 09:09:32 +01:00
Michael Paquier 61d7c7bce3 Prevent reindex of invalid indexes on TOAST tables
Such indexes can only be duplicated leftovers of a previously failed
REINDEX CONCURRENTLY command, and a valid equivalent is guaranteed to
exist.  As toast indexes can only be dropped if invalid, reindexing
these would lead to useless duplicated indexes that can't be dropped
anymore, except if the parent relation is dropped.

Thanks to Justin Pryzby for reminding that this problem was reported
long ago during the review of the original patch of REINDEX
CONCURRENTLY, but the issue was never addressed.

Reported-by: Sergei Kornilov, Justin Pryzby
Author: Julien Rouhaud
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/36712441546604286%40sas1-890ba5c2334a.qloud-c.yandex.net
Discussion: https://postgr.es/m/20200216190835.GA21832@telsasoft.com
Backpatch-through: 12
2020-03-10 15:38:17 +09:00
Peter Eisentraut 71d60e2aa0 Add tg_updatedcols to TriggerData
This allows a trigger function to determine for an UPDATE trigger
which columns were actually updated.  This allows some optimizations
in generic trigger functions such as lo_manage and
tsvector_update_trigger.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/11c5f156-67a9-0fb5-8200-2a8018eb2e0c@2ndquadrant.com
2020-03-09 09:34:55 +01:00
Peter Eisentraut 8f152b6c50 Code simplification
Initialize TriggerData to 0 for the whole struct together, instead of
each field separately.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/11c5f156-67a9-0fb5-8200-2a8018eb2e0c@2ndquadrant.com
2020-03-09 09:34:55 +01:00
Tom Lane fe30e7ebfa Allow ALTER TYPE to change some properties of a base type.
Specifically, this patch allows ALTER TYPE to:
* Change the default TOAST strategy for a toastable base type;
* Promote a non-toastable type to toastable;
* Add/remove binary I/O functions for a type;
* Add/remove typmod I/O functions for a type;
* Add/remove a custom ANALYZE statistics functions for a type.

The first of these can be done by the type's owner; all the others
require superuser privilege since misuse could cause problems.

The main motivation for this patch is to allow extensions to
upgrade the feature sets of their data types, so the set of
alterable properties is biased towards that use-case.  However
it's also true that changing some other properties would be
a lot harder, as they get baked into physical storage and/or
stored expressions that depend on the type.

Along the way, refactor GenerateTypeDependencies() to make it easier
to call, refactor DefineType's volatility checks so they can be shared
by AlterType, and teach typcache.c that it might have to reload data
from the type's pg_type row, a scenario it never handled before.
Also rearrange alter_type.sgml a bit for clarity (put the
composite-type operations together).

Tomas Vondra and Tom Lane

Discussion: https://postgr.es/m/20200228004440.b23ein4qvmxnlpht@development
2020-03-06 12:19:29 -05:00
Tom Lane bb03010b9f Remove the "opaque" pseudo-type and associated compatibility hacks.
A long time ago, it was necessary to declare datatype I/O functions,
triggers, and language handler support functions in a very type-unsafe
way involving a single pseudo-type "opaque".  We got rid of those
conventions in 7.3, but there was still support in various places to
automatically convert such functions to the modern declaration style,
to be able to transparently re-load dumps from pre-7.3 servers.
It seems unnecessary to continue to support that anymore, so take out
the hacks; whereupon the "opaque" pseudo-type itself is no longer
needed and can be dropped.

This is part of a group of patches removing various server-side kluges
for transparently upgrading pre-8.0 dump files.  Since we've had few
complaints about dropping pg_dump's support for dumping from pre-8.0
servers (commit 64f3524e2), it seems okay to now remove these kluges.

Discussion: https://postgr.es/m/4110.1583255415@sss.pgh.pa.us
2020-03-05 15:48:56 -05:00
Tom Lane 84eca14bc4 Remove ancient hacks to ignore certain opclass names in CREATE INDEX.
Twenty years ago, we removed certain operator classes in favor of
letting indexes over their data types be built with some other
binary-compatible, more standard opclass.  As a hack to allow existing
index definitions to be dumped and reloaded, we made CREATE INDEX ignore
the removed opclass names, so that such indexes would fall back to the
new default opclass for their data types.  This was never intended to
be a long-lived thing; it carries the obvious risk of breaking some
future developer's attempt to re-use those old opclass names.  Since
all of the cases in question are for opclasses that were removed
before PG 8.0, it seems okay to get rid of these hacks now.

This is part of a group of patches removing various server-side kluges
for transparently upgrading pre-8.0 dump files.  Since we've had few
complaints about dropping pg_dump's support for dumping from pre-8.0
servers (commit 64f3524e2), it seems okay to now remove these kluges.

Discussion: https://postgr.es/m/3685.1583422389@sss.pgh.pa.us
2020-03-05 15:36:06 -05:00
Tom Lane e58a599752 Remove ancient support for upgrading pre-7.3 foreign key constraints.
Before 7.3, foreign key constraints had no explicit catalog
representation, so that what pg_dump produced for them was (usually)
a set of three CREATE CONSTRAINT TRIGGER commands.  Commit a2899ebdc
and some follow-on fixes added an ugly hack in CreateTrigger() to
recognize that pattern and reconstruct the foreign key definition.
However, we've never had any test coverage for that code, so that it's
legitimate to wonder if it still works; and having to maintain it in
the face of upcoming trigger-related patches seems rather pointless.
Let's decree that its time has passed, and drop it.

This is part of a group of patches removing various server-side kluges
for transparently upgrading pre-8.0 dump files.  Since we've had few
complaints about dropping pg_dump's support for dumping from pre-8.0
servers (commit 64f3524e2), it seems okay to now remove these kluges.

Daniel Gustafsson

Discussion: https://postgr.es/m/805874E2-999C-4CDA-856F-1AFBD9DFE933@yesql.se
2020-03-05 15:25:45 -05:00
Tom Lane 3ed2005ff5 Introduce macros for typalign and typstorage constants.
Our usual practice for "poor man's enum" catalog columns is to define
macros for the possible values and use those, not literal constants,
in C code.  But for some reason lost in the mists of time, this was
never done for typalign/attalign or typstorage/attstorage.  It's never
too late to make it better though, so let's do that.

The reason I got interested in this right now is the need to duplicate
some uses of the TYPSTORAGE constants in an upcoming ALTER TYPE patch.
But in general, this sort of change aids greppability and readability,
so it's a good idea even without any specific motivation.

I may have missed a few places that could be converted, and it's even
more likely that pending patches will re-introduce some hard-coded
references.  But that's not fatal --- there's no expectation that
we'd actually change any of these values.  We can clean up stragglers
over time.

Discussion: https://postgr.es/m/16457.1583189537@sss.pgh.pa.us
2020-03-04 10:34:25 -05:00
Alvaro Herrera 2f9661311b
Represent command completion tags as structs
The backend was using strings to represent command tags and doing string
comparisons in multiple places, but that's slow and unhelpful.  Create a
new command list with a supporting structure to use instead; this is
stored in a tag-list-file that can be tailored to specific purposes with
a caller-definable C macro, similar to what we do for WAL resource
managers.  The first first such uses are a new CommandTag enum and a
CommandTagBehavior struct.

Replace numerous occurrences of char *completionTag with a
QueryCompletion struct so that the code no longer stores information
about completed queries in a cstring.  Only at the last moment, in
EndCommand(), does this get converted to a string.

EventTriggerCacheItem no longer holds an array of palloc’d tag strings
in sorted order, but rather just a Bitmapset over the CommandTags.

Author: Mark Dilger, with unsolicited help from Álvaro Herrera
Reviewed-by: John Naylor, Tom Lane
Discussion: https://postgr.es/m/981A9DB4-3F0C-4DA5-88AD-CB9CFF4D6CAD@enterprisedb.com
2020-03-02 18:19:51 -03:00
Alvaro Herrera b9b408c487
Record parents of triggers
This let us get rid of a recently introduced ugly hack (commit
1fa846f1c9).

Author: Álvaro Herrera
Reviewed-by: Amit Langote, Tom Lane
Discussion: https://postgr.es/m/20200217215641.GA29784@alvherre.pgsql
2020-02-27 13:23:33 -03:00
Robert Haas 05d8449e73 Move src/backend/utils/hash/hashfn.c to src/common
This also involves renaming src/include/utils/hashutils.h, which
becomes src/include/common/hashfn.h. Perhaps an argument can be
made for keeping the hashutils.h name, but it seemed more
consistent to make it match the name of the file, and also more
descriptive of what is actually going on here.

Patch by me, reviewed by Suraj Kharage and Mark Dilger. Off-list
advice on how not to break the Windows build from Davinder Singh
and Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
2020-02-27 09:25:41 +05:30
Peter Geoghegan 612a1ab767 Add equalimage B-Tree support functions.
Invent the concept of a B-Tree equalimage ("equality implies image
equality") support function, registered as support function 4.  This
indicates whether it is safe (or not safe) to apply optimizations that
assume that any two datums considered equal by an operator class's order
method must be interchangeable without any loss of semantic information.
This is static information about an operator class and a collation.

Register an equalimage routine for almost all of the existing B-Tree
opclasses.  We only need two trivial routines for all of the opclasses
that are included with the core distribution.  There is one routine for
opclasses that index non-collatable types (which returns 'true'
unconditionally), plus another routine for collatable types (which
returns 'true' when the collation is a deterministic collation).

This patch is infrastructure for an upcoming patch that adds B-Tree
deduplication.

Author: Peter Geoghegan, Anastasia Lubennikova
Discussion: https://postgr.es/m/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ@mail.gmail.com
2020-02-26 11:28:25 -08:00
Tom Lane 70a7732007 Remove support for upgrading extensions from "unpackaged" state.
Andres Freund pointed out that allowing non-superusers to run
"CREATE EXTENSION ... FROM unpackaged" has security risks, since
the unpackaged-to-1.0 scripts don't try to verify that the existing
objects they're modifying are what they expect.  Just attaching such
objects to an extension doesn't seem too dangerous, but some of them
do more than that.

We could have resolved this, perhaps, by still requiring superuser
privilege to use the FROM option.  However, it's fair to ask just what
we're accomplishing by continuing to lug the unpackaged-to-1.0 scripts
forward.  None of them have received any real testing since 9.1 days,
so they may not even work anymore (even assuming that one could still
load the previous "loose" object definitions into a v13 database).
And an installation that's trying to go from pre-9.1 to v13 or later
in one jump is going to have worse compatibility problems than whether
there's a trivial way to convert their contrib modules into extension
style.

Hence, let's just drop both those scripts and the core-code support
for "CREATE EXTENSION ... FROM".

Discussion: https://postgr.es/m/20200213233015.r6rnubcvl4egdh5r@alap3.anarazel.de
2020-02-19 16:59:14 -05:00
Fujii Masao b7e51b350c Make inherited LOCK TABLE perform access permission checks on parent table only.
Previously, LOCK TABLE command through a parent table checked
the permissions on not only the parent table but also the children
tables inherited from it. This was a bug and inherited queries should
perform access permission checks on the parent table only. This
commit fixes LOCK TABLE so that it does not check the permission
on children tables.

This patch is applied only in the master branch. We decided not to
back-patch because it's not hard to imagine that there are some
applications expecting the old behavior and the change breaks their
security.

Author: Amit Langote
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CAHGQGwE+GauyG7POtRfRwwthAGwTjPQYdFHR97+LzA4RHGnJxA@mail.gmail.com
2020-02-18 13:13:15 +09:00
Peter Eisentraut c6679e4fca Optimize update of tables with generated columns
When updating a table row with generated columns, only recompute those
generated columns whose base columns have changed in this update and
keep the rest unchanged.  This can result in a significant performance
benefit.  The required information was already kept in
RangeTblEntry.extraUpdatedCols; we just have to make use of it.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b05e781a-fa16-6b52-6738-761181204567@2ndquadrant.com
2020-02-17 15:20:58 +01:00
Tom Lane faade5d4c6 Update obsolete comment.
Noted by Justin Pryzby, though I chose to just rip out the stale text,
as it's in no way relevant to this particular function.

Discussion: https://postgr.es/m/20200212182337.GZ1412@telsasoft.com
2020-02-15 15:22:40 -05:00
Tom Lane 0973f5602c Remove long-dead comments.
These should've been dropped by a8bb8eb58, but evidently were
missed.  Not important enough to back-patch.
2020-02-12 14:33:49 -05:00
Alvaro Herrera b048f558dd Fix priv checks for ALTER <object> DEPENDS ON EXTENSION
Marking an object as dependant on an extension did not have any
privilege check whatsoever; this allowed any user to mark objects as
droppable by anyone able to DROP EXTENSION, which could be used to cause
system-wide havoc.  Disallow by checking that the calling user owns the
mentioned object.

(No constraints are placed on the extension.)

Security: CVE-2020-1720
Reported-by: Tom Lane
Discussion: 31605.1566429043@sss.pgh.pa.us
2020-02-10 11:47:09 -03:00
Alvaro Herrera 55173d2e66 Fix failure to create FKs correctly in partitions
On a multi-level partioned table, when adding a partition not directly
connected to the root table, foreign key constraints referencing the
root were not cloned to the new partition, leading to the FK being
possibly inadvertently violated later on.

This was caused by fuzzy thinking in CloneFkReferenced (commit
f56f8f8da6): it was skipping constraints marked as having parents on
the theory that cloning those would create duplicates; but that's only
correct for the top level of the partitioning hierarchy.  For levels
below that one, such constraints must still be considered and only
skipped if later on we see that we'd create duplicates.  Apparently, I
(Álvaro) wrote the comments right but the code implemented something
slightly different.

Author: Jehan-Guillaume de Rorthais
Discussion: https://postgr.es/m/20200206004948.238352db@firost
2020-02-07 18:27:18 -03:00
Tom Lane 7d91b604d9 Fix handling of "Subplans Removed" field in EXPLAIN output.
Commit 499be013d added this field in a rather poorly-thought-through
manner, with the result being that rather than being a field of the
Append or MergeAppend plan node as intended (and as it seems to be,
in text format), it was actually an element of the "Plans" subgroup.
At least in JSON format, that's flat out invalid syntax, because
"Plans" is an array not an object.

While it's not hard to move the generation of the field so that it
appears where it's supposed to, this does result in a visible change
in field order in text format, in cases where a Append or MergeAppend
plan node has any InitPlans attached.  That's slightly annoying to
do in stable branches; but the alternative of continuing to emit
broken non-text formats seems worse.

Also, since the set of fields emitted is not supposed to be
data-dependent in non-text formats, make sure that "Subplans Removed"
appears in Append and MergeAppend nodes even when it's zero, in those
formats.  (The previous coding made it look like it could appear in
some other node types such as BitmapAnd, but we don't actually support
runtime pruning there, so don't emit it in those cases.)

Per bug #16171 from Mahadevan Ramachandran.  Fix by Daniel Gustafsson
and Tom Lane, reviewed by Hamid Akhtar.  Back-patch to v11 where this
code came in.

Discussion: https://postgr.es/m/16171-b72259ab75505fa2@postgresql.org
2020-02-04 13:07:13 -05:00
Alvaro Herrera c9d2977519 Clean up newlines following left parentheses
We used to strategically place newlines after some function call left
parentheses to make pgindent move the argument list a few chars to the
left, so that the whole line would fit under 80 chars.  However,
pgindent no longer does that, so the newlines just made the code
vertically longer for no reason.  Remove those newlines, and reflow some
of those lines for some extra naturality.

Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
2020-01-30 13:42:14 -03:00
Alvaro Herrera 4e89c79a52 Remove excess parens in ereport() calls
Cosmetic cleanup, not worth backpatching.

Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
Reviewed-by: Tom Lane, Michael Paquier
2020-01-30 13:32:04 -03:00
Fujii Masao e6f1e560e4 Make inherited TRUNCATE perform access permission checks on parent table only.
Previously, TRUNCATE command through a parent table checked the
permissions on not only the parent table but also the children tables
inherited from it. This was a bug and inherited queries should perform
access permission checks on the parent table only. This commit fixes
that bug.

Back-patch to all supported branches.

Author: Amit Langote
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CAHGQGwFHdSvifhJE+-GSNqUHSfbiKxaeQQ7HGcYz6SC2n_oDcg@mail.gmail.com
2020-01-31 00:42:06 +09:00
Tom Lane 50fc694e43 Invent "trusted" extensions, and remove the pg_pltemplate catalog.
This patch creates a new extension property, "trusted".  An extension
that's marked that way in its control file can be installed by a
non-superuser who has the CREATE privilege on the current database,
even if the extension contains objects that normally would have to be
created by a superuser.  The objects within the extension will (by
default) be owned by the bootstrap superuser, but the extension itself
will be owned by the calling user.  This allows replicating the old
behavior around trusted procedural languages, without all the
special-case logic in CREATE LANGUAGE.  We have, however, chosen to
loosen the rules slightly: formerly, only a database owner could take
advantage of the special case that allowed installation of a trusted
language, but now anyone who has CREATE privilege can do so.

Having done that, we can delete the pg_pltemplate catalog, moving the
knowledge it contained into the extension script files for the various
PLs.  This ends up being no change at all for the in-core PLs, but it is
a large step forward for external PLs: they can now have the same ease
of installation as core PLs do.  The old "trusted PL" behavior was only
available to PLs that had entries in pg_pltemplate, but now any
extension can be marked trusted if appropriate.

This also removes one of the stumbling blocks for our Python 2 -> 3
migration, since the association of "plpythonu" with Python 2 is no
longer hard-wired into pg_pltemplate's initial contents.  Exactly where
we go from here on that front remains to be settled, but one problem
is fixed.

Patch by me, reviewed by Peter Eisentraut, Stephen Frost, and others.

Discussion: https://postgr.es/m/5889.1566415762@sss.pgh.pa.us
2020-01-29 18:42:43 -05:00
Amit Kapila 05f18c6b6b Added relation name in error messages for constraint checks.
This gives more information to the user about the error and it makes such
messages consistent with the other similar messages in the code.

Reported-by: Simon Riggs
Author: Mahendra Singh and Simon Riggs
Reviewed-by: Beena Emerson and Amit Kapila
Discussion: https://postgr.es/m/CANP8+j+7YUvQvGxTrCiw77R23enMJ7DFmyA3buR+fa2pKs4XhA@mail.gmail.com
2020-01-28 07:48:10 +05:30
Tom Lane 3ec20c7091 Fix EXPLAIN (SETTINGS) to follow policy about when to print empty fields.
In non-TEXT output formats, the "Settings" field should appear when
requested, even if it would be empty.

Also, get rid of the premature optimization of counting all the
GUC_EXPLAIN variables at startup.  Since there was no provision for
adjusting that count later, all it'd take would be some extension marking
a parameter as GUC_EXPLAIN to risk an assertion failure or memory stomp.
We could make get_explain_guc_options() count those variables on-the-fly,
or dynamically resize its array ... but TBH I do not think that making a
transient array of pointers a bit smaller is worth any extra complication,
especially when you consider all the other transient space EXPLAIN eats.
So just allocate that array at the max possible size.

In HEAD, also add some regression test coverage for this feature.

Because of the memory-stomp hazard, back-patch to v12 where this
feature was added.

Discussion: https://postgr.es/m/19416.1580069629@sss.pgh.pa.us
2020-01-26 16:32:19 -05:00
Tom Lane 1001368497 Clean up EXPLAIN's handling of per-worker details.
Previously, it was possible for EXPLAIN ANALYZE of a parallel query
to produce several different "Workers" fields for a single plan node,
because different portions of explain.c independently generated
per-worker data and wrapped that output in separate fields.  This
is pretty bogus, especially for the structured output formats: even
if it's not technically illegal, most programs would have a hard time
dealing with such data.

To improve matters, add infrastructure that allows redirecting
per-worker values into a side data structure, and then collect that
data into a single "Workers" field after we've finished running all
the relevant code for a given plan node.

There are a few visible side-effects:

* In text format, instead of something like

  Sort Method: external merge  Disk: 4920kB
  Worker 0:  Sort Method: external merge  Disk: 5880kB
  Worker 1:  Sort Method: external merge  Disk: 5920kB
  Buffers: shared hit=682 read=10188, temp read=1415 written=2101
  Worker 0:  actual time=130.058..130.324 rows=1324 loops=1
    Buffers: shared hit=337 read=3489, temp read=505 written=739
  Worker 1:  actual time=130.273..130.512 rows=1297 loops=1
    Buffers: shared hit=345 read=3507, temp read=505 written=744

you get

  Sort Method: external merge  Disk: 4920kB
  Buffers: shared hit=682 read=10188, temp read=1415 written=2101
  Worker 0:  actual time=130.058..130.324 rows=1324 loops=1
    Sort Method: external merge  Disk: 5880kB
    Buffers: shared hit=337 read=3489, temp read=505 written=739
  Worker 1:  actual time=130.273..130.512 rows=1297 loops=1
    Sort Method: external merge  Disk: 5920kB
    Buffers: shared hit=345 read=3507, temp read=505 written=744

* When JIT is enabled, any relevant per-worker JIT stats are attached
to the child node of the Gather or Gather Merge node, which is where
the other per-worker output has always been.  Previously, that info
was attached directly to a Gather node, or missed entirely for Gather
Merge.

* A query's summary JIT data no longer includes a bogus
"Worker Number: -1" field.

A notable code-level change is that indenting for lines of text-format
output should now be handled by calling "ExplainIndentText(es)",
instead of hard-wiring how much space to emit.  This seems a good deal
cleaner anyway.

This patch also adds a new "explain.sql" regression test script that's
dedicated to testing EXPLAIN.  There is more that can be done in that
line, certainly, but for now it just adds some coverage of the XML and
YAML output formats, which had been completely untested.

Although this is surely a bug fix, it's not clear that people would
be happy with rearranging EXPLAIN output in a minor release, so apply
to HEAD only.

Maciek Sakrejda and Tom Lane, based on an idea of Andres Freund's;
reviewed by Georgios Kokolatos

Discussion: https://postgr.es/m/CAOtHd0AvAA8CLB9Xz0wnxu1U=zJCKrr1r4QwwXi_kcQsHDVU=Q@mail.gmail.com
2020-01-25 18:16:42 -05:00
Michael Paquier a904abe2e2 Fix concurrent indexing operations with temporary tables
Attempting to use CREATE INDEX, DROP INDEX or REINDEX with CONCURRENTLY
on a temporary relation with ON COMMIT actions triggered unexpected
errors because those operations use multiple transactions internally to
complete their work.  Here is for example one confusing error when using
ON COMMIT DELETE ROWS:
ERROR:  index "foo" already contains data

Issues related to temporary relations and concurrent indexing are fixed
in this commit by enforcing the non-concurrent path to be taken for
temporary relations even if using CONCURRENTLY, transparently to the
user.  Using a non-concurrent path does not matter in practice as locks
cannot be taken on a temporary relation by a session different than the
one owning the relation, and the non-concurrent operation is more
effective.

The problem exists with REINDEX since v12 with the introduction of
CONCURRENTLY, and with CREATE/DROP INDEX since CONCURRENTLY exists for
those commands.  In all supported versions, this caused only confusing
error messages to be generated.  Note that with REINDEX, it was also
possible to issue a REINDEX CONCURRENTLY for a temporary relation owned
by a different session, leading to a server crash.

The idea to enforce transparently the non-concurrent code path for
temporary relations comes originally from Andres Freund.

Reported-by: Manuel Rigger
Author: Michael Paquier, Heikki Linnakangas
Reviewed-by: Andres Freund, Álvaro Herrera, Heikki Linnakangas
Discussion: https://postgr.es/m/CA+u7OA6gP7YAeCguyseusYcc=uR8+ypjCcgDDCTzjQ+k6S9ksQ@mail.gmail.com
Backpatch-through: 9.4
2020-01-22 09:49:18 +09:00
Tom Lane 9b9c5f279e Clarify behavior of adding and altering a column in same ALTER command.
The behavior of something like

ALTER TABLE transactions
  ADD COLUMN status varchar(30) DEFAULT 'old',
  ALTER COLUMN status SET default 'current';

is to fill existing table rows with 'old', not 'current'.  That's
intentional and desirable for a couple of reasons:

* It makes the behavior the same whether you merge the sub-commands
into one ALTER command or give them separately;

* If we applied the new default while filling the table, there would
be no way to get the existing behavior in one SQL command.

The same reasoning applies in cases that add a column and then
manipulate its GENERATED/IDENTITY status in a second sub-command,
since the generation expression is really just a kind of default.
However, that wasn't very obvious (at least not to me; earlier in
the referenced discussion thread I'd thought it was a bug to be
fixed).  And it certainly wasn't documented.

Hence, add documentation, code comments, and a test case to clarify
that this behavior is all intentional.

In passing, adjust ATExecAddColumn's defaults-related relkind check
so that it matches up exactly with ATRewriteTables, instead of being
effectively (though not literally) the negated inverse condition.
The reasoning can be explained a lot more concisely that way, too
(not to mention that the comment now matches the code, which it
did not before).

Discussion: https://postgr.es/m/10365.1558909428@sss.pgh.pa.us
2020-01-21 16:17:21 -05:00
Amit Kapila 40d964ec99 Allow vacuum command to process indexes in parallel.
This feature allows the vacuum to leverage multiple CPUs in order to
process indexes.  This enables us to perform index vacuuming and index
cleanup with background workers.  This adds a PARALLEL option to VACUUM
command where the user can specify the number of workers that can be used
to perform the command which is limited by the number of indexes on a
table.  Specifying zero as a number of workers will disable parallelism.
This option can't be used with the FULL option.

Each index is processed by at most one vacuum process.  Therefore parallel
vacuum can be used when the table has at least two indexes.

The parallel degree is either specified by the user or determined based on
the number of indexes that the table has, and further limited by
max_parallel_maintenance_workers.  The index can participate in parallel
vacuum iff it's size is greater than min_parallel_index_scan_size.

Author: Masahiko Sawada and Amit Kapila
Reviewed-by: Dilip Kumar, Amit Kapila, Robert Haas, Tomas Vondra,
Mahendra Singh and Sergei Kornilov
Tested-by: Mahendra Singh and Prabhat Sahu
Discussion:
https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com
2020-01-20 07:57:49 +05:30
Robert Haas 2eb34ac369 Fix problems with "read only query" checks, and refactor the code.
Previously, check_xact_readonly() was responsible for determining
which types of queries could not be run in a read-only transaction,
standard_ProcessUtility() was responsibility for prohibiting things
which were allowed in read only transactions but not in recovery, and
utility commands were basically prohibited in bulk in parallel mode by
calls to CommandIsReadOnly() in functions.c and spi.c.  This situation
was confusing and error-prone. Accordingly, move all the checks to a
new function ClassifyUtilityCommandAsReadOnly(), which determines the
degree to which a given statement is read only.

In the old code, check_xact_readonly() inadvertently failed to handle
several statement types that actually should have been prohibited,
specifically T_CreatePolicyStmt, T_AlterPolicyStmt, T_CreateAmStmt,
T_CreateStatsStmt, T_AlterStatsStmt, and T_AlterCollationStmt.  As a
result, thes statements were erroneously allowed in read only
transactions, parallel queries, and standby operation. Generally, they
would fail anyway due to some lower-level error check, but we
shouldn't rely on that.  In the new code structure, future omissions
of this type should cause ClassifyUtilityCommandAsReadOnly() to
complain about an unrecognized node type.

As a fringe benefit, this means we can allow certain types of utility
commands in parallel mode, where it's safe to do so. This allows
ALTER SYSTEM, CALL, DO, CHECKPOINT, COPY FROM, EXPLAIN, and SHOW.
It might be possible to allow additional commands with more work
and thought.

Along the way, document the thinking process behind the current set
of checks, as per discussion especially with Peter Eisentraut. There
is some interest in revising some of these rules, but that seems
like a job for another patch.

Patch by me, reviewed by Tom Lane, Stephen Frost, and Peter
Eisentraut.

Discussion: http://postgr.es/m/CA+TgmoZ_rLqJt5sYkvh+JpQnfX0Y+B2R+qfi820xNih6x-FQOQ@mail.gmail.com
2020-01-16 12:11:31 -05:00
Tom Lane 1281a5c907 Restructure ALTER TABLE execution to fix assorted bugs.
We've had numerous bug reports about how (1) IF NOT EXISTS clauses in
ALTER TABLE don't behave as-expected, and (2) combining certain actions
into one ALTER TABLE doesn't work, though executing the same actions as
separate statements does.  This patch cleans up all of the cases so far
reported from the field, though there are still some oddities associated
with identity columns.

The core problem behind all of these bugs is that we do parse analysis
of ALTER TABLE subcommands too soon, before starting execution of the
statement.  The root of the bugs in group (1) is that parse analysis
schedules derived commands (such as a CREATE SEQUENCE for a serial
column) before it's known whether the IF NOT EXISTS clause should cause
a subcommand to be skipped.  The root of the bugs in group (2) is that
earlier subcommands may change the catalog state that later subcommands
need to be parsed against.

Hence, postpone parse analysis of ALTER TABLE's subcommands, and do
that one subcommand at a time, during "phase 2" of ALTER TABLE which
is the phase that does catalog rewrites.  Thus the catalog effects
of earlier subcommands are already visible when we analyze later ones.
(The sole exception is that we do parse analysis for ALTER COLUMN TYPE
subcommands during phase 1, so that their USING expressions can be
parsed against the table's original state, which is what we need.
Arguably, these bugs stem from falsely concluding that because ALTER
COLUMN TYPE must do early parse analysis, every other command subtype
can too.)

This means that ALTER TABLE itself must deal with execution of any
non-ALTER-TABLE derived statements that are generated by parse analysis.
Add a suitable entry point to utility.c to accept those recursive
calls, and create a struct to pass through the information needed by
the recursive call, rather than making the argument lists of
AlterTable() and friends even longer.

Getting this to work correctly required a little bit of fiddling
with the subcommand pass structure, in particular breaking up
AT_PASS_ADD_CONSTR into multiple passes.  But otherwise it's mostly
a pretty straightforward application of the above ideas.

Fixing the residual issues for identity columns requires refactoring of
where the dependency link from an identity column to its sequence gets
set up.  So that seems like suitable material for a separate patch,
especially since this one is pretty big already.

Discussion: https://postgr.es/m/10365.1558909428@sss.pgh.pa.us
2020-01-15 18:49:24 -05:00
Alvaro Herrera a166d408eb Report progress of ANALYZE commands
This uses the progress reporting infrastructure added by c16dc1aca5,
adding support for ANALYZE.

Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Co-authored-by: Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>
Reviewed-by: Julien Rouhaud, Robert Haas, Anthony Nowocien, Kyotaro Horiguchi,
	Vignesh C, Amit Langote
2020-01-15 11:14:39 -03:00
Peter Eisentraut f595117e24 ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION
Add an ALTER TABLE subcommand for dropping the generated property from
a column, per SQL standard.

Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/2f7f1d9c-946e-0453-d841-4f38eb9d69b6%402ndquadrant.com
2020-01-14 13:36:03 +01:00
Peter Eisentraut c67a55da4e Make lsn argument of walrcv_create_slot() optional
Some callers are not using it, so it's wasteful to have to specify it.

Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/CA+fd4k4BcYrYucNfTnK-CQX3+jsG+PRPEhHAUSo-W4P0Lec57A@mail.gmail.com
2020-01-11 09:07:14 +01:00
Tom Lane 4ac8aaa36f Fix handling of generated columns in ALTER TABLE.
ALTER TABLE failed if a column referenced in a GENERATED expression
had been added or changed in type earlier in the ALTER command.
That's because the GENERATED expression needs to be evaluated
against the table's updated tuples, but it was being evaluated
against the original tuples.  (Fortunately the executor has adequate
cross-checks to notice the mismatch, so we just got an obscure error
message and not anything more dangerous.)

Per report from Andreas Joseph Krogh.  Back-patch to v12 where
GENERATED was added.

Discussion: https://postgr.es/m/VisenaEmail.200.231b0a41523275d0.16ea7f800c7@tc7-visena
2020-01-08 09:42:53 -05:00
Michael Paquier 65192e0244 Revert "Forbid DROP SCHEMA on temporary namespaces"
This reverts commit a052f6c, following complains from Robert Haas and
Tom Lane.  Backpatch down to 9.4, like the previous commit.

Discussion: https://postgr.es/m/CA+TgmobL4npEX5=E5h=5Jm_9mZun3MT39Kq2suJFVeamc9skSQ@mail.gmail.com
Backpatch-through: 9.4
2020-01-08 10:36:12 +09:00
Peter Eisentraut 3fd40b628c Make better use of ParseState in ProcessUtility
Pass ParseState into the functions called from
standard_ProcessUtility() instead passing the query string and query
environment separately.  No functionality change, but it makes the
notation consistent.  We had already started moving things into
that direction piece by piece, and this completes it.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/6e7aa4a1-be6a-1a75-b1f9-83a678e5184a@2ndquadrant.com
2020-01-04 13:12:41 +01:00
Alvaro Herrera 1fa846f1c9 Fix cloning of row triggers to sub-partitions
When row triggers exist in partitioned partitions that are not either
part of FKs or deferred unique constraints, they are not correctly
cloned to their partitions.  That's because they are marked "internal",
and those are purposefully skipped when doing the clone triggers dance.
Fix by relaxing the condition on which internal triggers are skipped.

Amit Langote initially diagnosed the problem and proposed a fix, but I
used a different approach.

Reported-by: Petr Fedorov
Discussion: https://postgr.es/m/6b3f0646-ba8c-b3a9-c62d-1c6651a1920f@phystech.edu
2020-01-02 17:04:24 -03:00
Tom Lane 5815696bc6 Make parser rely more heavily on the ParseNamespaceItem data structure.
When I added the ParseNamespaceItem data structure (in commit 5ebaaa494),
it wasn't very tightly integrated into the parser's APIs.  In the wake of
adding p_rtindex to that struct (commit b541e9acc), there is a good reason
to make more use of it: by passing around ParseNamespaceItem pointers
instead of bare RTE pointers, we can get rid of various messy methods for
passing back or deducing the rangetable index of an RTE during parsing.
Hence, refactor the addRangeTableEntryXXX functions to build and return
a ParseNamespaceItem struct, not just the RTE proper; and replace
addRTEtoQuery with addNSItemToQuery, which is passed a ParseNamespaceItem
rather than building one internally.

Also, add per-column data (a ParseNamespaceColumn array) to each
ParseNamespaceItem.  These arrays are built during addRangeTableEntryXXX,
where we have column type data at hand so that it's nearly free to fill
the data structure.  Later, when we need to build Vars referencing RTEs,
we can use the ParseNamespaceColumn info to avoid the rather expensive
operations done in get_rte_attribute_type() or expandRTE().
get_rte_attribute_type() is indeed dead code now, so I've removed it.
This makes for a useful improvement in parse analysis speed, around 20%
in one moderately-complex test query.

The ParseNamespaceColumn structs also include Var identity information
(varno/varattno).  That info isn't actually being used in this patch,
except that p_varno == 0 is a handy test for a dropped column.
A follow-on patch will make more use of it.

Discussion: https://postgr.es/m/2461.1577764221@sss.pgh.pa.us
2020-01-02 11:29:01 -05:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Michael Paquier a052f6cbb8 Forbid DROP SCHEMA on temporary namespaces
This operation was possible for the owner of the schema or a superuser.
Down to 9.4, doing this operation would cause inconsistencies in a
session whose temporary schema was dropped, particularly if trying to
create new temporary objects after the drop.  A more annoying
consequence is a crash of autovacuum on an assertion failure when
logging information about an orphaned temp table dropped.  Note that
because of 246a6c8 (present in v11~), which has made the removal of
orphaned temporary tables more aggressive, the failure could be
triggered more easily, but it is possible to reproduce down to 9.4.

Reported-by: Mahendra Singh, Prabhat Sahu
Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi, Mahendra Singh
Discussion: https://postgr.es/m/CAKYtNAr9Zq=1-ww4etHo-VCC-k120YxZy5OS01VkaLPaDbv2tg@mail.gmail.com
Backpatch-through: 9.4
2019-12-27 17:58:43 +09:00
Michael Paquier 7854e07f25 Revert "Rename files and headers related to index AM"
This follows multiple complains from Peter Geoghegan, Andres Freund and
Alvaro Herrera that this issue ought to be dug more before actually
happening, if it happens.

Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
2019-12-27 08:09:00 +09:00
Tom Lane bb4114a4e2 Allow whole-row Vars to be used in partitioning expressions.
In the wake of commit 5b9312378, there's no particular reason
for this restriction (previously, it was problematic because of
the implied rowtype reference).  A simple constraint on a whole-row
Var probably isn't that useful, but conceivably somebody would want
to pass one to a function that extracts a partitioning key.  Besides
which, we're expending much more code to enforce the restriction than
we save by having it, since the latter quantity is now zero.
So drop the restriction.

Amit Langote

Discussion: https://postgr.es/m/CA+HiwqFUzjfj9HEsJtYWcr1SgQ_=iCAvQ=O2Sx6aQxoDu4OiHw@mail.gmail.com
2019-12-25 15:44:15 -05:00
Tom Lane 5b9312378e Load relcache entries' partitioning data on-demand, not immediately.
Formerly the rd_partkey and rd_partdesc data structures were always
populated immediately when a relcache entry was built or rebuilt.
This patch changes things so that they are populated only when they
are first requested.  (Hence, callers *must* now always use
RelationGetPartitionKey or RelationGetPartitionDesc; just fetching
the pointer directly is no longer acceptable.)

This seems to have some performance benefits, but the main reason to do
it is that it eliminates a recursive-reload failure that occurs if the
partkey or partdesc expressions contain any references to the relation's
rowtype (as discovered by Amit Langote).  In retrospect, since loading
these data structures might result in execution of nearly-arbitrary code
via eval_const_expressions, it was a dumb idea to require that to happen
during relcache entry rebuild.

Also, fix things so that old copies of a relcache partition descriptor
will be dropped when the cache entry's refcount goes to zero.  In the
previous coding it was possible for such copies to survive for the
lifetime of the session, as I'd complained of in a previous discussion.
(This management technique still isn't perfect, but it's better than
before.)  Improve the commentary explaining how that works and why
it's safe to hand out direct pointers to these relcache substructures.

In passing, improve RelationBuildPartitionDesc by using the same
memory-context-parent-swap approach used by RelationBuildPartitionKey,
thereby making it less dependent on strong assumptions about what
partition_bounds_copy does.  Avoid doing get_rel_relkind in the
critical section, too.

Patch by Amit Langote and Tom Lane; Robert Haas deserves some credit
for prior work in the area, too.  Although this is a pre-existing
problem, no back-patch: the patch seems too invasive to be safe to
back-patch, and the bug it fixes is a corner case that seems
relatively unlikely to cause problems in the field.

Discussion: https://postgr.es/m/CA+HiwqFUzjfj9HEsJtYWcr1SgQ_=iCAvQ=O2Sx6aQxoDu4OiHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com
2019-12-25 14:43:13 -05:00
Michael Paquier 8ce3aa9b59 Rename files and headers related to index AM
The following renaming is done so as source files related to index
access methods are more consistent with table access methods (the
original names used for index AMs ware too generic, and could be
confused as including features related to table AMs):
- amapi.h -> indexam.h.
- amapi.c -> indexamapi.c.  Here we have an equivalent with
backend/access/table/tableamapi.c.
- amvalidate.c -> indexamvalidate.c.
- amvalidate.h -> indexamvalidate.h.
- genam.c -> indexgenam.c.
- genam.h -> indexgenam.h.

This has been discussed during the development of v12 when table AM was
worked on, but the renaming never happened.

Author: Michael Paquier
Reviewed-by: Fabien Coelho, Julien Rouhaud
Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
2019-12-25 10:23:39 +09:00
Alvaro Herrera c4dcd9144b Avoid splitting C string literals with \-newline
Using \ is unnecessary and ugly, so remove that.  While at it, stitch
the literals back into a single line: we've long discouraged splitting
error message literals even when they go past the 80 chars line limit,
to improve greppability.

Leave contrib/tablefunc alone.

Discussion: https://postgr.es/m/20191223195156.GA12271@alvherre.pgsql
2019-12-24 12:44:12 -03:00
Tom Lane 39ebb943de Disallow partition key expressions that return pseudo-types.
This wasn't checked originally, but it should have been, because
in general pseudo-types can't be stored to and retrieved from disk.
Notably, partition bound values of type "record" would not be
interpretable by another session.

In v12 and HEAD, add another flag to CheckAttributeType's repertoire
so that it can produce a specific error message for this case.  That's
infeasible in older branches without an ABI break, so fall back to
a slightly-less-nicely-worded error message in v10 and v11.

Problem noted by Amit Langote, though this patch is not his initial
solution.  Back-patch to v10 where partitioning was introduced.

Discussion: https://postgr.es/m/CA+HiwqFUzjfj9HEsJtYWcr1SgQ_=iCAvQ=O2Sx6aQxoDu4OiHw@mail.gmail.com
2019-12-23 12:53:12 -05:00
Robert Haas 7cdcc747a9 Update neglected comment.
Commit d986d4e87f renamed a variable
but neglected to update the corresponding comment.

Amit Langote
2019-12-19 09:24:44 -05:00
Michael Paquier e1551f96e6 Refactor attribute mappings used in logical tuple conversion
Tuple conversion support in tupconvert.c is able to convert rowtypes
between two relations, inner and outer, which are logically equivalent
but have a different ordering or even dropped columns (used mainly for
inheritance tree and partitions).  This makes use of attribute mappings,
which are simple arrays made of AttrNumber elements with a length
matching the number of attributes of the outer relation.  The length of
the attribute mapping has been treated as completely independent of the
mapping itself until now, making it easy to pass down an incorrect
mapping length.

This commit refactors the code related to attribute mappings and moves
it into an independent facility called attmap.c, extracted from
tupconvert.c.  This merges the attribute mapping with its length,
avoiding to try to guess what is the length of a mapping to use as this
is computed once, when the map is built.

This will avoid mistakes like what has been fixed in dc816e58, which has
used an incorrect mapping length by matching it with the number of
attributes of an inner relation (a child partition) instead of an outer
relation (a partitioned table).

Author: Michael Paquier
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/20191121042556.GD153437@paquier.xyz
2019-12-18 16:23:02 +09:00
Tom Lane b925a00f4e Fix "force_parallel_mode = regress" to work with ANALYZE + VERBOSE.
force_parallel_mode = regress is supposed to force use of a Gather
node without having any impact on EXPLAIN output.  But it failed to
accomplish that if both ANALYZE and VERBOSE are given, because that
enables per-worker output data that you wouldn't see if the Gather
hadn't been inserted.  Improve the logic so that we suppress the
per-worker data too.

This allows putting the new test case added by commit 5935917ce
back into the originally intended form (cf. 776a2c887, 22864f6e0).
We can also get rid of a kluge in subselect.sql, which previously
had to clean up after force_parallel_mode's failure to do what it
said on the tin.

Discussion: https://postgr.es/m/18445.1576177309@sss.pgh.pa.us
2019-12-16 20:14:35 -05:00
Tom Lane 6ef77cf46e Further adjust EXPLAIN's choices of table alias names.
This patch causes EXPLAIN to always assign a separate table alias to the
parent RTE of an append relation (inheritance set); before, such RTEs
were ignored if not actually scanned by the plan.  Since the child RTEs
now always have that same alias to start with (cf. commit 55a1954da),
the net effect is that the parent RTE usually gets the alias used or
implied by the query text, and the children all get that alias with "_N"
appended.  (The exception to "usually" is if there are duplicate aliases
in different subtrees of the original query; then some of those original
RTEs will also have "_N" appended.)

This results in more uniform output for partitioned-table plans than
we had before: the partitioned table itself gets the original alias,
and all child tables have aliases with "_N", rather than the previous
behavior where one of the children would get an alias without "_N".

The reason for giving the parent RTE an alias, even if it isn't scanned
by the plan, is that we now use the parent's alias to qualify Vars that
refer to an appendrel output column and appear above the Append or
MergeAppend that computes the appendrel.  But below the append, Vars
refer to some one of the child relations, and are displayed that way.
This seems clearer than the old behavior where a Var that could carry
values from any child relation was displayed as if it referred to only
one of them.

While at it, change ruleutils.c so that the code paths used by EXPLAIN
deal in Plan trees not PlanState trees.  This effectively reverts a
decision made in commit 1cc29fe7c, which seemed like a good idea at
the time to make ruleutils.c consistent with explain.c.  However,
it's problematic because we'd really like to allow executor startup
pruning to remove all the children of an append node when possible,
leaving no child PlanState to resolve Vars against.  (That's not done
here, but will be in the next patch.)  This requires different handling
of subplans and initplans than before, but is otherwise a pretty
straightforward change.

Discussion: https://postgr.es/m/001001d4f44b$2a2cca50$7e865ef0$@lab.ntt.co.jp
2019-12-11 17:05:18 -05:00
Peter Eisentraut 105eb360f2 Remove ATPrepSetStatistics
It was once possible to do ALTER TABLE ... SET STATISTICS on system
tables without allow_sytem_table_mods.  This was changed apparently by
accident between PostgreSQL 9.1 and 9.2, but a code comment still
claimed this was possible.  Without that functionality, having a
separate ATPrepSetStatistics() is useless, so use the generic
ATSimplePermissions() instead and move the remaining custom code into
ATExecSetStatistics().

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/cc8d2648-a0ec-7a86-13e5-db473484e19e%402ndquadrant.com
2019-12-11 09:04:04 +01:00
Etsuro Fujita 5a20b0219e Fix handling of multiple AFTER ROW triggers on a foreign table.
AfterTriggerExecute() retrieves a fresh tuple or pair of tuples from a
tuplestore and then stores the tuple(s) in the passed-in slot(s) if
AFTER_TRIGGER_FDW_FETCH, while it uses the most-recently-retrieved
tuple(s) stored in the slot(s) if AFTER_TRIGGER_FDW_REUSE.  This was
done correctly before 12, but commit ff11e7f4b broke it by mistakenly
clearing the tuple(s) stored in the slot(s) in that function, leading to
an assertion failure as reported in bug #16139 from Alexander Lakhin.

Also, fix some other issues with the aforementioned commit in passing:

* For tg_newslot, which is a slot added to the TriggerData struct by the
  commit to store new updated tuples, it didn't ensure the slot was NULL
  if there was no such tuple.
* The commit failed to update the documentation about the trigger
  interface.

Author: Etsuro Fujita
Backpatch-through: 12
Discussion: https://postgr.es/m/16139-94f9ccf0db6119ec%40postgresql.org
2019-12-10 18:00:30 +09:00
Peter Eisentraut d4feadeca1 Add error position to an error message
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/6e7aa4a1-be6a-1a75-b1f9-83a678e5184a@2ndquadrant.com
2019-11-29 09:10:17 +01:00
Tom Lane d3aa114ac4 Doc: improve discussion of race conditions involved in LISTEN.
The user docs didn't really explain how to use LISTEN safely,
so clarify that.  Also clean up some fuzzy-headed explanations
in comments.  No code changes.

Discussion: https://postgr.es/m/3ac7f397-4d5f-be8e-f354-440020675694@gmail.com
2019-11-24 18:03:39 -05:00
Tom Lane 7900269724 Stabilize NOTIFY behavior by transmitting notifies before ReadyForQuery.
This patch ensures that, if any notify messages were received during
a just-finished transaction, they get sent to the frontend just before
not just after the ReadyForQuery message.  With libpq and other client
libraries that act similarly, this guarantees that the client will see
the notify messages as available as soon as it thinks the transaction
is done.

This probably makes no difference in practice, since in realistic
use-cases the application would have to cope with asynchronous
arrival of notify events anyhow.  However, it makes it a lot easier
to build cross-session-notify test cases with stable behavior.
I'm a bit surprised now that we've not seen any buildfarm instability
with the test cases added by commit b10f40bf0.  Tests that I intend
to add in an upcoming bug fix are definitely unstable without this.

Back-patch to 9.6, which is as far back as we can do NOTIFY testing
with the isolationtester infrastructure.

Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
2019-11-24 14:42:59 -05:00
Tom Lane 8b7ae5a82d Stabilize the results of pg_notification_queue_usage().
This function wasn't touched in commit 51004c717, but that turns out
to be a bad idea, because its results now include any dead space
that exists in the NOTIFY queue on account of our being lazy about
advancing the queue tail.  Notably, the isolation tests now fail
if run twice without a server restart between, because async-notify's
first test of the function will already show a positive value.
It seems likely that end users would be equally unhappy about the
result's instability.  To fix, just make the function call
asyncQueueAdvanceTail before computing its result.  That should end
in producing the same value as before, and it's hard to believe that
there's any practical use-case where pg_notification_queue_usage()
is called so often as to create a performance degradation, especially
compared to what we did before.

Out of paranoia, also mark this function parallel-restricted (it
was volatile, but parallel-safe by default, before).  Although the
code seems to work fine when run in a parallel worker, that's outside
the design scope of async.c, and it's a bit scary to have intentional
side-effects happening in a parallel worker.  There seems no plausible
use-case where it'd be important to try to parallelize this, so let's
not take any risk of introducing new bugs.

In passing, re-pgindent async.c and run reformat-dat-files on
pg_proc.dat, just because I'm a neatnik.

Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
2019-11-24 14:09:33 -05:00
Joe Conway f7a2002e82 Add object TRUNCATE hook
All operations with acl permissions checks should have a corresponding hook
so that, for example, mandatory access control (MAC) may be enforced by an
extension. The command TRUNCATE is missing this hook, so add it. Patch by
Yuli Khodorkovskiy with some editorialization by me. Based on the discussion
not back-patched. A separate patch will exercise the hook in the sepgsql
extension.

Author: Yuli Khodorkovskiy
Reviewed-by: Joe Conway
Discussion: https://postgr.es/m/CAFL5wJcomybj1Xdw7qWmPJRpGuFukKgNrDb6uVBaCMgYS9dkaA%40mail.gmail.com
2019-11-23 10:39:20 -05:00
Peter Eisentraut 2e4db241bf Remove configure --disable-float4-byval
This build option was only useful to maintain compatibility for
version-0 functions, but those are no longer supported, so this option
can be removed.

float4 is now always pass-by-value; the pass-by-reference code path is
completely removed.

Discussion: https://www.postgresql.org/message-id/flat/f3e1e576-2749-bbd7-2d57-3f9dcf75255a@2ndquadrant.com
2019-11-21 18:29:21 +01:00
Fujii Masao e6d8069522 Make DROP DATABASE command generate less WAL records.
Previously DROP DATABASE generated as many XLOG_DBASE_DROP WAL records
as the number of tablespaces that the database to drop uses. This caused
the scans of shared_buffers as many times as the number of the tablespaces
during recovery because WAL replay of one XLOG_DBASE_DROP record needs
that full scan. This could make the recovery time longer especially
when shared_buffers is large.

This commit changes DROP DATABASE so that it generates only one
XLOG_DBASE_DROP record, and registers the information of all the tablespaces
into it. Then, WAL replay of XLOG_DBASE_DROP record needs full scan of
shared_buffers only once, and which may improve the recovery performance.

Author: Fujii Masao
Reviewed-by: Kirk Jamison, Simon Riggs
Discussion: https://postgr.es/m/CAHGQGwF8YwNH0ZaL+2wjZPkj+ji9UhC+Z4ScnG97WKtVY5L9iw@mail.gmail.com
2019-11-21 21:10:37 +09:00
Fujii Masao 30840c92ac Allow ALTER VIEW command to rename the column in the view.
ALTER TABLE RENAME COLUMN command always can be used to rename the column
in the view, but it's reasonable to add that syntax to ALTER VIEW too.

Author: Fujii Masao
Reviewed-by: Ibrar Ahmed, Yu Kimura
Discussion: https://postgr.es/m/CAHGQGwHoQMD3b-MqTLcp1MgdhCpOKU7QNRwjFooT4_d+ti5v6g@mail.gmail.com
2019-11-21 19:55:13 +09:00
Michael Paquier 1bbd608fda Split handling of reloptions for partitioned tables
Partitioned tables do not have relation options yet, but, similarly to
what's done for views which have their own parsing table, it could make
sense to introduce new parameters for some of the existing default ones
like fillfactor, autovacuum, etc.  Splitting things has the advantage to
make the information stored in rd_options include only the necessary
information, reducing the amount of memory used for a relcache entry
with partitioned tables if new reloptions are introduced at this level.

Author:  Nikolay Shaplov
Reviewed-by: Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/1627387.Qykg9O6zpu@x200m
2019-11-14 12:34:28 +09:00
Tom Lane 0cafdd03a8 Fix silly initializations (cosmetic only).
Initializing a pointer to "false" isn't per project style,
and reportedly some compilers warn about it (though I've
not seen any such warnings in the buildfarm).

Seems to have come in with commit ff11e7f4b, so back-patch
to v12 where that was added.

Didier Gautheron

Discussion: https://postgr.es/m/CAJRYxu+XQuM0qnSqt1Ujztu6fBPzMMAT3VEn6W32rgKG6A2Fsw@mail.gmail.com
2019-11-13 15:26:54 -05:00
Amit Kapila 1379fd537f Introduce the 'force' option for the Drop Database command.
This new option terminates the other sessions connected to the target
database and then drop it.  To terminate other sessions, the current user
must have desired permissions (same as pg_terminate_backend()).  We don't
allow to terminate the sessions if prepared transactions, active logical
replication slots or subscriptions are present in the target database.

Author: Pavel Stehule with changes by me
Reviewed-by: Dilip Kumar, Vignesh C, Ibrar Ahmed, Anthony Nowocien,
Ryan Lambert and Amit Kapila
Discussion: https://postgr.es/m/CAP_rwwmLJJbn70vLOZFpxGw3XD7nLB_7+NKz46H5EOO2k5H7OQ@mail.gmail.com
2019-11-13 08:25:33 +05:30
Alvaro Herrera dcb7d3cafa Have LookupFuncName accept NULL argtypes for 0 args
Prior to this change, it requires to be passed a valid pointer just to
be able to pass it to a zero-byte memcmp, per 0a52d378b0.  Given the
strange resulting code in callsites, it seems better to test for the
case specifically and remove the requirement.

Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/MN2PR18MB2927F24692485D754794F01BE3740@MN2PR18MB2927.namprd18.prod.outlook.com
Discussion: https://postgr.es/m/MN2PR18MB2927F6873DF2774A505AC298E3740@MN2PR18MB2927.namprd18.prod.outlook.com
2019-11-12 17:06:58 -03:00
Amit Kapila 14aec03502 Make the order of the header file includes consistent in backend modules.
Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order
of header file inclusion consistent for backend modules.

In the passing, removed a couple of duplicate inclusions.

Author: Vignesh C
Reviewed-by: Kuntal Ghosh and Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-12 08:30:16 +05:30
Amit Kapila 9fab25c6cd Rearrange dropdb() to avoid errors after allowing other sessions to exit.
During Drop Database, it is better to error out before allowing other
sessions to exit and forcefully terminating autovacuum workers.  All the
other errors except for checking subscriptions are already done before.

Author: Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1+qhLkCYG2oy9xug9ur_j=G2wQNRYAyd+-kZfZ1z42pLw@mail.gmail.com
2019-11-11 07:42:45 +05:30
Alvaro Herrera b4bcc6bfdf Fix SET CONSTRAINTS .. DEFERRED on partitioned tables
SET CONSTRAINTS ... DEFERRED failed on partitioned tables, because of a
sanity check that ensures that the affected constraints have triggers.
On partitioned tables, the triggers are in the leaf partitions, not in
the partitioned relations themselves, so the sanity check fails.
Removing the sanity check solves the problem, because the code needed to
support the case is already there.

Backpatch to 11.

Note: deferred unique constraints are not affected by this bug, because
they do have triggers in the parent partitioned table.  I did not add a
test for this scenario.

Discussion: https://postgr.es/m/20191105212915.GA11324@alvherre.pgsql
2019-11-07 13:59:24 -03:00
Andres Freund 01368e5d9d Split all OBJS style lines in makefiles into one-line-per-entry style.
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.

By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.

Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
2019-11-05 14:41:07 -08:00
Tom Lane a30531c5c8 Fix "unexpected relkind" error when denying permissions on toast tables.
get_relkind_objtype, and hence get_object_type, failed when applied to a
toast table.  This is not a good thing, because it prevents reporting of
perfectly legitimate permissions errors.  (At present, these functions
are in fact *only* used to determine the ObjectType argument for
acl_error() calls.)  It seems best to have them fall back to returning
OBJECT_TABLE in every case where they can't determine an object type
for a pg_class entry, so do that.

In passing, make some edits to alter.c to make it more obvious that
those calls of get_object_type() are used only for error reporting.
This might save a few cycles in the non-error code path, too.

Back-patch to v11 where this issue originated.

John Hsu, Michael Paquier, Tom Lane

Discussion: https://postgr.es/m/C652D3DF-2B0C-4128-9420-FB5379F6B1E4@amazon.com
2019-11-05 13:40:37 -05:00
Michael Paquier dc816e5815 Fix failure when creating cloned indexes for a partition
When using CREATE TABLE for a new partition, the partitioned indexes of
the parent are created automatically in a fashion similar to LIKE
INDEXES.  The new partition and its parent use a mapping for attribute
numbers for this operation, and while the mapping was correctly built,
its length was defined as the number of attributes of the newly-created
child, and not the parent.  If the parent includes dropped columns, this
could cause failures.

This is wrong since 8b08f7d which has introduced the concept of
partitioned indexes, so backpatch down to 11.

Reported-by: Wyatt Alt
Author: Michael Paquier
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/CAGem3qCcRmhbs4jYMkenYNfP2kEusDXvTfw-q+eOhM0zTceG-g@mail.gmail.com
Backpatch-through: 11
2019-11-02 14:16:04 +09:00
Peter Eisentraut 604bd36711 PG_FINALLY
This gives an alternative way of catching exceptions, for the common
case where the cleanup code is the same in the error and non-error
cases.  So instead of

    PG_TRY();
    {
        ... code that might throw ereport(ERROR) ...
    }
    PG_CATCH();
    {
        cleanup();
	PG_RE_THROW();
    }
    PG_END_TRY();
    cleanup();

one can write

    PG_TRY();
    {
        ... code that might throw ereport(ERROR) ...
    }
    PG_FINALLY();
    {
        cleanup();
    }
    PG_END_TRY();

Discussion: https://www.postgresql.org/message-id/flat/95a822c3-728b-af0e-d7e5-71890507ae0c%402ndquadrant.com
2019-11-01 11:18:03 +01:00
Michael Paquier 8270a0d9a9 Handle interrupts within a transaction context in REINDEX CONCURRENTLY
Phases 2 (building the new index) and 3 (validating the new index)
checked for interrupts outside a transaction context, having as
consequence to not release session-level locks taken on the parent
relation and the old and new indexes processed.  This could for example
be triggered with statement_timeout and a bad timing, and would issue
confusing error messages when shutting down the session still holding
the locks (note that an assertion failure would be triggered first), on
top of more issues with concurrent sessions trying to take a lock that
would interfere with the SHARE UPDATE EXCLUSIVE locks hold here.

This moves all the interruption checks inside a transaction context.
Note that I have manually tested all interruptions to make sure that
invalid indexes can be cleaned up properly.  Partition indexes still
have issues on their own with some missing dependency handling, which
will be dealt with in a follow-up patch.

Reported-by: Justin Pryzby
Author: Michael Paquier
Discussion: https://postgr.es/m/20191013025145.GC4475@telsasoft.com
Backpatch-through: 12
2019-10-25 10:20:08 +09:00
Michael Paquier 5d3500da72 Acquire properly session-level lock on new index in REINDEX CONCURRENTLY
In the first transaction run for REINDEX CONCURRENTLY, a thinko in the
existing logic caused two session locks to be taken on the old index,
causing the session lock on the newly-created index to be missed.  This
made possible concurrent DDL commands (like ALTER INDEX) on the new
index while REINDEX CONCURRENTLY was processing from the point where the
first internal transaction committed.

This issue has been discovered while digging into another bug.

Author: Michael Paquier
Discussion: https://postgr.es/m/20191021074323.GB1869@paquier.xyz
Backpatch-through: 12
2019-10-23 15:04:48 +09:00
Michael Paquier f25968c496 Remove last traces of heap_open/close in the tree
Since pluggable storage has been introduced, those two routines have
been replaced by table_open/close, with some compatibility macros still
present to allow extensions to compile correctly with v12.

Some code paths using the old routines still remained, so replace them.
Based on the discussion done, the consensus reached is that it is better
to remove those compatibility macros so as nothing new uses the old
routines, so remove also the compatibility macros.

Discussion: https://postgr.es/m/20191017014706.GF5605@paquier.xyz
2019-10-19 11:18:15 +09:00
Alvaro Herrera 89403ed228 Fix typo
Apparently while this code was being developed,
ReindexRelationConcurrently operated on multiple relations.  The version
that was ultimately pushed doesn't, so this comment's use of plural is
inaccurate.
2019-10-18 14:49:39 +02:00
Thomas Munro 6bda2af039 Fix bug that could try to freeze running multixacts.
Commits 801c2dc7 and 801c2dc7 made it possible for vacuum to
try to freeze a multixact that is still running.  That was
prevented by a check, but raised an error.  Repair.

Back-patch all the way.

Author: Nathan Bossart, Jeremy Schneider
Reported-by: Jeremy Schneider
Reviewed-by: Jim Nasby, Thomas Munro
Discussion: https://postgr.es/m/DAFB8AFF-2F05-4E33-AD7F-FF8B0F760C17%40amazon.com
2019-10-17 09:59:21 +13:00
Alvaro Herrera 0d21f919eb Fix crash when reporting CREATE INDEX progress
A race condition can make us try to dereference a NULL pointer to the
PGPROC struct of a process that's already finished.  That results in
crashes during REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY.

This was introduced in ab0dfc961b, so backpatch to pg12.

Reported by: Justin Pryzby
Reviewed-by: Michaël Paquier
Discussion: https://postgr.es/m/20191012004446.GT10470@telsasoft.com
2019-10-16 14:51:34 +02:00
Michael Paquier 1df5875d39 Fix dependency handling of column drop with partitioned tables
When dropping a column on a partitioned table which has one or more
partitioned indexes, the operation was failing as dependencies with
partitioned indexes using the column dropped were not getting removed in
a way consistent with the columns involved across all the relations part
of an inheritance tree.

This commit refactors the code executing column drop so as all the
columns from an inheritance tree to remove are gathered first, and
dropped all at the end.  This way, we let the dependency machinery sort
out by itself the deletion of all the columns with the partitioned
indexes across a partition tree.

This issue has been introduced by 1d92a0c, so backpatch down to
REL_12_STABLE.

Author: Amit Langote, Michael Paquier
Reviewed-by: Álvaro Herrera, Ashutosh Sharma
Discussion: https://postgr.es/m/CA+HiwqE9kuBsZ3b5pob2-cvE8ofzPWs-og+g8bKKGnu6b4-yTQ@mail.gmail.com
Backpatch-through: 12
2019-10-13 17:51:55 +09:00
Andres Freund 93765bd956 Fix table rewrites that include a column without a default.
In c2fe139c20 I made ATRewriteTable() use tuple slots. Unfortunately
I did not notice that columns can be added in a rewrite that do not
have a default, when another column is added/altered requiring one.

Initialize columns to NULL again, and add tests.

Bug: #16038
Reported-By: anonymous
Author: Andres Freund
Discussion: https://postgr.es/m/16038-5c974541f2bf6749@postgresql.org
Backpatch: 12, where the bug was introduced in c2fe139c20
2019-10-09 22:00:50 -07:00
Andres Freund d986d4e87f Fix crash caused by EPQ happening with a before update trigger present.
When ExecBRUpdateTriggers()'s GetTupleForTrigger() follows an EPQ
chain the former needs to run the result tuple through the junkfilter
again, and update the slot containing the new version of the tuple to
contain that new version. The input tuple may already be in the
junkfilter's output slot, which used to be OK - we don't need the
previous version anymore. Unfortunately ff11e7f4b9 started to use
ExecCopySlot() to update newslot, and ExecCopySlot() doesn't support
copying a slot into itself, leading to a slot in a corrupt
state, which then can cause crashes or other symptoms.

Fix this by skipping the ExecCopySlot() when copying into itself.

While we could have easily made ExecCopySlot() handle that case, it
seems better to add an assert forbidding doing so instead. As the goal
of copying might be to make the contents of one slot independent from
another, it seems failure prone to handle doing so silently.

A follow-up commit will add tests for the obviously under-covered
combination of EPQ and triggers. Done as a separate commit as it might
make sense to backpatch them further than this bug.

Also remove confusion with confusing variable names for slots in
ExecBRDeleteTriggers() and ExecBRUpdateTriggers().

Bug: #16036
Reported-By: Антон Власов
Author: Andres Freund
Discussion: https://postgr.es/m/16036-28184c90d952fb7f@postgresql.org
Backpatch: 12-, where ff11e7f4b9 was merged
2019-10-04 13:50:49 -07:00
Robert Haas 967e276e9f Remove AtSubStart_Notify.
Allocate notify-related state lazily instead. This makes trivial
subtransactions noticeably faster.

Patch by me, reviewed and tested by Dilip Kumar, Kyotaro Horiguchi,
and Jeevan Ladhe.

Discussion: https://postgr.es/m/CA+TgmobE1J22S1eC-6N-je9LgrcwZypkwp+zH6JXo9mc=4Nk3A@mail.gmail.com
2019-10-04 08:19:25 -04:00
Andres Freund 3f6b3be39c Silence -Wmaybe-uninitialized compiler warnings in dbcommands.c.
When compiling postgres using gcc -O3, there are false-positive
warnings about the now initialized variables. Silence them.

Author: Peter Eisentraut, Andres Freund
Discussion: https://postgr.es/m/15fb2350-b8b8-e188-278f-0b34fdee5210@2ndquadrant.com
2019-09-27 14:14:30 -07:00
Alvaro Herrera 773df883e8 Support reloptions of enum type
All our current in core relation options of type string (not many,
admittedly) behave in reality like enums.  But after seeing an
implementation for enum reloptions, it's clear that strings are messier,
so introduce the new reloption type.  Switch all string options to be
enums instead.

Fortunately we have a recently introduced test module for reloptions, so
we don't lose coverage of string reloptions, which may still be used by
third-party modules.

Authors: Nikolay Shaplov, Álvaro Herrera
Reviewed-by: Nikita Glukhov, Aleksandr Parfenov
Discussion: https://postgr.es/m/43332102.S2V5pIjXRx@x200m
2019-09-25 15:56:52 -03:00
Tom Lane 51004c7172 Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present.  This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date.  It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies.  We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.

Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly.  Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless.  This change can
substantially reduce signal traffic if listeners are spread among
many databases.  It won't help for the common case of only a single
active database, but the extra check costs very little.

Martijn van Oosterhout, with some adjustments by me

Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 11:46:29 -04:00
Alvaro Herrera d1b0007639 Fix progress report of REINDEX INDEX
I (Álvaro) broke that in commit 6212276e43 -- forgot to set the
necessary flag.  Repair.

Author: Amit Langote
Discussion: https://postgr.es/m/CA+HiwqEaM2tV5awKhP1vSbgjQe_uXVU15Oi4sTgwgempwMiT8g@mail.gmail.com
2019-09-20 12:56:00 -03:00
Alvaro Herrera 6212276e43 Fix progress reporting of CLUSTER / VACUUM FULL
The progress state was being clobbered once the first index completed
being rebuilt, causing the final phases of the operation not show
anything in the progress view.  This was inadvertently broken in
03f9e5cba0, which added progress tracking for REINDEX.

(The reason this bugfix is this small is that I had already noticed this
problem when writing monitoring for CREATE INDEX, and had already worked
around it, as can be seen in discussion starting at
https://postgr.es/m/20190329150218.GA25010@alvherre.pgsql Fixing the
problem is just a matter of fixing one place touched by the REINDEX
monitoring.)

Reported by: Álvaro Herrera
Author: Álvaro Herrera
Discussion: https://postgr.es/m/20190801184333.GA21369@alvherre.pgsql
2019-09-13 14:54:26 -03:00
Tomas Vondra d06215d03b Allow setting statistics target for extended statistics
When building statistics, we need to decide how many rows to sample and
how accurate the resulting statistics should be. Until now, it was not
possible to explicitly define statistics target for extended statistics
objects, the value was always computed from the per-attribute targets
with a fallback to the system-wide default statistics target.

That's a bit inconvenient, as it ties together the statistics target set
for per-column and extended statistics. In some cases it may be useful
to require larger sample / higher accuracy for extended statics (or the
other way around), but with this approach that's not possible.

So this commit introduces a new command, allowing to specify statistics
target for individual extended statistics objects, overriding the value
derived from per-attribute targets (and the system default).

  ALTER STATISTICS stat_name SET STATISTICS target_value;

When determining statistics target for an extended statistics object we
first look at this explicitly set value. When this value is -1, we fall
back to the old formula, looking at the per-attribute targets first and
then the system default. This means the behavior is backwards compatible
with older PostgreSQL releases.

Author: Tomas Vondra
Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development
Reviewed-by: Kirk Jamison, Dean Rasheed
2019-09-11 00:25:51 +02:00
Tom Lane bca6e64354 Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends.  That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.

A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends.  We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.

In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends.  If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.

Martijn van Oosterhout, hacked a bit more by me

Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-10 18:15:20 -04:00
Andres Freund 27cc7cd2bc Reorder EPQ work, to fix rowmark related bugs and improve efficiency.
In ad0bda5d24 I changed the EvalPlanQual machinery to store
substitution tuples in slot, instead of using plain HeapTuples. The
main motivation for that was that using HeapTuples will be inefficient
for future tableams.  But it turns out that that conversion was buggy
for non-locking rowmarks - the wrong tuple descriptor was used to
create the slot.

As a secondary issue 5db6df0c0 changed ExecLockRows() to begin EPQ
earlier, to allow to fetch the locked rows directly into the EPQ
slots, instead of having to copy tuples around. Unfortunately, as Tom
complained, that forces some expensive initialization to happen
earlier.

As a third issue, the test coverage for EPQ was clearly insufficient.

Fixing the first issue is unfortunately not trivial: Non-locked row
marks were fetched at the start of EPQ, and we don't have the type
information for the rowmarks available at that point. While we could
change that, it's not easy. It might be worthwhile to change that at
some point, but to fix this bug, it seems better to delay fetching
non-locking rowmarks when they're actually needed, rather than
eagerly. They're referenced at most once, and in cases where EPQ
fails, might never be referenced. Fetching them when needed also
increases locality a bit.

To be able to fetch rowmarks during execution, rather than
initialization, we need to be able to access the active EPQState, as
that contains necessary data. To do so move EPQ related data from
EState to EPQState, and, only for EStates creates as part of EPQ,
reference the associated EPQState from EState.

To fix the second issue, change EPQ initialization to allow use of
EvalPlanQualSlot() to be used before EvalPlanQualBegin() (but
obviously still requiring EvalPlanQualInit() to have been done).

As these changes made struct EState harder to understand, e.g. by
adding multiple EStates, significantly reorder the members, and add a
lot more comments.

Also add a few more EPQ tests, including one that fails for the first
issue above. More is needed.

Reported-By: yi huang
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion:
    https://postgr.es/m/CAHU7rYZo_C4ULsAx_LAj8az9zqgrD8WDd4hTegDTMM1LMqrBsg@mail.gmail.com
    https://postgr.es/m/24530.1562686693@sss.pgh.pa.us
Backpatch: 12-, where the EPQ changes were introduced
2019-09-09 05:14:11 -07:00
Tom Lane db43831899 Avoid using INFO elevel for what are fundamentally debug messages.
Commit 6f6b99d13 stuck an INFO message into the fast path for
checking partition constraints, for no very good reason except
that it made it easy for the regression tests to verify that
that path was taken.  Assorted later patches did likewise,
increasing the unsuppressable-chatter level from ALTER TABLE
even more.  This isn't good for the user experience, so let's
drop these messages down to DEBUG1 where they belong.  So as
not to have a loss of test coverage, create a TAP test that
runs the relevant queries with client_min_messages = DEBUG1
and greps for the expected messages.

This testing method is a bit brute-force --- in particular,
it duplicates the execution of a fair amount of the core
create_table and alter_table tests.  We experimented with
other solutions, but running any significant amount of
standard testing with client_min_messages = DEBUG1 seems
to have a lot of output-stability pitfalls, cf commits
bbb96c370 and 5655565c0.  Possibly at some point we'll look
into whether we can reduce the amount of test duplication.

Backpatch into v12, because some of these messages are new
in v12 and we don't really want to ship it that way.

Sergei Kornilov

Discussion: https://postgr.es/m/81911511895540@web58j.yandex.ru
Discussion: https://postgr.es/m/4859321552643736@myt5-02b80404fd9e.qloud-c.yandex.net
2019-09-07 19:03:11 -04:00
Robert Haas 8b94dab066 Split tuptoaster.c into three separate files.
detoast.c/h contain functions required to detoast a datum, partially
or completely, plus a few other utility functions for examining the
size of toasted datums.

toast_internals.c/h contain functions that are used internally to the
TOAST subsystem but which (mostly) do not need to be accessed from
outside.

heaptoast.c/h contains code that is intrinsically specific to the
heap AM, either because it operates on HeapTuples or is based on the
layout of a heap page.

detoast.c and toast_internals.c are placed in
src/backend/access/common rather than src/backend/access/heap.  At
present, both files still have dependencies on the heap, but that will
be improved in a future commit.

Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro,
Andres Freund, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
2019-09-05 13:15:10 -04:00
Alvaro Herrera fe66125974 Remove 'msg' parameter from convert_tuples_by_name
The message was included as a parameter when this function was added in
dcb2bda9b7, but I don't think it has ever served any useful purpose.
Let's stop spreading it pointlessly.

Reviewed by Amit Langote and Peter Eisentraut.

Discussion: https://postgr.es/m/20190806224728.GA17233@alvherre.pgsql
2019-09-03 14:47:29 -04:00
Michael Paquier c96581abe4 Fix inconsistencies and typos in the tree, take 11
This fixes various typos in docs and comments, and removes some orphaned
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
2019-08-19 16:21:39 +09:00
Tom Lane 4d4c66addf Disallow changing an inherited column's type if not all parents changed.
If a table inherits from multiple unrelated parents, we must disallow
changing the type of a column inherited from multiple such parents, else
it would be out of step with the other parents.  However, it's possible
for the column to ultimately be inherited from just one common ancestor,
in which case a change starting from that ancestor should still be
allowed.  (I would not be excited about preserving that option, were
it not that we have regression test cases exercising it already ...)

It's slightly annoying that this patch looks different from the logic
with the same end goal in renameatt(), and more annoying that it
requires an extra syscache lookup to make the test.  However, the
recursion logic is quite different in the two functions, and a
back-patched bug fix is no place to be trying to unify them.

Per report from Manuel Rigger.  Back-patch to 9.5.  The bug exists in
9.4 too (and doubtless much further back); but the way the recursion
is done in 9.4 is a good bit different, so that substantial refactoring
would be needed to fix it in 9.4.  I'm disinclined to do that, or risk
introducing new bugs, for a bug that has escaped notice for this long.

Discussion: https://postgr.es/m/CA+u7OA4qogDv9rz1HAb-ADxttXYPqQdUdPY_yd4kCzywNxRQXA@mail.gmail.com
2019-08-18 17:11:57 -04:00
Tom Lane bb5ae8f6c4 Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event.  This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events.  We can improve matters by
using a hash table to probe for duplicates.  So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.

A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)

We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions.  Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.

While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three.  This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.

Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.

Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 12:22:12 -04:00
Michael Paquier 66bde49d96 Fix inconsistencies and typos in the tree, take 10
This addresses some issues with unnecessary code comments, fixes various
typos in docs and comments, and removes some orphaned structures and
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com
2019-08-13 13:53:41 +09:00
Tom Lane 5ee190f8ec Rationalize use of list_concat + list_copy combinations.
In the wake of commit 1cff1b95a, the result of list_concat no longer
shares the ListCells of the second input.  Therefore, we can replace
"list_concat(x, list_copy(y))" with just "list_concat(x, y)".

To improve call sites that were list_copy'ing the first argument,
or both arguments, invent "list_concat_copy()" which produces a new
list sharing no ListCells with either input.  (This is a bit faster
than "list_concat(list_copy(x), y)" because it makes the result list
the right size to start with.)

In call sites that were not list_copy'ing the second argument, the new
semantics mean that we are usually leaking the second List's storage,
since typically there is no remaining pointer to it.  We considered
inventing another list_copy variant that would list_free the second
input, but concluded that for most call sites it isn't worth worrying
about, given the relative compactness of the new List representation.
(Note that in cases where such leakage would happen, the old code
already leaked the second List's header; so we're only discussing
the size of the leak not whether there is one.  I did adjust two or
three places that had been troubling to free that header so that
they manually free the whole second List.)

Patch by me; thanks to David Rowley for review.

Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-08-12 11:20:18 -04:00
Tom Lane cabe0f298e Fix "ANALYZE t, t" inside a transaction block.
This failed with either "tuple already updated by self" or "duplicate
key value violates unique constraint", depending on whether the table
had previously been analyzed or not.  The reason is that ANALYZE tried
to insert or update the same pg_statistic rows twice, and there was no
CommandCounterIncrement between.  So add one.  The same case works fine
outside a transaction block, because then there's a whole transaction
boundary between, as a consequence of the way VACUUM works.

This issue has been latent all along, but the problem was unreachable
before commit 11d8d72c2 added the ability to specify multiple tables
in ANALYZE.  We could, perhaps, alternatively fix it by adding code to
de-duplicate the list of VacuumRelations --- but that would add a
lot of overhead to work around dumb commands, so it's not attractive.

Per bug #15946 from Yaroslav Schekin.  Back-patch to v11.

(Note: in v11 I also back-patched the test added by commit 23224563d;
otherwise the problem doesn't manifest in the test I added, because
"vactst" is empty when the tests for multiple ANALYZE targets are
reached.  That seems like not a very good thing anyway, so I did this
rather than rethinking the choice of test case.)

Discussion: https://postgr.es/m/15946-5c7570a2884a26cf@postgresql.org
2019-08-10 11:30:11 -04:00
Michael Paquier 8548ddc61b Fix inconsistencies and typos in the tree, take 9
This addresses more issues with code comments, variable names and
unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com
2019-08-05 12:14:58 +09:00
Andres Freund 870b1d6800 Remove superfluous newlines in function prototypes.
These were introduced by pgindent due to fixe to broken
indentation (c.f. 8255c7a5ee). Previously the mis-indentation of
function prototypes was creatively used to reduce indentation in a few
places.

As that formatting only exists in master and REL_12_STABLE, it seems
better to fix it in both, rather than having some odd indentation in
v12 that somebody might copy for future patches or such.

Author: Andres Freund
Discussion: https://postgr.es/m/20190728013754.jwcbe5nfyt3533vx@alap3.anarazel.de
Backpatch: 12-
2019-07-31 00:05:21 -07:00
Tomas Vondra 14ef15a222 Don't build extended statistics on inheritance trees
When performing ANALYZE on inheritance trees, we collect two samples for
each relation - one for the relation alone, and one for the inheritance
subtree (relation and its child relations). And then we build statistics
on each sample, so for each relation we get two sets of statistics.

For regular (per-column) statistics this works fine, because the catalog
includes a flag differentiating statistics built from those two samples.
But we don't have such flag in the extended statistics catalogs, and we
ended up updating the same row twice, triggering this error:

  ERROR:  tuple already updated by self

The simplest solution is to disable extended statistics on inheritance
trees, which is what this commit is doing. In the future we may need to
do something similar to per-column statistics, but that requires adding a
flag to the catalog - and that's not backpatchable. Moreover, the current
selectivity estimation code only works with individual relations, so
building statistics on inheritance trees would be pointless anyway.

Author: Tomas Vondra
Backpatch-to: 10-
Discussion: https://postgr.es/m/20190618231233.GA27470@telsasoft.com
Reported-by: Justin Pryzby
2019-07-30 19:47:33 +02:00
Michael Paquier eb43f3d193 Fix inconsistencies and typos in the tree
This is numbered take 8, and addresses again a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/b137b5eb-9c95-9c2f-586e-38aba7d59788@gmail.com
2019-07-29 12:28:30 +09:00
Michael Paquier 7cce159349 Fix handling of expressions and predicates in REINDEX CONCURRENTLY
When copying the definition of an index rebuilt concurrently for the new
entry, the index information was taken directly from the old index using
the relation cache.  In this case, predicates and expressions have
some post-processing to prepare things for the planner, which loses some
information including the collations added in any of them.

This inconsistency can cause issues when attempting for example a table
rewrite, and makes the new indexes rebuilt concurrently inconsistent
with the old entries.

In order to fix the problem, fetch expressions and predicates directly
from the catalog of the old entry, and fill in IndexInfo for the new
index with that.  This makes the process more consistent with
DefineIndex(), and the code is refactored with the addition of a routine
to create an IndexInfo node.

Reported-by: Manuel Rigger
Author: Michael Paquier
Discussion: https://postgr.es/m/CA+u7OA5Hp0ra235F3czPom_FyAd-3+XwSJmX95r1+sRPOJc9VQ@mail.gmail.com
Backpatch-through: 12
2019-07-29 09:58:49 +09:00
Alvaro Herrera 5562272a42 Check that partitions are not in use when dropping constraints
If the user creates a deferred constraint in a partition, and in a
transaction they cause the constraint's trigger execution to be deferred
until commit time *and* drop the constraint, then when commit time comes
the queued trigger will fail to run because the trigger object will have
been dropped.

This is explained because when a constraint gets dropped in a
partitioned table, the recursion to drop the ones in partitions is done
by the dependency mechanism, not by ALTER TABLE traversing the recursion
tree as in all other cases.  In the non-partitioned case, this problem
is avoided by checking that the table is not "in use" by alter-table;
other alter-table subcommands that recurse to partitions do that check
for each partition.  But the dependency mechanism doesn't have a way to
do that.  Fix the problem by applying the same check to all partitions
during ALTER TABLE's "prep" phase, which correctly raises the necessary
error.

Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Discussion: https://postgr.es/m/CAKcux6nZiO9-eEpr1ZD84bT1mBoVmeZkfont8iSpcmYrjhGWgA@mail.gmail.com
2019-07-23 17:22:15 -04:00
Peter Eisentraut 06140c201b Add CREATE DATABASE LOCALE option
This sets both LC_COLLATE and LC_CTYPE with one option.  Similar
behavior is already supported in initdb, CREATE COLLATION, and
createdb.

Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
Discussion: https://www.postgresql.org/message-id/flat/d9d5043a-dc70-da8a-0166-1e218e6e34d4%402ndquadrant.com
2019-07-23 14:47:24 +02:00
Tom Lane a0555ddab9 Install dependencies to prevent dropping partition key columns.
The logic in ATExecDropColumn that rejects dropping partition key
columns is quite an inadequate defense, because it doesn't execute
in cases where a column needs to be dropped due to cascade from
something that only the column, not the whole partitioned table,
depends on.  That leaves us with a badly broken partitioned table;
even an attempt to load its relcache entry will fail.

We really need to have explicit pg_depend entries that show that the
column can't be dropped without dropping the whole table.  Hence,
add those entries.  In v12 and HEAD, bump catversion to ensure that
partitioned tables will have such entries.  We can't do that in
released branches of course, so in v10 and v11 this patch affords
protection only to partitioned tables created after the patch is
installed.  Given the lack of field complaints (this bug was found
by fuzz-testing not by end users), that's probably good enough.

In passing, fix ATExecDropColumn and ATPrepAlterColumnType
messages to be more specific about which partition key column
they're complaining about.

Per report from Manuel Rigger.  Back-patch to v10 where partitioned
tables were added.

Discussion: https://postgr.es/m/CA+u7OA4JKCPFrdrAbOs7XBiCyD61XJxeNav4LefkSmBLQ-Vobg@mail.gmail.com
Discussion: https://postgr.es/m/31920.1562526703@sss.pgh.pa.us
2019-07-22 14:55:40 -04:00
Peter Eisentraut 19781729f7 Make identity sequence management more robust
Some code could get confused when certain catalog state involving both
identity and serial sequences was present, perhaps during an attempt
to upgrade the latter to the former.  Specifically, dropping the
default of a serial column maintains the ownership of the sequence by
the column, and so it would then be possible to afterwards make the
column an identity column that would now own two sequences.  This
causes the code that looks up the identity sequence to error out,
making the new identity column inoperable until the ownership of the
previous sequence is released.

To fix this, make the identity sequence lookup only consider sequences
with the appropriate dependency type for an identity sequence, so it
only ever finds one (unless something else is broken).  In the above
example, the old serial sequence would then be ignored.  Reorganize
the various owned-sequence-lookup functions a bit to make this
clearer.

Reported-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://www.postgresql.org/message-id/flat/470c54fc8590be4de0f41b0d295fd6390d5e8a6c.camel@cybertec.at
2019-07-22 12:07:10 +02:00
Michael Paquier 23bccc823d Fix inconsistencies and typos in the tree
This is numbered take 7, and addresses a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/dff75442-2468-f74f-568c-6006e141062f@gmail.com
2019-07-22 10:01:50 +09:00
Tomas Vondra a63378a03e Use column collation for extended statistics
The current extended statistics code was a bit confused which collation
to use.  When building the statistics, the collations defined as default
for the data types were used (since commit 5e0928005).  The MCV code was
however using the column collations for MCV serialization, and then
DEFAULT_COLLATION_OID when computing estimates. So overall the code was
using all three possible options, inconsistently.

This uses the column colation everywhere - this makes it consistent with
what 5e0928005 did for regular stats.  We however do not track the
collations in a catalog, because we can derive them from column-level
information.  This may need to change in the future, e.g. after allowing
statistics on expressions.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
Backpatch-to: 12
2019-07-20 16:37:37 +02:00
Tom Lane d97b714a21 Avoid using lcons and list_delete_first where it's easy to do so.
Formerly, lcons was about the same speed as lappend, but with the new
List implementation, that's not so; with a long List, data movement
imposes an O(N) cost on lcons and list_delete_first, but not lappend.

Hence, invent list_delete_last with semantics parallel to
list_delete_first (but O(1) cost), and change various places to use
lappend and list_delete_last where this can be done without much
violence to the code logic.

There are quite a few places that construct result lists using lcons not
lappend.  Some have semantic rationales for that; I added comments about
it to a couple that didn't have them already.  In many such places though,
I think the coding is that way only because back in the dark ages lcons
was faster than lappend.  Hence, switch to lappend where this can be done
without causing semantic changes.

In ExecInitExprRec(), this results in aggregates and window functions that
are in the same plan node being executed in a different order than before.
Generally, the executions of such functions ought to be independent of
each other, so this shouldn't result in visibly different query results.
But if you push it, as one regression test case does, you can show that
the order is different.  The new order seems saner; it's closer to
the order of the functions in the query text.  And we never documented
or promised anything about this, anyway.

Also, in gistfinishsplit(), don't bother building a reverse-order list;
it's easy now to iterate backwards through the original list.

It'd be possible to go further towards removing uses of lcons and
list_delete_first, but it'd require more extensive logic changes,
and I'm not convinced it's worth it.  Most of the remaining uses
deal with queues that probably never get long enough to be worth
sweating over.  (Actually, I doubt that any of the changes in this
patch will have measurable performance effects either.  But better
to have good examples than bad ones in the code base.)

Patch by me, thanks to David Rowley and Daniel Gustafsson for review.

Discussion: https://postgr.es/m/21272.1563318411@sss.pgh.pa.us
2019-07-17 11:15:34 -04:00
Tom Lane 3093eb2b83 Fix thinko in construction of old_conpfeqop list.
This should lappend the OIDs, not lcons them; the existing code produced
a list in reversed order.  This is harmless for single-key FKs or FKs
where all the key columns are of the same type, which probably explains
how it went unnoticed.  But if those conditions are not met,
ATAddForeignKeyConstraint would make the wrong decision about whether an
existing FK needs to be revalidated.  I think it would almost always err
in the safe direction by revalidating a constraint that didn't need it.
You could imagine scenarios where the pfeqop check was fooled by
swapping the types of two FK columns in one ALTER TABLE, but that case
would probably be rejected by other tests, so it might be impossible to
get to the worst-case scenario where an FK should be revalidated and
isn't.  (And even then, it's likely to be fine, unless there are weird
inconsistencies in the equality behavior of the replacement types.)
However, this is a performance bug at least.

Noted while poking around to see whether lcons calls could be converted
to lappend.

This bug is old, dating to commit cb3a7c2b9, so back-patch to all
supported branches.
2019-07-16 18:17:47 -04:00
Tom Lane 4c3d05d875 Remove dead code.
These memory context switches are useless in the wake of commit
1cff1b95a.  Noted by Jesper Pedersen.

Discussion: https://postgr.es/m/f078ce63-9e04-0f3e-d200-d7ee66279abe@redhat.com
2019-07-15 23:27:13 -04:00
Tom Lane 1cff1b95ab Represent Lists as expansible arrays, not chains of cons-cells.
Originally, Postgres Lists were a more or less exact reimplementation of
Lisp lists, which consist of chains of separately-allocated cons cells,
each having a value and a next-cell link.  We'd hacked that once before
(commit d0b4399d8) to add a separate List header, but the data was still
in cons cells.  That makes some operations -- notably list_nth() -- O(N),
and it's bulky because of the next-cell pointers and per-cell palloc
overhead, and it's very cache-unfriendly if the cons cells end up
scattered around rather than being adjacent.

In this rewrite, we still have List headers, but the data is in a
resizable array of values, with no next-cell links.  Now we need at
most two palloc's per List, and often only one, since we can allocate
some values in the same palloc call as the List header.  (Of course,
extending an existing List may require repalloc's to enlarge the array.
But this involves just O(log N) allocations not O(N).)

Of course this is not without downsides.  The key difficulty is that
addition or deletion of a list entry may now cause other entries to
move, which it did not before.

For example, that breaks foreach() and sister macros, which historically
used a pointer to the current cons-cell as loop state.  We can repair
those macros transparently by making their actual loop state be an
integer list index; the exposed "ListCell *" pointer is no longer state
carried across loop iterations, but is just a derived value.  (In
practice, modern compilers can optimize things back to having just one
loop state value, at least for simple cases with inline loop bodies.)
In principle, this is a semantics change for cases where the loop body
inserts or deletes list entries ahead of the current loop index; but
I found no such cases in the Postgres code.

The change is not at all transparent for code that doesn't use foreach()
but chases lists "by hand" using lnext().  The largest share of such
code in the backend is in loops that were maintaining "prev" and "next"
variables in addition to the current-cell pointer, in order to delete
list cells efficiently using list_delete_cell().  However, we no longer
need a previous-cell pointer to delete a list cell efficiently.  Keeping
a next-cell pointer doesn't work, as explained above, but we can improve
matters by changing such code to use a regular foreach() loop and then
using the new macro foreach_delete_current() to delete the current cell.
(This macro knows how to update the associated foreach loop's state so
that no cells will be missed in the traversal.)

There remains a nontrivial risk of code assuming that a ListCell *
pointer will remain good over an operation that could now move the list
contents.  To help catch such errors, list.c can be compiled with a new
define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents
whenever that could possibly happen.  This makes list operations
significantly more expensive so it's not normally turned on (though it
is on by default if USE_VALGRIND is on).

There are two notable API differences from the previous code:

* lnext() now requires the List's header pointer in addition to the
current cell's address.

* list_delete_cell() no longer requires a previous-cell argument.

These changes are somewhat unfortunate, but on the other hand code using
either function needs inspection to see if it is assuming anything
it shouldn't, so it's not all bad.

Programmers should be aware of these significant performance changes:

* list_nth() and related functions are now O(1); so there's no
major access-speed difference between a list and an array.

* Inserting or deleting a list element now takes time proportional to
the distance to the end of the list, due to moving the array elements.
(However, it typically *doesn't* require palloc or pfree, so except in
long lists it's probably still faster than before.)  Notably, lcons()
used to be about the same cost as lappend(), but that's no longer true
if the list is long.  Code that uses lcons() and list_delete_first()
to maintain a stack might usefully be rewritten to push and pop at the
end of the list rather than the beginning.

* There are now list_insert_nth...() and list_delete_nth...() functions
that add or remove a list cell identified by index.  These have the
data-movement penalty explained above, but there's no search penalty.

* list_concat() and variants now copy the second list's data into
storage belonging to the first list, so there is no longer any
sharing of cells between the input lists.  The second argument is
now declared "const List *" to reflect that it isn't changed.

This patch just does the minimum needed to get the new implementation
in place and fix bugs exposed by the regression tests.  As suggested
by the foregoing, there's a fair amount of followup work remaining to
do.

Also, the ENABLE_LIST_COMPAT macros are finally removed in this
commit.  Code using those should have been gone a dozen years ago.

Patch by me; thanks to David Rowley, Jesper Pedersen, and others
for review.

Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-07-15 13:41:58 -04:00
Thomas Munro b31fbe852c Warn if wal_level is too low when creating a publication.
Provide a hint to users that they need to increase wal_level before
subscriptions can work.

Author: Lucas Viecelli, with some adjustments by Thomas Munro
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAPjy-57rn5Y9g4e5u--eSOP-7P4QrE9uOZmT2ZcUebF8qxsYhg%40mail.gmail.com
2019-07-13 10:35:34 +12:00
Tom Lane b5810de3f4 Reduce memory consumption for multi-statement query strings.
Previously, exec_simple_query always ran parse analysis, rewrite, and
planning in MessageContext, allowing all the data generated thereby
to persist until the end of processing of the whole query string.
That's fine for single-command strings, but if a client sends many
commands in a single simple-Query message, this strategy could result
in annoying memory bloat, as complained of by Andreas Seltenreich.

To fix, create a child context to do this work in, and reclaim it
after each command.  But we only do so for parsetrees that are not
last in their query string.  That avoids adding any memory management
overhead for the typical case of a single-command string.  Memory
allocated for the last parsetree would be freed immediately after
finishing the command string anyway.

Similarly, adjust extension.c's execute_sql_string() to reclaim memory
after each command.  In that usage, multi-command strings are the norm,
so it's a bit surprising that no one has yet complained of bloat ---
especially since the bloat extended to whatever data ProcessUtility
execution might leak.

Amit Langote, reviewed by Julien Rouhaud

Discussion: https://postgr.es/m/87ftp6l2qr.fsf@credativ.de
2019-07-10 14:32:38 -04:00
David Rowley f7c830f1ab Fix missing calls to table_finish_bulk_insert during COPY, take 2
86b85044e abstracted calls to heap functions in COPY FROM to support a
generic table AM.  However, when performing a copy into a partitioned
table, this commit neglected to call table_finish_bulk_insert for each
partition.  Before 86b85044e, when we always called the heap functions,
there was no need to call heapam_finish_bulk_insert for partitions since
it only did any work when performing a copy without WAL.  For partitioned
tables, this was unsupported anyway, so there was no issue.  With
pluggable storage, we can't make any assumptions about what the table AM
might want to do in its equivalent function, so we'd better ensure we
always call table_finish_bulk_insert each partition that's received a row.

For now, we make the table_finish_bulk_insert call whenever we evict a
CopyMultiInsertBuffer out of the CopyMultiInsertInfo.  This does mean
that it's possible that we call table_finish_bulk_insert multiple times
per partition, which is not a problem other than being an inefficiency.
Improving this requires a more invasive patch, so let's leave that for
another day.

This also changes things so that we no longer needlessly call
table_finish_bulk_insert when performing a COPY FROM for a non-partitioned
table when not using multi-inserts.

Reported-by: Robert Haas
Backpatch-through: 12
Discussion: https://postgr.es/m/CA+TgmoYK=6BpxiJ0tN-p9wtH0BTAfbdxzHhwou0mdud4+BkYuQ@mail.gmail.com
2019-07-10 16:03:04 +12:00
Alvaro Herrera 2c84ea6cf9 Propagate trigger arguments to partitions
We were creating the cloned triggers with an empty list of arguments,
losing the ones that had been specified by the user when creating the
trigger in the partitioned table.  Repair.

This was forgotten in commit 86f575948c.

Author: Patrick McHardy
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20190709130027.amr2cavjvo7rdvac@access1.trash.net
Discussion: https://postgr.es/m/15752-123bc90287986de4@postgresql.org
2019-07-09 17:16:36 -04:00
Tom Lane 0ab1a2e39b Remove dead encoding-conversion functions.
The code for conversions SQL_ASCII <-> MULE_INTERNAL and
SQL_ASCII <-> UTF8 was unreachable, because we long ago changed
the wrapper functions pg_do_encoding_conversion() et al so that
they have hard-wired behaviors for conversions involving SQL_ASCII.
(At least some of those fast paths date back to 2002, though it
looks like we may not have been totally consistent about this until
later.)  Given the lack of complaints, nobody is dissatisfied with
this state of affairs.  Hence, let's just remove the unreachable code.

Also, change CREATE CONVERSION so that it rejects attempts to
define such conversions.  Since we consider that SQL_ASCII represents
lack of knowledge about the encoding in use, such a conversion would
be semantically dubious even if it were reachable.

Adjust a couple of regression test cases that had randomly decided
to rely on these conversion functions rather than any other ones.

Discussion: https://postgr.es/m/41163.1559156593@sss.pgh.pa.us
2019-07-05 14:17:27 -04:00
David Rowley 8abc13a889 Use appendStringInfoString and appendPQExpBufferStr where possible
This changes various places where appendPQExpBuffer was used in places
where it was possible to use appendPQExpBufferStr, and likewise for
appendStringInfo and appendStringInfoString.  This is really just a
stylistic improvement, but there are also small performance gains to be
had from doing this.

Discussion: http://postgr.es/m/CAKJS1f9P=M-3ULmPvr8iCno8yvfDViHibJjpriHU8+SXUgeZ=w@mail.gmail.com
2019-07-04 13:01:13 +12:00
Tom Lane 9e1c9f9594 pgindent run prior to branching v12.
pgperltidy and reformat-dat-files too, though the latter didn't
find anything to change.
2019-07-01 12:37:52 -04:00
David Rowley f5db56fc4d Revert fix missing call to table_finish_bulk_insert during COPY
This reverts commits 4de60244e and b2d69806d. Further thought is
required to make this work properly.
2019-07-02 03:44:56 +12:00
David Rowley b2d69806d8 Remove surplus call to table_finish_bulk_insert
4de60244e added the call to table_finish_bulk_insert to the
CopyMultiInsertBufferCleanup function.  We use a CopyMultiInsertBuffer even
for non-partitioned tables, so having the cleanup do that meant we would
call table_finsh_bulk_insert twice when performing COPY FROM with
a non-partitioned table.

Here we can just remove the direct call in CopyFrom and let
CopyMultiInsertBufferCleanup handle the call instead.
2019-07-02 03:07:15 +12:00
David Rowley 4de60244e2 Fix missing call to table_finish_bulk_insert during COPY
86b85044e abstracted calls to heap functions in COPY FROM to support a
generic table AM.  However, when performing a copy into a partitioned
table, this commit neglected to call table_finish_bulk_insert for each
partition.  Before 86b85044e, when we always called the heap functions,
there was no need to call heapam_finish_bulk_insert for partitions since
it only did any work when performing a copy without WAL. For partitioned
tables, this was unsupported anyway, so there was no issue. With pluggable
storage, we can't make any assumptions about what the table AM might want
to do in its equivalent function, so we'd better ensure we always call
table_finish_bulk_insert each partition that's received a row.

For now, we make the table_finish_bulk_insert call whenever we evict a
CopyMultiInsertBuffer out of the CopyMultiInsertInfo.  This does mean
that it's possible that we call table_finish_bulk_insert multiple times
per partition, which is not a problem other than being an inefficiency.
Improving this requires a more invasive patch, so let's leave that for
another day.

In passing, move the table_finish_bulk_insert for the target of the COPY
command so that it's only called when we're actually performing bulk
inserts.  We don't need to call this when inserting 1 row at a time.

Reported-by: Robert Haas
Discussion: https://postgr.es/m/CA+TgmoYK=6BpxiJ0tN-p9wtH0BTAfbdxzHhwou0mdud4+BkYuQ@mail.gmail.com
2019-07-02 01:23:26 +12:00
Tom Lane 54100f5c60 Add an enforcement mechanism for global object names in regression tests.
In commit 18555b132 we tentatively established a rule that regression
tests should use names containing "regression" for databases, and names
starting with "regress_" for all other globally-visible object names, so
as to circumscribe the side-effects that "make installcheck" could have
on an existing installation.

This commit adds a simple enforcement mechanism for that rule: if the code
is compiled with ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS defined, it
will emit a warning (not an error) whenever a database, role, tablespace,
subscription, or replication origin name is created that doesn't obey the
rule.  Running one or more buildfarm members with that symbol defined
should be enough to catch new violations, at least in the regular
regression tests.  Most TAP tests wouldn't notice such warnings, but
that's actually fine because TAP tests don't execute against an existing
server anyway.

Since it's already the case that running src/test/modules/ tests in
installcheck mode is deprecated, we can use that as a home for tests
that seem unsafe to run against an existing server, such as tests that
might have side-effects on existing roles.  Document that (though this
commit doesn't in itself make it any less safe than before).

Update regress.sgml to define these restrictions more clearly, and
to clean up assorted lack-of-up-to-date-ness in its descriptions of
the available regression tests.

Discussion: https://postgr.es/m/16638.1468620817@sss.pgh.pa.us
2019-06-29 11:34:00 -04:00
Alvaro Herrera 23cccb17fe Fix for dropped columns in a partitioned table's default partition
We forgot to map column numbers to/from the default partition for
various operations, leading to valid cases failing with spurious
errors, such as
ERROR:  attribute N of type some_partition has been dropped

It was also possible that the search for conflicting rows in the default
partition when attaching another partition would fail to detect some.
Secondarily, it was also possible that such a search should be skipped
(because the constraint was implied) but wasn't.

Fix all this by mapping column numbers when necessary.

Reported by: Daniel Wilches
Author: Amit Langote
Discussion: https://postgr.es/m/15873-8c61945d6b3ef87c@postgresql.org
2019-06-28 14:51:08 -04:00
Tomas Vondra 69fd82fedd Update reference to sampling algorithm in analyze.c
Commit 83e176ec1 moved row sampling functions from analyze.c to
utils/misc/sampling.c, but failed to update comment referring to
the sampling algorithm from Jeff Vitter's paper. Correct the
comment by pointing to utils/misc/sampling.c.

Author: Etsuro Fujita
Discussion: https://postgr.es/m/CAPmGK154gp%2BQd%3DcorQOv%2BPmbyVyZBjp_%2Bhb766UJeD1e_ie6XQ%40mail.gmail.com
2019-06-27 18:01:54 +02:00
Alvaro Herrera 050098b14e Fix use-after-free introduced in 55ed3defc9
Evidenced by failure under RELCACHE_FORCE_RELEASE (buildfarm member
prion).

Author: Amit Langote
Discussion: https://postgr.es/m/CA+HiwqGV=k_Eh4jBiQw66ivvdG+EUkrEYeHTYL1SvDj_YOYV0g@mail.gmail.com
2019-06-27 11:57:10 -04:00
Alvaro Herrera 55ed3defc9 Fix partitioned index creation with foreign partitions
When a partitioned tables contains foreign tables as partitions, it is
not possible to implement unique or primary key indexes -- but when
regular indexes are created, there is no reason to do anything other
than ignoring such partitions.  We were raising errors upon encountering
the foreign partitions, which is unfriendly and doesn't protect against
any actual problems.

Relax this restriction so that index creation is allowed on partitioned
tables containing foreign partitions, becoming a no-op on them.  (We may
later want to redefine this so that the FDW is told to create the
indexes on the foreign side.)  This applies to CREATE INDEX, as well as
ALTER TABLE / ATTACH PARTITION and CREATE TABLE / PARTITION OF.

Backpatch to 11, where indexes on partitioned tables were introduced.

Discussion: https://postgr.es/m/15724-d5a58fa9472eef4f@postgresql.org
Author: Álvaro Herrera
Reviewed-by: Amit Langote
2019-06-26 18:38:51 -04:00
Tom Lane ccfcc8fdbd Purely-cosmetic adjustments in tablecmds.c.
Move ATExecAlterColumnGenericOptions away from where it was unthinkingly
dropped, in the middle of a lot of ALTER COLUMN TYPE code.  I don't have
any high principles about where to put it instead, so let's just put it
after ALTER COLUMN TYPE and before ALTER OWNER, matching existing
decisions about how to order related code stanzas.

Also add the minimal function header comment that the original author
was too cool to bother with.

Along the way, upgrade header comments for nearby ALTER COLUMN TYPE
functions.

Discussion: https://postgr.es/m/14787.1561403130@sss.pgh.pa.us
2019-06-24 17:19:37 -04:00
Tom Lane f946a40914 Further fix ALTER COLUMN TYPE's handling of indexes and index constraints.
This patch reverts all the code changes of commit e76de8861, which turns
out to have been seriously misguided.  We can't wait till later to compute
the definition string for an index; we must capture that before applying
the data type change for any column it depends on, else ruleutils.c will
deliverr wrong/misleading results.  (This fine point was documented
nowhere, of course.)

I'd also managed to forget that ATExecAlterColumnType executes once per
ALTER COLUMN TYPE clause, not once per statement; which resulted in the
code being basically completely broken for any case in which multiple ALTER
COLUMN TYPE clauses are applied to a table having non-constraint indexes
that must be rebuilt.  Through very bad luck, none of the existing test
cases nor the ones added by e76de8861 caught that, but of course it was
soon found in the field.

The previous patch also had an implicit assumption that if a constraint's
index had a dependency on a table column, so would the constraint --- but
that isn't actually true, so it didn't fix such cases.

Instead of trying to delete unneeded index dependencies later, do the
is-there-a-constraint lookup immediately on seeing an index dependency,
and switch to remembering the constraint if so.  In the unusual case of
multiple column dependencies for a constraint index, this will result in
duplicate constraint lookups, but that's not that horrible compared to all
the other work that happens here.  Besides, such cases did not work at all
before, so it's hard to argue that they're performance-critical for anyone.

Per bug #15865 from Keith Fiske.  As before, back-patch to all supported
branches.

Discussion: https://postgr.es/m/15865-17940eacc8f8b081@postgresql.org
2019-06-24 16:43:21 -04:00
Michael Paquier 20e1cc898d Rework some error strings for REINDEX CONCURRENTLY with system catalogs
This makes the whole user experience more consistent when bumping into
failures, and more in line with the rewording done via 508300e.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20190514153252.GA22168@alvherre.pgsql
2019-06-20 13:28:12 +09:00
Michael Paquier 3412030205 Fix more typos and inconsistencies in the tree
Author: Alexander Lakhin
Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
2019-06-17 16:13:16 +09:00
Tomas Vondra 6cbfb784c3 Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic.  That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.

Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics.  That would be a surprising behavior.

Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly.  This changed with introduction
of MCV lists, which do include most common combinations of values.

The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.

Bumped CATVERSION due to changing system catalog definitions.

Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-16 01:20:31 +02:00
Michael Paquier f43608bda2 Fix typos and inconsistencies in code comments
Author: Alexander Lakhin
Discussion: https://postgr.es/m/dec6aae8-2d63-639f-4d50-20e229fb83e3@gmail.com
2019-06-14 09:34:34 +09:00
Tom Lane e76de88615 Fix ALTER COLUMN TYPE failure with a partial exclusion constraint.
ATExecAlterColumnType failed to consider the possibility that an index
that needs to be rebuilt might be a child of a constraint that needs to be
rebuilt.  We missed this so far because usually a constraint index doesn't
have a direct dependency on its table, just on the constraint object.
But if there's a WHERE clause, then dependency analysis of the WHERE
clause results in direct dependencies on the column(s) mentioned in WHERE.
This led to trying to drop and rebuild both the constraint and its
underlying index.

In v11/HEAD, we successfully drop both the index and the constraint,
and then try to rebuild both, and of course the second rebuild hits a
duplicate-index-name problem.  Before v11, it fails with obscure messages
about a missing relation OID, due to trying to drop the index twice.

This is essentially the same kind of problem noted in commit
20bef2c31: the possible dependency linkages are broader than what
ATExecAlterColumnType was designed for.  It was probably OK when
written, but it's certainly been broken since the introduction of
partial exclusion constraints.  Fix by adding an explicit check
for whether any of the indexes-to-be-rebuilt belong to any of the
constraints-to-be-rebuilt, and ignoring any that do.

In passing, fix a latent bug introduced by commit 8b08f7d48: in
get_constraint_index() we must "continue" not "break" when rejecting
a relation of a wrong relkind.  This is harmless today because we don't
expect that code path to be taken anyway; but if there ever were any
relations to be ignored, the existing coding would have an extremely
undesirable dependency on the order of pg_depend entries.

Also adjust a couple of obsolete comments.

Per bug #15835 from Yaroslav Schekin.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/15835-32d9b7a76c06a7a9@postgresql.org
2019-06-12 12:29:39 -04:00
Andres Freund fff2a7d7bd Don't access catalogs to validate GUCs when not connected to a DB.
Vignesh found this bug in the check function for
default_table_access_method's check hook, but that was just copied
from older GUCs. Investigation by Michael and me then found the bug in
further places.

When not connected to a database (e.g. in a walsender connection), we
cannot perform (most) GUC checks that need database access. Even when
only shared tables are needed, unless they're
nailed (c.f. RelationCacheInitializePhase2()), they cannot be accessed
without pg_class etc. being present.

Fix by extending the existing IsTransactionState() checks to also
check for MyDatabaseOid.

Reported-By: Vignesh C, Michael Paquier, Andres Freund
Author: Vignesh C, Andres Freund
Discussion: https://postgr.es/m/CALDaNm1KXK9gbZfY-p_peRFm_XrBh1OwQO1Kk6Gig0c0fVZ2uw%40mail.gmail.com
Backpatch: 9.4-
2019-06-10 23:34:50 -07:00
Amit Kapila 92c4abc736 Fix assorted inconsistencies.
There were a number of issues in the recent commits which include typos,
code and comments mismatch, leftover function declarations.  Fix them.

Reported-by: Alexander Lakhin
Author: Alexander Lakhin, Amit Kapila and Amit Langote
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/ef0c0232-0c1d-3a35-63d4-0ebd06e31387@gmail.com
2019-06-08 08:16:38 +05:30
Alvaro Herrera a36c84c3e4 Fix default_tablespace usage for partitioned tables
In commit 87259588d0 I (Álvaro) tried to rationalize the determination
of tablespace to use for partitioned tables, but failed to handle the
default_tablespace case.  Repair and add proper tests.

Author: Amit Langote, Rushabh Lathia
Reported-by: Rushabh Lathia
Reviewed-by: Amit Langote, Álvaro Herrera
Discussion: https://postgr.es/m/CAGPqQf0cYjm1=rjxk_6gU0SjUS70=yFUAdCJLwWzh9bhNJnyVg@mail.gmail.com
2019-06-07 00:44:17 -04:00
David Rowley e24a815c1c Fix confusing NOTICE text in REINDEX CONCURRENTLY
When performing REINDEX TABLE CONCURRENTLY, if all of the table's indexes
could not be reindexed, a NOTICE message claimed that the table had no
indexes.  This was confusing, so let's change the NOTICE text to something
less confusing.

In passing, also mention in the comment before ReindexRelationConcurrently
that materialized views are supported too and also explain what the return
value of the function means.

Author: Ashwin Agrawal
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CALfoeithHvi13p_VyR8kt9o6Pa7Z=Smi6Nfc2anHnQx5Lj8bTQ@mail.gmail.com
2019-06-05 21:05:41 +12:00
David Rowley 56b3b38382 Fix incorrect index behavior in COPY FROM with partitioned tables
86b85044e rewrote how COPY FROM works to allow multiple tuple buffers to
exist to once thus allowing multi-inserts to be used in more cases with
partitioned tables.  That commit neglected to update the estate's
es_result_relation_info when flushing the insert buffer to the partition
making it possible for the index tuples to be added into an index on the
wrong partition.

Fix this and also add an Assert in ExecInsertIndexTuples to help ensure
that we never make this mistake again.

Reported-by: Haruka Takatsuka
Author: Ashutosh Sharma
Discussion: https://postgr.es/m/15832-b1bf336a4ee246b5@postgresql.org
2019-06-05 18:28:38 +12:00
Michael Paquier f7e954ad1c Rework code using list_delete_cell() in MergeAttributes
When merging two attributes, we are sure that at least one remains.
However, when deleting one element in the attribute list we may finish
with an empty list returned as NIL by list_delete_cell(), but the code
failed to track that, which is not project-like.  Adjust the call so as
we check for an empty list, and make use of it in an assertion.

This has been introduced by e7b3349, when adding support for CREATE
TABLE OF.

Author: Mark Dilger
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/CAE-h2TpPDqSWgOvfvSziOaMngMPwW+QZcmPpY8hQ_KOJ2+3hXQ@mail.gmail.com
2019-06-05 15:01:14 +09:00
Peter Eisentraut c880096dc1 Add command column to pg_stat_progress_create_index
This allows determining which command is running, similar to
pg_stat_progress_cluster.

Discussion: https://www.postgresql.org/message-id/flat/f0e56b3b-74b7-6cbc-e207-a5ed6bee18dc%402ndquadrant.com
2019-06-04 09:29:02 +02:00
Tom Lane eaf0292c3b Fix unsafe memory management in CloneRowTriggersToPartition().
It's not really supported to call systable_getnext() in a different
memory context than systable_beginscan() was called in, and it's
*definitely* not safe to do so and then reset that context between
calls.  I'm not very clear on how this code survived
CLOBBER_CACHE_ALWAYS testing ... but Alexander Lakhin found a case
that would crash it pretty reliably.

Per bug #15828.  Fix, and backpatch to v11 where this code came in.

Discussion: https://postgr.es/m/15828-f6ddd7df4852f473@postgresql.org
2019-06-03 16:59:26 -04:00
Alvaro Herrera a100974751 Fix typo in message
I introduced the typo in source code in the course of 75445c1515.
Repair.
2019-05-28 17:36:14 -04:00
Amit Kapila 9679345f3c Fix typos.
Reported-by: Alexander Lakhin
Author: Alexander Lakhin
Reviewed-by: Amit Kapila and Tom Lane
Discussion: https://postgr.es/m/7208de98-add8-8537-91c0-f8b089e2928c@gmail.com
2019-05-26 18:28:18 +05:30
Andres Freund 73b8c3bd28 tableam: Rename wrapper functions to match callback names.
Some of the wrapper functions didn't match the callback names. Many of
them due to staying "consistent" with historic naming of the wrapped
functionality. We decided that for most cases it's more important to
be for tableam to be consistent going forward, than with the past.

The one exception is beginscan/endscan/...  because it'd have looked
odd to have systable_beginscan/endscan/... with a different naming
scheme, and changing the systable_* APIs would have caused way too
much churn (including breaking a lot of external users).

Author: Ashwin Agrawal, with some small additions by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CALfoeiugyrXZfX7n0ORCa4L-m834dzmaE8eFdbNR6PMpetU4Ww@mail.gmail.com
2019-05-23 16:32:36 -07:00
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Tom Lane be76af171c Initial pgindent run for v12.
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.

Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-22 12:55:34 -04:00
Andres Freund fb504c5e4b Remove outdated comment in copy.c. 2019-05-19 20:47:54 -07:00
Alvaro Herrera 75445c1515 More message style fixes
Discussion: https://postgr.es/m/20190515183005.GA26486@alvherre.pgsql
2019-05-16 19:14:31 -04:00
Michael Paquier 508300e2e1 Improve and fix some error handling for REINDEX INDEX/TABLE CONCURRENTLY
This improves the user experience when it comes to restrict several
flavors of REINDEX CONCURRENTLY.  First, for INDEX, remove a restriction
on shared relations as we already check after catalog relations.  Then,
for TABLE, add a proper error message when attempting to run the command
on system catalogs.  The code path of CREATE INDEX CONCURRENTLY already
complains about that, but if a REINDEX is issued then then the error
generated is confusing.

While on it, add more tests to check restrictions on catalog indexes and
on toast table/index for catalogs.  Some error messages are improved,
with wording suggestion coming from Tom Lane.

Reported-by: Tom Lane
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/23694.1556806002@sss.pgh.pa.us
2019-05-10 08:18:46 +09:00
Tom Lane 2d7d946cd3 Clean up the behavior and API of catalog.c's is-catalog-relation tests.
The right way for IsCatalogRelation/Class to behave is to return true
for OIDs less than FirstBootstrapObjectId (not FirstNormalObjectId),
without any of the ad-hoc fooling around with schema membership.

The previous code was wrong because (1) it claimed that
information_schema tables were not catalog relations but their toast
tables were, which is silly; and (2) if you dropped and recreated
information_schema, which is a supported operation, the behavior
changed.  That's even sillier.  With this definition, "catalog
relations" are exactly the ones traceable to the postgres.bki data,
which seems like what we want.

With this simplification, we don't actually need access to the pg_class
tuple to identify a catalog relation; we only need its OID.  Hence,
replace IsCatalogClass with "IsCatalogRelationOid(oid)".  But keep
IsCatalogRelation as a convenience function.

This allows fixing some arguably-wrong semantics in contrib/sepgsql and
ReindexRelationConcurrently, which were using an IsSystemNamespace test
where what they really should be using is IsCatalogRelationOid.  The
previous coding failed to protect toast tables of system catalogs, and
also was not on board with the general principle that user-created tables
do not become catalogs just by virtue of being renamed into pg_catalog.
We can also get rid of a messy hack in ReindexMultipleTables.

While we're at it, also rename IsSystemNamespace to IsCatalogNamespace,
because the previous name invited confusion with the more expansive
semantics used by IsSystemRelation/Class.

Also improve the comments in catalog.c.

There are a few remaining places in replication-related code that are
special-casing OIDs below FirstNormalObjectId.  I'm inclined to think
those are wrong too, and if there should be any special case it should
just extend to FirstBootstrapObjectId.  But first we need to debate
whether a FOR ALL TABLES publication should include information_schema.

Discussion: https://postgr.es/m/21697.1557092753@sss.pgh.pa.us
Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
2019-05-08 23:27:38 -04:00
Alvaro Herrera 61639816b8 Fix error messages
Some messages related to foreign servers were reporting the server name
without quotes, or not at all; our style is to have all names be quoted,
and the server name already appears quoted in a few other messages, so
just add quotes and make them all consistent.

Remove an extra "s" in other messages (typos introduced by myself in
f56f8f8da6).
2019-05-08 13:20:16 -04:00
Peter Eisentraut add85ead4a Fix table lock levels for REINDEX INDEX CONCURRENTLY
REINDEX CONCURRENTLY locks tables with ShareUpdateExclusiveLock rather
than the ShareLock used by a plain REINDEX.  However,
RangeVarCallbackForReindexIndex() was not updated for that and still
used the ShareLock only.  This would lead to lock upgrades later,
leading to possible deadlocks.

Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/20190430151735.wi52sxjvxsjvaxxt%40alap3.anarazel.de
2019-05-08 14:30:23 +02:00
Fujii Masao b84dbc8eb8 Add TRUNCATE parameter to VACUUM.
This commit adds new parameter to VACUUM command, TRUNCATE,
which specifies that VACUUM should attempt to truncate off
any empty pages at the end of the table and allow the disk space
for the truncated pages to be returned to the operating system.

This parameter, if specified, overrides the vacuum_truncate
reloption. If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Author: Fujii Masao
Reviewed-by: Julien Rouhaud, Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoD+qtrSDL=GSma4Wd3kLYLeRC0hPna-YAdkDeV4z156vg@mail.gmail.com
2019-05-08 02:10:33 +09:00
Tom Lane 9691aa72e2 Fix style violations in syscache lookups.
Project style is to check the success of SearchSysCacheN and friends
by applying HeapTupleIsValid to the result.  A tiny minority of calls
creatively did it differently.  Bring them into line with the rest.

This is just cosmetic, since HeapTupleIsValid is indeed just a null
check at the moment ... but that may not be true forever, and in any
case it puts a mental burden on readers who may wonder why these
call sites are not like the rest.

Back-patch to v11 just to keep the branches in sync.  (The bulk of these
errors seem to have originated in v11 or v12, though a few are old.)

Per searching to see if anyplace else had made the same error
repaired in 62148c352.
2019-05-05 13:10:07 -04:00
Tom Lane 62148c3520 Add check for syscache lookup failure in update_relispartition().
Omitted in commit 05b38c7e6 (though it looks like the original blame
belongs to 9e9befac4).  A failure is admittedly unlikely, but if it
did happen, SIGSEGV is not the approved method of reporting it.

Per Coverity.  Back-patch to v11 where the broken code originated.
2019-05-05 12:44:32 -04:00
Alvaro Herrera 9f8b717a80 Message style fixes 2019-04-30 10:33:37 -04:00
Andres Freund 5c1560606d Fix several recently introduced issues around handling new relation forks.
Most of these stem from d25f519107 "tableam: relation creation, VACUUM
FULL/CLUSTER, SET TABLESPACE.".

1) To pass data to the relation_set_new_filenode()
   RelationSetNewRelfilenode() was made to update RelationData.rd_rel
   directly. That's not OK however, as it makes the relcache entries
   temporarily inconsistent. Which among other scenarios is a problem
   if a REINDEX targets an index on pg_class - the
   CatalogTupleUpdate() in RelationSetNewRelfilenode().  Presumably
   that was introduced because other places in the code do so - while
   those aren't "good practice" they don't appear to be actively
   buggy (e.g. because system tables may not be targeted).

   I (Andres) should have caught this while reviewing and signficantly
   evolving the code in that commit, mea culpa.

   Fix that by instead passing in the new RelFileNode as separate
   argument to relation_set_new_filenode() and rely on the relcache to
   update the catalog entry. Also revert that the
   RelationMapUpdateMap() call was changed to immediate, and undo some
   other more unnecessary changes.

2) Document that the relation_set_new_filenode cannot rely on the
   whole relcache entry to be valid. It might be worthwhile to
   refactor the code to never have to rely on that, but given the way
   heap_create() is currently coded, that'd be a large change.

3) ATExecSetTableSpace() shouldn't do FlushRelationBuffers() itself. A
   table AM might not use shared buffers at all. Move to
   index_copy_data() and heapam_relation_copy_data().

4) heapam_relation_set_new_filenode() previously sometimes accessed
   rel->rd_rel->relpersistence rather than the `persistence`
   argument. Code movement mistake.

5) Previously heapam_relation_set_new_filenode() re-opened the smgr
   relation to create the init for, if necesary. Instead have
   RelationCreateStorage() return the SMgrRelation and use it to
   create the init fork.

6) Add a note about the danger of modifying the relcache directly to
   ATExecSetTableSpace() - it's currently not a bug because there's a
   check ERRORing for catalog tables.

Regression tests and assertion improvements that together trigger the
bug described in 1) will be added in a later commit, as there is a
related bug on all branches.

Reported-By: Michael Paquier
Diagnosed-By: Tom Lane and Andres Freund
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20190418011430.GA19133@paquier.xyz
2019-04-29 19:28:05 -07:00
Tom Lane c01eb619a8 Apply stopgap fix for bug #15672.
Fix DefineIndex so that it doesn't attempt to pass down a to-be-reused
index relfilenode to a child index creation, and fix TryReuseIndex
to not think that reuse is sensible for a partitioned index.

In v11, this fixes a problem where ALTER TABLE on a partitioned table
could assign the same relfilenode to several different child indexes,
causing very nasty catalog corruption --- in fact, attempting to DROP
the partitioned table then leads not only to a database crash, but to
inability to restart because the same crash will recur during WAL replay.

Either of these two changes would be enough to prevent the failure, but
since neither action could possibly be sane, let's put in both changes
for future-proofing.

In HEAD, no such bug manifests, but that's just an accidental consequence
of having changed the pg_class representation of partitioned indexes to
have relfilenode = 0.  Both of these changes still seem like smart
future-proofing.

This is only a stop-gap because the code for ALTER TABLE on a partitioned
table with a no-op type change still leaves a great deal to be desired.
As the added regression tests show, it gets things wrong for comments on
child indexes/constraints, and it is regenerating child indexes it doesn't
have to.  However, fixing those problems will take more work which may not
get back-patched into v11.  We need a fix for the corruption problem now.

Per bug #15672 from Jianing Yang.

Patch by me, regression test cases based on work by Amit Langote,
who also did a lot of the investigative work.

Discussion: https://postgr.es/m/15672-b9fa7db32698269f@postgresql.org
2019-04-26 17:18:07 -04:00
Alvaro Herrera 05b38c7e63 Fix partitioned index attachment
When an existing index in a partition is attached to a new index on
its parent, we forgot to set the "relispartition" flag correctly, which
meant that it was not possible to find the index in various operations,
such as adding a foreign key constraint that references that partitioned
table.  One of four places that was assigning the parent index was
forgetting to do that, so fix by shifting responsibility of updating the
flag to the routine that changes the parent.

Author: Amit Langote, Álvaro Herrera
Reported-by: Hubert "depesz" Lubaczewski
Discussion: https://postgr.es/m/CA+HiwqHMsRtRYRWYTWavKJ8x14AFsv7bmAV46mYwnfD3vy8goQ@mail.gmail.com
2019-04-25 11:22:29 -04:00
Alvaro Herrera 87259588d0 Fix tablespace inheritance for partitioned rels
Commit ca4103025d left a few loose ends.  The most important one
(broken pg_dump output) is already fixed by virtue of commit
3b23552ad8, but some things remained:

* When ALTER TABLE rewrites tables, the indexes must remain in the
  tablespace they were originally in.  This didn't work because
  index recreation during ALTER TABLE runs manufactured SQL (yuck),
  which runs afoul of default_tablespace in competition with the parent
  relation tablespace.  To fix, reset default_tablespace to the empty
  string temporarily, and add the TABLESPACE clause as appropriate.

* Setting a partitioned rel's tablespace to the database default is
  confusing; if it worked, it would direct the partitions to that
  tablespace regardless of default_tablespace.  But in reality it does
  not work, and making it work is a larger project.  Therefore, throw
  an error when this condition is detected, to alert the unwary.

Add some docs and tests, too.

Author: Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
2019-04-25 10:31:32 -04:00
Andres Freund fdc7efcc30 Allow pg_class xid & multixid horizons to not be set.
This allows table AMs that don't need these horizons. This was already
documented in the tableam relation_set_new_filenode callback, but an
assert prevented if from actually working (the test AM code contained
the change itself). Defang the asserts in the general code, and move
the stronger ones into heap AM.

Relatedly, after CLUSTER/VACUUM, we'd always assign a relfrozenxid /
relminmxid. Change the table_relation_copy_for_cluster() interface to
allow the AM to overwrite the horizons that get set on the pg_class
entry.  This'd also in the future allow AMs like heap to compute a
relfrozenxid during rewrite that's the table's actual minimum rather
than a pre-determined value.  Arguably it'd have been better to move
the whole computation / setting of those values into the callback, but
it seems likely that for other reasons it'd be better to be able to
use one value to vacuum/cluster multiple tables (e.g. a toast's
horizon shouldn't be different than the table's).

Reported-By: Heikki Linnakangas
Author: Andres Freund
Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi
2019-04-23 21:42:12 -07:00
Tom Lane f4a3fdfbdc Avoid order-of-execution problems with ALTER TABLE ADD PRIMARY KEY.
Up to now, DefineIndex() was responsible for adding attnotnull constraints
to the columns of a primary key, in any case where it hadn't been
convenient for transformIndexConstraint() to mark those columns as
is_not_null.  It (or rather its minion index_check_primary_key) did this
by executing an ALTER TABLE SET NOT NULL command for the target table.

The trouble with this solution is that if we're creating the index due
to ALTER TABLE ADD PRIMARY KEY, and the outer ALTER TABLE has additional
sub-commands, the inner ALTER TABLE's operations executed at the wrong
time with respect to the outer ALTER TABLE's operations.  In particular,
the inner ALTER would perform a validation scan at a point where the
table's storage might be inconsistent with its catalog entries.  (This is
on the hairy edge of being a security problem, but AFAICS it isn't one
because the inner scan would only be interested in the tuples' null
bitmaps.)  This can result in unexpected failures, such as the one seen
in bug #15580 from Allison Kaptur.

To fix, let's remove the attempt to do SET NOT NULL from DefineIndex(),
reducing index_check_primary_key's role to verifying that the columns are
already not null.  (It shouldn't ever see such a case, but it seems wise
to keep the check for safety.)  Instead, make transformIndexConstraint()
generate ALTER TABLE SET NOT NULL subcommands to be executed ahead of
the ADD PRIMARY KEY operation in every case where it can't force the
column to be created already-not-null.  This requires only minor surgery
in parse_utilcmd.c, and it makes for a much more satisfying spec for
transformIndexConstraint(): it's no longer having to take it on faith
that someone else will handle addition of NOT NULL constraints.

To make that work, we have to move the execution of AT_SetNotNull into
an ALTER pass that executes ahead of AT_PASS_ADD_INDEX.  I moved it to
AT_PASS_COL_ATTRS, and put that after AT_PASS_ADD_COL to avoid failure
when the column is being added in the same command.  This incidentally
fixes a bug in the only previous usage of AT_PASS_COL_ATTRS, for
AT_SetIdentity: it didn't work either for a newly-added column.

Playing around with this exposed a separate bug in ALTER TABLE ONLY ...
ADD PRIMARY KEY for partitioned tables.  The intent of the ONLY modifier
in that context is to prevent doing anything that would require holding
lock for a long time --- but the implied SET NOT NULL would recurse to
the child partitions, and do an expensive validation scan for any child
where the column(s) were not already NOT NULL.  To fix that, invent a
new ALTER subcommand AT_CheckNotNull that just insists that a child
column be already NOT NULL, and apply that, not AT_SetNotNull, when
recursing to children in this scenario.  This results in a slightly laxer
definition of ALTER TABLE ONLY ... SET NOT NULL for partitioned tables,
too: that command will now work as long as all children are already NOT
NULL, whereas before it just threw up its hands if there were any
partitions.

In passing, clean up the API of generateClonedIndexStmt(): remove a
useless argument, ensure that the output argument is not left undefined,
update the header comment.

A small side effect of this change is that no-such-column errors in ALTER
TABLE ADD PRIMARY KEY now produce a different message that includes the
table name, because they are now detected by the SET NOT NULL step which
has historically worded its error that way.  That seems fine to me, so
I didn't make any effort to avoid the wording change.

The basic bug #15580 is of very long standing, and these other bugs
aren't new in v12 either.  However, this is a pretty significant change
in the way ALTER TABLE ADD PRIMARY KEY works.  On balance it seems best
not to back-patch, at least not till we get some more confidence that
this patch has no new bugs.

Patch by me, but thanks to Jie Zhang for a preliminary version.

Discussion: https://postgr.es/m/15580-d1a6de5a3d65da51@postgresql.org
Discussion: https://postgr.es/m/1396E95157071C4EBBA51892C5368521017F2E6E63@G08CNEXMBPEKD02.g08.fujitsu.local
2019-04-23 12:25:27 -04:00
Andres Freund 88e6ad3054 Fix two memory leaks around force-storing tuples in slots.
As reported by Tom, when ExecStoreMinimalTuple() had to perform a
conversion to store the minimal tuple in the slot, it forgot to
respect the shouldFree flag, and leaked the tuple into the current
memory context if true.  Fix that by freeing the tuple in that case.

Looking at the relevant code made me (Andres) realize that not having
the shouldFree parameter to ExecForceStoreHeapTuple() was a bad
idea. Some callers had to locally implement the necessary logic, and
in one case it was missing, creating a potential per-group leak in
non-hashed aggregation.

The choice to not free the tuple in ExecComputeStoredGenerated() is
not pretty, but not introduced by this commit - I'll start a separate
discussion about it.

Reported-By: Tom Lane
Discussion: https://postgr.es/m/366.1555382816@sss.pgh.pa.us
2019-04-19 11:39:56 -07:00
Andres Freund 75e03eabea Fix potential use-after-free for BEFORE UPDATE row triggers on non-core AMs.
When such a trigger returns the old row version, it naturally get
stored in the slot for the trigger result. When a table AMs doesn't
store HeapTuples internally, ExecBRUpdateTriggers() frees the old row
version passed to triggers - but before this fix it might still be
referenced by the slot holding the new tuple.

Noticed when running the out-of-core zheap AM against the in-core
version of tableam.

Author: Andres Freund
2019-04-18 17:53:54 -07:00
Michael Paquier a6dcf9df4d Rework handling of invalid indexes with REINDEX CONCURRENTLY
Per discussion with others, allowing REINDEX INDEX CONCURRENTLY to work
for invalid indexes when working directly on them can have a lot of
value to unlock situations with invalid indexes without having to use a
dance involving DROP INDEX followed by an extra CREATE INDEX
CONCURRENTLY (which would not work for indexes with constraint
dependency anyway).  This also does not create extra bloat on the
relation involved as this works on individual indexes, so let's enable
it.

Note that REINDEX TABLE CONCURRENTLY still bypasses invalid indexes as
we don't want to bloat the number of indexes defined on a relation in
the event of multiple and successive failures of REINDEX CONCURRENTLY.

More regression tests are added to cover those behaviors, using an
invalid index created with CREATE INDEX CONCURRENTLY.

Reported-by: Dagfinn Ilmari Mannsåker, Álvaro Herrera
Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/20190411134947.GA22043@alvherre.pgsql
2019-04-17 09:33:51 +09:00
Michael Paquier d527fda621 Fix more strcmp() calls using boolean-like comparisons for result checks
Such calls can confuse the reader as strcmp() uses an integer as result.
The places patched here have been spotted by Thomas Munro, David Rowley
and myself.

Author: Michael Paquier
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/20190411021946.GG2728@paquier.xyz
2019-04-12 10:16:49 +09:00
Andres Freund 4c9e1bd0a3 Reset memory context once per tuple in validateForeignKeyConstraint.
When using tableam ExecFetchSlotHeapTuple() might return a separately
allocated tuple. We could use the shouldFree argument to explicitly
free it, but it seems more robust to to protect

Also add a CHECK_FOR_INTERRUPTS() after each tuple. It's likely that
each AM has (heap does) a CFI somewhere in the relevant path, but it
seems more robust to have one in validateForeignKeyConstraint()
itself.

Note that this only affects the cases that couldn't be optimized to be
verified with a query.

Author: Andres Freund
Reviewed-By: Tom Lane (in an earlier version)
Discussion:
    https://postgr.es/m/19030.1554574075@sss.pgh.pa.us
    https://postgr.es/m/CAKJS1f_SHKcPYMsi39An5aUjhAcEMZb6Cx1Sj1QWEWSiKJkBVQ@mail.gmail.com
    https://postgr.es/m/20180711185628.mrvl46bjgk2uxoki@alap3.anarazel.de
2019-04-07 22:42:42 -07:00
Peter Eisentraut 03f9e5cba0 Report progress of REINDEX operations
This uses the same infrastructure that the CREATE INDEX progress
reporting uses.  Add a column to pg_stat_progress_create_index to
report the OID of the index being worked on.  This was not necessary
for CREATE INDEX, but it's useful for REINDEX.

Also edit the phase descriptions a bit to be more consistent with the
source code comments.

Discussion: https://www.postgresql.org/message-id/ef6a6757-c36a-9e81-123f-13b19e36b7d7%402ndquadrant.com
2019-04-07 12:35:29 +02:00
Tom Lane 46e3442c9e Fix failures in validateForeignKeyConstraint's slow path.
The foreign-key-checking loop in ATRewriteTables failed to ignore
relations without storage (e.g., partitioned tables), unlike the
initial loop.  This accidentally worked as long as RI_Initial_Check
succeeded, which it does in most practical cases (including all the
ones exercised in the existing regression tests :-().  However, if
that failed, as for instance when there are permissions issues,
then we entered the slow fire-the-trigger-on-each-tuple path.
And that would try to read from the referencing relation, and fail
if it lacks storage.

A second problem, recently introduced in HEAD, was that this loop
had been broken by sloppy refactoring for the tableam API changes.

Repair both issues, and add a regression test case so we have some
coverage on this code path.  Back-patch as needed to v11.

(It looks like this code could do with additional bulletproofing,
but let's get a working test case in place first.)

Hadi Moshayedi, Tom Lane, Andres Freund

Discussion: https://postgr.es/m/CAK=1=WrnNmBbe5D9sm3t0a6dnAq3cdbF1vXY816j1wsMqzC8bw@mail.gmail.com
Discussion: https://postgr.es/m/19030.1554574075@sss.pgh.pa.us
Discussion: https://postgr.es/m/20190325180405.jytoehuzkeozggxx%40alap3.anarazel.de
2019-04-06 15:09:09 -04:00
Andres Freund 57a7a3adfe Remove unused struct member, enforce multi_insert callback presence.
Author: David Rowley, Andres Freund
Discussion: https://postgr.es/m/CAKJS1f9=9phmm66diAji4gvHnWSrK7BGFoNct+mEUT_c8pPOjw@mail.gmail.com
2019-04-04 17:39:39 -07:00
Andres Freund ea97e440b8 Harden tableam against nonexistant / wrong kind of AMs.
Previously it was allowed to set default_table_access_method to an
empty string. That makes sense for default_tablespace, where that was
copied from, as it signals falling back to the database's default
tablespace. As there is no equivalent for table AMs, forbid that.

Also make sure to throw a usable error when creating a table using an
index AM, by using get_am_type_oid() to implement get_table_am_oid()
instead of a separate copy. Previously we'd error out only later, in
GetTableAmRoutine().

Thirdly remove GetTableAmRoutineByAmId() - it was only used in an
earlier version of 8586bf7ed8.

Add tests for the above (some for index AMs as well).
2019-04-04 17:39:39 -07:00
Andres Freund 86b85044e8 tableam: Add table_multi_insert() and revamp/speed-up COPY FROM buffering.
This adds table_multi_insert(), and converts COPY FROM, the only user
of heap_multi_insert, to it.

A simple conversion of COPY FROM use slots would have yielded a
slowdown when inserting into a partitioned table for some
workloads. Different partitions might need different slots (both slot
types and their descriptors), and dropping / creating slots when
there's constant partition changes is measurable.

Thus instead revamp the COPY FROM buffering for partitioned tables to
allow to buffer inserts into multiple tables, flushing only when
limits are reached across all partition buffers. By only dropping
slots when there've been inserts into too many different partitions,
the aforementioned overhead is gone. By allowing larger batches, even
when there are frequent partition changes, we actuall speed such cases
up significantly.

By using slots COPY of very narrow rows into unlogged / temporary
might slow down very slightly (due to the indirect function calls).

Author: David Rowley, Andres Freund, Haribabu Kommi
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20190327054923.t3epfuewxfqdt22e@alap3.anarazel.de
2019-04-04 16:28:18 -07:00
Robert Haas a96c41feec Allow VACUUM to be run with index cleanup disabled.
This commit adds a new reloption, vacuum_index_cleanup, which
controls whether index cleanup is performed for a particular
relation by default.  It also adds a new option to the VACUUM
command, INDEX_CLEANUP, which can be used to override the
reloption.  If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Masahiko Sawada, reviewed and tested by Nathan Bossart, Alvaro
Herrera, Kyotaro Horiguchi, Darafei Praliaskouski, and me.
The wording of the documentation is mostly due to me.

Discussion: http://postgr.es/m/CAD21AoAt5R3DNUZSjOoXDUY=naYPUOuffVsRzuTYMz29yLzQCA@mail.gmail.com
2019-04-04 15:04:43 -04:00
Thomas Munro 3eb77eba5a Refactor the fsync queue for wider use.
Previously, md.c and checkpointer.c were tightly integrated so that
fsync calls could be handed off and processed in the background.
Introduce a system of callbacks and file tags, so that other modules
can hand off fsync work in the same way.

For now only md.c uses the new interface, but other users are being
proposed.  Since there may be use cases that are not strictly SMGR
implementations, use a new function table for sync handlers rather
than extending the traditional SMGR one.

Instead of using a bitmapset of segment numbers for each RelFileNode
in the checkpointer's hash table, make the segment number part of the
key.  This requires sending explicit "forget" requests for every
segment individually when relations are dropped, but suits the file
layout schemes of proposed future users better (ie sparse or high
segment numbers).

Author: Shawn Debnath and Thomas Munro
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/CAEepm=2gTANm=e3ARnJT=n0h8hf88wqmaZxk0JYkxw+b21fNrw@mail.gmail.com
2019-04-04 23:38:38 +13:00
Tomas Vondra ea569d64ac Add SETTINGS option to EXPLAIN, to print modified settings.
Query planning is affected by a number of configuration options, and it
may be crucial to know which of those options were set to non-default
values.  With this patch you can say EXPLAIN (SETTINGS ON) to include
that information in the query plan.  Only options affecting planning,
with values different from the built-in default are printed.

This patch also adds auto_explain.log_settings option, providing the
same capability in auto_explain module.

Author: Tomas Vondra
Reviewed-by: Rafia Sabih, John Naylor
Discussion: https://postgr.es/m/e1791b4c-df9c-be02-edc5-7c8874944be0@2ndquadrant.com
2019-04-04 00:04:31 +02:00
Alvaro Herrera 5f6fc34af5 Copy name when cloning FKs recurses to partitions
We were passing a string owned by a syscache entry, which was released
before recursing.  Fix by pstrdup'ing the string.

Per buildfarm member prion.
2019-04-03 15:35:54 -03:00
Alvaro Herrera f56f8f8da6 Support foreign keys that reference partitioned tables
Previously, while primary keys could be made on partitioned tables, it
was not possible to define foreign keys that reference those primary
keys.  Now it is possible to do that.

Author: Álvaro Herrera
Reviewed-by: Amit Langote, Jesper Pedersen
Discussion: https://postgr.es/m/20181102234158.735b3fevta63msbj@alvherre.pgsql
2019-04-03 14:40:21 -03:00
Alvaro Herrera ab0dfc961b Report progress of CREATE INDEX operations
This uses the progress reporting infrastructure added by c16dc1aca5,
adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY.

There are two pieces to this: one is index-AM-agnostic, and the other is
AM-specific.  The latter is fairly elaborate for btrees, including
reportage for parallel index builds and the separate phases that btree
index creation uses; other index AMs, which are much simpler in their
building procedures, have simplistic reporting only, but that seems
sufficient, at least for non-concurrent builds.

The index-AM-agnostic part is fairly complete, providing insight into
the CONCURRENTLY wait phases as well as block-based progress during the
index validation table scan.  (The index validation index scan requires
patching each AM, which has not been included here.)

Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada
Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql
2019-04-02 15:18:08 -03:00
Andres Freund d45e401586 tableam: Add table_finish_bulk_insert().
This replaces the previous calls of heap_sync() in places using
bulk-insert. By passing in the flags used for bulk-insert the AM can
decide (first at insert time and then during the finish call) which of
the optimizations apply to it, and what operations are necessary to
finish a bulk insert operation.

Also change HEAP_INSERT_* flags to TABLE_INSERT, and rename hi_options
to ti_options.

These changes are made even in copy.c, which hasn't yet been converted
to tableam. There's no harm in doing so.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-04-01 14:41:42 -07:00
Andres Freund 737a292b5d tableam: VACUUM and ANALYZE support.
This is a relatively straightforward move of the current
implementation to sit below tableam. As the current analyze sampling
implementation is pretty inherently block based, the tableam analyze
interface is as well. It might make sense to generalize that at some
point, but that seems like a larger project that shouldn't be
undertaken at the same time as the introduction of tableam.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-30 19:25:58 -07:00
Peter Eisentraut fc22b6623b Generated columns
This is an SQL-standard feature that allows creating columns that are
computed from expressions rather than assigned, similar to a view or
materialized view but on a column basis.

This implements one kind of generated column: stored (computed on
write).  Another kind, virtual (computed on read), is planned for the
future, and some room is left for it.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
2019-03-30 08:15:57 +01:00
Peter Eisentraut 6b8b5364dd Small code simplification for REINDEX CONCURRENTLY
This was left over from an earlier code structure.
2019-03-30 07:16:24 +01:00
Robert Haas 41b54ba78e Allow existing VACUUM options to take a Boolean argument.
This makes VACUUM work more like EXPLAIN already does without changing
the meaning of any commands that already work.  It is intended to
facilitate the addition of future VACUUM options that may take
non-Boolean parameters or that default to false.

Masahiko Sawada, reviewed by me.

Discussion: http://postgr.es/m/CA+TgmobpYrXr5sUaEe_T0boabV0DSm=utSOZzwCUNqfLEEm8Mw@mail.gmail.com
Discussion: http://postgr.es/m/CAD21AoBaFcKBAeL5_++j+Vzir2vBBcF4juW7qH8b3HsQY=Q6+w@mail.gmail.com
2019-03-29 08:22:49 -04:00
Peter Eisentraut bb76134b08 Fix incorrect code in new REINDEX CONCURRENTLY code
The previous code was adding pointers to transient variables to a
list, but by the time the list was read, the variable might be gone,
depending on the compiler.  Fix it by making copies in the proper
memory context.
2019-03-29 10:53:40 +01:00
Peter Eisentraut 5dc92b844e REINDEX CONCURRENTLY
This adds the CONCURRENTLY option to the REINDEX command.  A REINDEX
CONCURRENTLY on a specific index creates a new index (like CREATE
INDEX CONCURRENTLY), then renames the old index away and the new index
in place and adjusts the dependencies, and then drops the old
index (like DROP INDEX CONCURRENTLY).  The REINDEX command also has
the capability to run its other variants (TABLE, DATABASE) with the
CONCURRENTLY option (but not SYSTEM).

The reindexdb command gets the --concurrently option.

Author: Michael Paquier, Andreas Karlsson, Peter Eisentraut
Reviewed-by: Andres Freund, Fujii Masao, Jim Nasby, Sergei Kornilov
Discussion: https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee
2019-03-29 08:26:33 +01:00
Andres Freund d25f519107 tableam: relation creation, VACUUM FULL/CLUSTER, SET TABLESPACE.
This moves the responsibility for:
- creating the storage necessary for a relation, including creating a
  new relfilenode for a relation with existing storage
- non-transactional truncation of a relation
- VACUUM FULL / CLUSTER's rewrite of a table
below tableam.

This is fairly straight forward, with a bit of complexity smattered in
to move the computation of xid / multixid horizons below the AM, as
they don't make sense for every table AM.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-28 20:01:43 -07:00
Tomas Vondra 7300a69950 Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.

Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.

Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
Alvaro Herrera 126d631222 Fix partitioned index creation bug with dropped columns
ALTER INDEX .. ATTACH PARTITION fails if the partitioned table where the
index is defined contains more dropped columns than its partition, with
this message:
  ERROR:  incorrect attribute map
The cause was that one caller of CompareIndexInfo was passing the number
of attributes of the partition rather than the parent, which confused
the length check.  Repair.

This can cause pg_upgrade to fail when used on such a database.  Leave
some more objects around after regression tests, so that the case is
detected by pg_upgrade test suite.

Remove some spurious empty lines noticed while looking for other cases
of the same problem.

Discussion: https://postgr.es/m/20190326213924.GA2322@alvherre.pgsql
2019-03-26 20:19:28 -03:00
Robert Haas 6f97457e0d Add progress reporting for CLUSTER and VACUUM FULL.
This uses the same progress reporting infrastructure added in commit
c16dc1aca5 and extends it to these
additional cases.  We lack the ability to track the internal progress
of sorts and index builds so the information reported is
coarse-grained for some parts of the operation, but it still seems
like a significant improvement over having nothing at all.

Tatsuro Yamada, reviewed by Thomas Munro, Masahiko Sawada, Michael
Paquier, Jeff Janes, Alvaro Herrera, Rafia Sabih, and by me.  A fair
amount of polishing also by me.

Discussion: http://postgr.es/m/59A77072.3090401@lab.ntt.co.jp
2019-03-25 10:59:04 -04:00
Andres Freund 9a8ee1dc65 tableam: Add and use table_fetch_row_version().
This is essentially the tableam version of heapam_fetch(),
i.e. fetching a tuple identified by a tid, performing visibility
checks.

Note that this different from table_index_fetch_tuple(), which is for
index lookups. It therefore has to handle a tid pointing to an earlier
version of a tuple if the AM uses an optimization like heap's HOT. Add
comments to that end.

This commit removes the stats_relation argument from heap_fetch, as
it's been unused for a long time.

Author: Andres Freund
Reviewed-By: Haribabu Kommi
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-25 00:17:59 -07:00
Andres Freund 919e48b943 tableam: Use in CREATE TABLE AS and CREATE MATERIALIZED VIEW.
Previously those directly performed a heap_insert(). Use
table_insert() instead.  The input slot of those routines is not of
the target relation - we could fix that by copying if necessary, but
that'd not be beneficial for performance. As those codepaths don't
access any AM specific tuple fields (say xmin/xmax), there's no need
to use an AM specific slot.

Author: Andres Freund
Reviewed-By: Haribabu Kommi
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-24 18:58:37 -07:00
Andres Freund 5db6df0c01 tableam: Add tuple_{insert, delete, update, lock} and use.
This adds new, required, table AM callbacks for insert/delete/update
and lock_tuple. To be able to reasonably use those, the EvalPlanQual
mechanism had to be adapted, moving more logic into the AM.

Previously both delete/update/lock call-sites and the EPQ mechanism had
to have awareness of the specific tuple format to be able to fetch the
latest version of a tuple. Obviously that needs to be abstracted
away. To do so, move the logic that find the latest row version into
the AM. lock_tuple has a new flag argument,
TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last
version, rather than the current one.  It'd have been possible to do
so via a separate callback as well, but finding the last version
usually also necessitates locking the newest version, making it
sensible to combine the two. This replaces the previous use of
EvalPlanQualFetch().  Additionally HeapTupleUpdated, which previously
signaled either a concurrent update or delete, is now split into two,
to avoid callers needing AM specific knowledge to differentiate.

The move of finding the latest row version into tuple_lock means that
encountering a row concurrently moved into another partition will now
raise an error about "tuple to be locked" rather than "tuple to be
updated/deleted" - which is accurate, as that always happens when
locking rows. While possible slightly less helpful for users, it seems
like an acceptable trade-off.

As part of this commit HTSU_Result has been renamed to TM_Result, and
its members been expanded to differentiated between updating and
deleting. HeapUpdateFailureData has been renamed to TM_FailureData.

The interface to speculative insertion is changed so nodeModifyTable.c
does not have to set the speculative token itself anymore. Instead
there's a version of tuple_insert, tuple_insert_speculative, that
performs the speculative insertion (without requiring a flag to signal
that fact), and the speculative insertion is either made permanent
with table_complete_speculative(succeeded = true) or aborted with
succeeded = false).

Note that multi_insert is not yet routed through tableam, nor is
COPY. Changing multi_insert requires changes to copy.c that are large
enough to better be done separately.

Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED
VIEW are also only going to be adjusted in a later commit.

Author: Andres Freund and Haribabu Kommi
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-23 19:55:57 -07:00
Peter Eisentraut 5e1963fb76 Collations with nondeterministic comparison
This adds a flag "deterministic" to collations.  If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal.  That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.

This functionality is only supported with the ICU provider.  At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.

The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).

This patch makes changes in three areas:

- CREATE COLLATION DDL changes and system catalog changes to support
  this new flag.

- Many executor nodes and auxiliary code are extended to track
  collations.  Previously, this code would just throw away collation
  information, because the eventually-called user-defined functions
  didn't use it since they only cared about equality, which didn't
  need collation information.

- String data type functions that do equality comparisons and hashing
  are changed to take the (non-)deterministic flag into account.  For
  comparison, this just means skipping various shortcuts and tie
  breakers that use byte-wise comparison.  For hashing, we first need
  to convert the input string to a canonical "sort key" using the ICU
  analogue of strxfrm().

Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
2019-03-22 12:12:43 +01:00
Alvaro Herrera a6da004715 Add index_get_partition convenience function
This new function simplifies some existing coding, as well as supports
future patches.

Discussion: https://postgr.es/m/201901222145.t6wws6t6vrcu@alvherre.pgsql
Reviewed-by: Amit Langote, Jesper Pedersen
2019-03-20 18:18:50 -03:00
Andrew Gierth 01bde4fa4c Implement OR REPLACE option for CREATE AGGREGATE.
Aggregates have acquired a dozen or so optional attributes in recent
years for things like parallel query and moving-aggregate mode; the
lack of an OR REPLACE option to add or change these for an existing
agg makes extension upgrades gratuitously hard. Rectify.
2019-03-19 01:16:50 +00:00
Robert Haas 6776142a07 Revise parse tree representation for VACUUM and ANALYZE.
Like commit f41551f61f, this aims
to make it easier to add non-Boolean options to VACUUM (or, in
this case, to ANALYZE).  Instead of building up a bitmap of
options directly in the parser, build up a list of DefElem
objects and let ExecVacuum() sort it out; right now, we make
no use of the fact that a DefElem can carry an associated value,
but it will be easy to make that change in the future.

Masahiko Sawada

Discussion: http://postgr.es/m/CAD21AoATE4sn0jFFH3NcfUZXkU2BMbjBWB_kDj-XWYA-LXDcQA@mail.gmail.com
2019-03-18 15:14:52 -04:00
Robert Haas f41551f61f Fold vacuum's 'int options' parameter into VacuumParams.
Many places need both, so this allows a few functions to take one
fewer parameter.  More importantly, as soon as we add a VACUUM
option that takes a non-Boolean parameter, we need to replace
'int options' with a struct, and it seems better to think
of adding more fields to VacuumParams rather than passing around
both VacuumParams and a separate struct as well.

Patch by me, reviewed by Masahiko Sawada

Discussion: http://postgr.es/m/CA+Tgmob6g6-s50fyv8E8he7APfwCYYJ4z0wbZC2yZeSz=26CYQ@mail.gmail.com
2019-03-18 13:57:33 -04:00
Peter Eisentraut 61dc407893 Improve code comment 2019-03-14 22:44:21 +01:00
Peter Eisentraut c6ff0b892c Refactor ParamListInfo initialization
There were six copies of identical nontrivial code.  Put it into a
function.
2019-03-14 13:30:09 +01:00
Peter Eisentraut f177660ab0 Include all columns in default names for foreign key constraints
When creating a name for a foreign key constraint when none is
specified, use all column names instead of only the first one, similar
to how it is already done for index names.

Author: Paul Martinez <hellopfm@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CAF+2_SFjky6XRfLNRXpkG97W6PRbOO_mjAxqXzAAimU=c7w7_A@mail.gmail.com
2019-03-13 14:25:42 +01:00
Robert Haas bbb96c3704 Allow ALTER TABLE .. SET NOT NULL to skip provably unnecessary scans.
If existing CHECK or NOT NULL constraints preclude the presence
of nulls, we need not look to see whether any are present.

Sergei Kornilov, reviewed by Stephen Frost, Ildar Musin, David Rowley,
and by me.

Discussion: http://postgr.es/m/81911511895540@web58j.yandex.ru
2019-03-13 08:55:00 -04:00
Andres Freund c2fe139c20 tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:

1) Heap scans need to be generalized into table scans. Do this by
   introducing TableScanDesc, which will be the "base class" for
   individual AMs. This contains the AM independent fields from
   HeapScanDesc.

   The previous heap_{beginscan,rescan,endscan} et al. have been
   replaced with a table_ version.

   There's no direct replacement for heap_getnext(), as that returned
   a HeapTuple, which is undesirable for a other AMs. Instead there's
   table_scan_getnextslot().  But note that heap_getnext() lives on,
   it's still used widely to access catalog tables.

   This is achieved by new scan_begin, scan_end, scan_rescan,
   scan_getnextslot callbacks.

2) The portion of parallel scans that's shared between backends need
   to be able to do so without the user doing per-AM work. To achieve
   that new parallelscan_{estimate, initialize, reinitialize}
   callbacks are introduced, which operate on a new
   ParallelTableScanDesc, which again can be subclassed by AMs.

   As it is likely that several AMs are going to be block oriented,
   block oriented callbacks that can be shared between such AMs are
   provided and used by heap. table_block_parallelscan_{estimate,
   intiialize, reinitialize} as callbacks, and
   table_block_parallelscan_{nextpage, init} for use in AMs. These
   operate on a ParallelBlockTableScanDesc.

3) Index scans need to be able to access tables to return a tuple, and
   there needs to be state across individual accesses to the heap to
   store state like buffers. That's now handled by introducing a
   sort-of-scan IndexFetchTable, which again is intended to be
   subclassed by individual AMs (for heap IndexFetchHeap).

   The relevant callbacks for an AM are index_fetch_{end, begin,
   reset} to create the necessary state, and index_fetch_tuple to
   retrieve an indexed tuple.  Note that index_fetch_tuple
   implementations need to be smarter than just blindly fetching the
   tuples for AMs that have optimizations similar to heap's HOT - the
   currently alive tuple in the update chain needs to be fetched if
   appropriate.

   Similar to table_scan_getnextslot(), it's undesirable to continue
   to return HeapTuples. Thus index_fetch_heap (might want to rename
   that later) now accepts a slot as an argument. Core code doesn't
   have a lot of call sites performing index scans without going
   through the systable_* API (in contrast to loads of heap_getnext
   calls and working directly with HeapTuples).

   Index scans now store the result of a search in
   IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
   target is not generally a HeapTuple anymore that seems cleaner.

To be able to sensible adapt code to use the above, two further
callbacks have been introduced:

a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
   slots capable of holding a tuple of the AMs
   type. table_slot_callbacks() and table_slot_create() are based
   upon that, but have additional logic to deal with views, foreign
   tables, etc.

   While this change could have been done separately, nearly all the
   call sites that needed to be adapted for the rest of this commit
   also would have been needed to be adapted for
   table_slot_callbacks(), making separation not worthwhile.

b) tuple_satisfies_snapshot checks whether the tuple in a slot is
   currently visible according to a snapshot. That's required as a few
   places now don't have a buffer + HeapTuple around, but a
   slot (which in heap's case internally has that information).

Additionally a few infrastructure changes were needed:

I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
   internally uses a slot to keep track of tuples. While
   systable_getnext() still returns HeapTuples, and will so for the
   foreseeable future, the index API (see 1) above) now only deals with
   slots.

The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.

Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 12:46:41 -07:00
Alvaro Herrera af38498d4c Move hash_any prototype from access/hash.h to utils/hashutils.h
... as well as its implementation from backend/access/hash/hashfunc.c to
backend/utils/hash/hashfn.c.

access/hash is the place for the hash index AM, not really appropriate
for generic facilities, which is what hash_any is; having things the old
way meant that anything using hash_any had to include the AM's include
file, pointlessly polluting its namespace with unrelated, unnecessary
cruft.

Also move the HTEqual strategy number to access/stratnum.h from
access/hash.h.

To avoid breaking third-party extension code, add an #include
"utils/hashutils.h" to access/hash.h.  (An easily removed line by
committers who enjoy their asbestos suits to protect them from angry
extension authors.)

Discussion: https://postgr.es/m/201901251935.ser5e4h6djt2@alvherre.pgsql
2019-03-11 13:17:50 -03:00
Tom Lane caf626b2cd Convert [autovacuum_]vacuum_cost_delay into floating-point GUCs.
This change makes it possible to specify sub-millisecond delays,
which work well on most modern platforms, though that was not true
when the cost-delay feature was designed.

To support this without breaking existing configuration entries,
improve guc.c to allow floating-point GUCs to have units.  Also,
allow "us" (microseconds) as an input/output unit for time-unit GUCs.
(It's not allowed as a base unit, at least not yet.)

Likewise change the autovacuum_vacuum_cost_delay reloption to be
floating-point; this forces a catversion bump because the layout of
StdRdOptions changes.

This patch doesn't in itself change the default values or allowed
ranges for these parameters, and it should not affect the behavior
for any already-allowed setting for them.

Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
2019-03-10 15:01:39 -04:00
Noah Misch 3c5926301a Avoid some table rewrites for ALTER TABLE .. SET DATA TYPE timestamp.
When the timezone is UTC, timestamptz and timestamp are binary coercible
in both directions.  See b8a18ad485 and
c22ecc6562 for the previous attempt in
this problem space.  Skip the table rewrite; for now, continue to
needlessly rewrite any index on an affected column.

Reviewed by Simon Riggs and Tom Lane.

Discussion: https://postgr.es/m/20190226061450.GA1665944@rfd.leadboat.com
2019-03-08 20:16:27 -08:00
Robert Haas 898e5e3290 Allow ATTACH PARTITION with only ShareUpdateExclusiveLock.
We still require AccessExclusiveLock on the partition itself, because
otherwise an insert that violates the newly-imposed partition
constraint could be in progress at the same time that we're changing
that constraint; only the lock level on the parent relation is
weakened.

To make this safe, we have to cope with (at least) three separate
problems. First, relevant DDL might commit while we're in the process
of building a PartitionDesc.  If so, find_inheritance_children() might
see a new partition while the RELOID system cache still has the old
partition bound cached, and even before invalidation messages have
been queued.  To fix that, if we see that the pg_class tuple seems to
be missing or to have a null relpartbound, refetch the value directly
from the table. We can't get the wrong value, because DETACH PARTITION
still requires AccessExclusiveLock throughout; if we ever want to
change that, this will need more thought. In testing, I found it quite
difficult to hit even the null-relpartbound case; the race condition
is extremely tight, but the theoretical risk is there.

Second, successive calls to RelationGetPartitionDesc might not return
the same answer.  The query planner will get confused if lookup up the
PartitionDesc for a particular relation does not return a consistent
answer for the entire duration of query planning.  Likewise, query
execution will get confused if the same relation seems to have a
different PartitionDesc at different times.  Invent a new
PartitionDirectory concept and use it to ensure consistency.  This
ensures that a single invocation of either the planner or the executor
sees the same view of the PartitionDesc from beginning to end, but it
does not guarantee that the planner and the executor see the same
view.  Since this allows pointers to old PartitionDesc entries to
survive even after a relcache rebuild, also postpone removing the old
PartitionDesc entry until we're certain no one is using it.

For the most part, it seems to be OK for the planner and executor to
have different views of the PartitionDesc, because the executor will
just ignore any concurrently added partitions which were unknown at
plan time; those partitions won't be part of the inheritance
expansion, but invalidation messages will trigger replanning at some
point.  Normally, this happens by the time the very next command is
executed, but if the next command acquires no locks and executes a
prepared query, it can manage not to notice until a new transaction is
started.  We might want to tighten that up, but it's material for a
separate patch.  There would still be a small window where a query
that started just after an ATTACH PARTITION command committed might
fail to notice its results -- but only if the command starts before
the commit has been acknowledged to the user. All in all, the warts
here around serializability seem small enough to be worth accepting
for the considerable advantage of being able to add partitions without
a full table lock.

Although in general the consequences of new partitions showing up
between planning and execution are limited to the query not noticing
the new partitions, run-time partition pruning will get confused in
that case, so that's the third problem that this patch fixes.
Run-time partition pruning assumes that indexes into the PartitionDesc
are stable between planning and execution.  So, add code so that if
new partitions are added between plan time and execution time, the
indexes stored in the subplan_map[] and subpart_map[] arrays within
the plan's PartitionedRelPruneInfo get adjusted accordingly.  There
does not seem to be a simple way to generalize this scheme to cope
with partitions that are removed, mostly because they could then get
added back again with different bounds, but it works OK for added
partitions.

This code does not try to ensure that every backend participating in
a parallel query sees the same view of the PartitionDesc.  That
currently doesn't matter, because we never pass PartitionDesc
indexes between backends.  Each backend will ignore the concurrently
added partitions which it notices, and it doesn't matter if different
backends are ignoring different sets of concurrently added partitions.
If in the future that matters, for example because we allow writes in
parallel query and want all participants to do tuple routing to the same
set of partitions, the PartitionDirectory concept could be improved to
share PartitionDescs across backends.  There is a draft patch to
serialize and restore PartitionDescs on the thread where this patch
was discussed, which may be a useful place to start.

Patch by me.  Thanks to Alvaro Herrera, David Rowley, Simon Riggs,
Amit Langote, and Michael Paquier for discussion, and to Alvaro
Herrera for some review.

Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com
2019-03-07 11:13:12 -05:00
Andres Freund 8586bf7ed8 tableam: introduce table AM infrastructure.
This introduces the concept of table access methods, i.e. CREATE
  ACCESS METHOD ... TYPE TABLE and
  CREATE TABLE ... USING (storage-engine).
No table access functionality is delegated to table AMs as of this
commit, that'll be done in following commits.

Subsequent commits will incrementally abstract table access
functionality to be routed through table access methods. That change
is too large to be reviewed & committed at once, so it'll be done
incrementally.

Docs will be updated at the end, as adding them incrementally would
likely make them less coherent, and definitely is a lot more work,
without a lot of benefit.

Table access methods are specified similar to index access methods,
i.e. pg_am.amhandler returns, as INTERNAL, a pointer to a struct with
callbacks. In contrast to index AMs that struct needs to live as long
as a backend, typically that's achieved by just returning a pointer to
a constant struct.

Psql's \d+ now displays a table's access method. That can be disabled
with HIDE_TABLEAM=true, which is mainly useful so regression tests can
be run against different AMs.  It's quite possible that this behaviour
still needs to be fine tuned.

For now it's not allowed to set a table AM for a partitioned table, as
we've not resolved how partitions would inherit that. Disallowing
allows us to introduce, if we decide that's the way forward, such a
behaviour without a compatibility break.

Catversion bumped, to add the heap table AM and references to it.

Author: Haribabu Kommi, Andres Freund, Alvaro Herrera, Dimitri Golgov and others
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
    https://postgr.es/m/20190107235616.6lur25ph22u5u5av@alap3.anarazel.de
    https://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de
2019-03-06 09:54:38 -08:00
Andres Freund 5963b29e03 Initialize variable to silence compiler warning.
After ff11e7f4b9 Tom's compiler warns about accessing a potentially
uninitialized rInfo. That's not actually possible, but it's
understandable the compiler would get this wrong. NULL initialize too.

Reported-By: Tom Lane
Discussion: https://postgr.es/m/11199.1551285318@sss.pgh.pa.us
2019-02-27 09:14:34 -08:00
Michael Paquier 414a9d3cf3 Fix memory leak when inserting tuple at relation creation for CTAS
The leak has been introduced by 763f2ed which has addressed the problem
for transient tables, and forgot CREATE TABLE AS which shares a similar
logic when receiving a new tuple to store into the newly-created
relation.

Author: Jeff Janes
Discussion: https://postgr.es/m/CAMkU=1xZXtz3mziPEPD2Fubbas4G2RWkZm5HHABtfKVcbu1=Sg@mail.gmail.com
2019-02-27 14:14:06 +09:00
Andres Freund ff11e7f4b9 Use slots in trigger infrastructure, except for the actual invocation.
In preparation for abstracting table storage, convert trigger.c to
track tuples in slots. Which also happens to make code calling
triggers simpler.

As the calling interface for triggers themselves is not changed in
this patch, HeapTuples still are extracted from the slot at that
time. But that's handled solely inside trigger.c, not visible to
callers. It's quite likely that we'll want to revise the external
trigger interface, but that's a separate large project.

As part of this work the slots used for old/new/return tuples are
moved from EState into ResultRelInfo, as different updated tables
might need different slots. The slots are now also now created
on-demand, which is good both from an efficiency POV, but also makes
the modifying code simpler.

Author: Andres Freund, Amit Khandekar and Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26 20:31:38 -08:00
Andres Freund b8d71745ea Store table oid and tuple's tid in tuple slots directly.
After the introduction of tuple table slots all table AMs need to
support returning the table oid of the tuple stored in a slot created
by said AM. It does not make sense to re-implement that in every AM,
therefore move handling of table OIDs into the TupleTableSlot
structure itself.  It's possible that we, at a later date, might want
to get rid of HeapTupleData.t_tableOid entirely, but doing so before
the abstractions for table AMs are integrated turns out to be too
hard, so delay that for now.

Similarly, every AM needs to support the concept of a tuple
identifier (tid / item pointer) for its tuples. It's quite possible
that we'll generalize the exact form of a tid at a future point (to
allow for things like index organized tables), but for now many parts
of the code know about tids, so there's not much point in abstracting
tids away. Therefore also move into slot (rather than providing API to
set/get the tid associated with the tuple in a slot).

Once table AM includes insert/updating/deleting tuples, the
responsibility to set the correct tid after such an action will move
into that. After that change, code doing such modifications, should
not have to deal with HeapTuples directly anymore.

Author: Andres Freund, Haribabu Kommi and Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26 20:31:16 -08:00
Peter Eisentraut bc09d5e4cc Remove unnecessary use of PROCEDURAL
Remove some unnecessary, legacy-looking use of the PROCEDURAL keyword
before LANGUAGE.  We mostly don't use this anymore, so some of these
look a bit old.

There is still some use in pg_dump, which is harder to remove because
it's baked into the archive format, so I'm not touching that.

Discussion: https://www.postgresql.org/message-id/2330919b-62d9-29ac-8de3-58c024fdcb96@2ndquadrant.com
2019-02-25 08:38:59 +01:00
Robert Haas 1bb5e78218 Move code for managing PartitionDescs into a new file, partdesc.c
This is similar in spirit to the existing partbounds.c file in the
same directory, except that there's a lot less code in the new file
created by this commit.  Pending work in this area proposes to add a
bunch more code related to PartitionDescs, though, and this will give
us a good place to put it.

Discussion: http://postgr.es/m/CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com
2019-02-21 11:45:02 -05:00
Tom Lane a32ca78836 Fix CREATE VIEW to allow zero-column views.
We should logically have allowed this case when we allowed zero-column
tables, but it was overlooked.

Although this might be thought a feature addition, it's really a bug
fix, because it was possible to create a zero-column view via
the convert-table-to-view code path, and then you'd have a situation
where dump/reload would fail.  Hence, back-patch to all supported
branches.

Arrange the added test cases to provide coverage of the related
pg_dump code paths (since these views will be dumped and reloaded
during the pg_upgrade regression test).  I also made them test
the case where pg_dump has to postpone the view rule into post-data,
which disturbingly had no regression coverage before.

Report and patch by Ashutosh Sharma (test case by me)

Discussion: https://postgr.es/m/CAE9k0PkmHdeSaeZt2ujnb_cKucmK3sDDceDzw7+d5UZoNJPYOg@mail.gmail.com
2019-02-17 12:37:31 -05:00
Tom Lane 1d92a0c9f7 Redesign the partition dependency mechanism.
The original setup for dependencies of partitioned objects had
serious problems:

1. It did not verify that a drop cascading to a partition-child object
also cascaded to at least one of the object's partition parents.  Now,
normally a child object would share all its dependencies with one or
another parent (e.g. a child index's opclass dependencies would be shared
with the parent index), so that this oversight is usually harmless.
But if some dependency failed to fit this pattern, the child could be
dropped while all its parents remain, creating a logically broken
situation.  (It's easy to construct artificial cases that break it,
such as attaching an unrelated extension dependency to the child object
and then dropping the extension.  I'm not sure if any less-artificial
cases exist.)

2. Management of partition dependencies during ATTACH/DETACH PARTITION
was complicated and buggy; for example, after detaching a partition
table it was possible to create cases where a formerly-child index
should be dropped and was not, because the correct set of dependencies
had not been reconstructed.

Less seriously, because multiple partition relationships were
represented identically in pg_depend, there was an order-of-traversal
dependency on which partition parent was cited in error messages.
We also had some pre-existing order-of-traversal hazards for error
messages related to internal and extension dependencies.  This is
cosmetic to users but causes testing problems.

To fix #1, add a check at the end of the partition tree traversal
to ensure that at least one partition parent got deleted.  To fix #2,
establish a new policy that partition dependencies are in addition to,
not instead of, a child object's usual dependencies; in this way
ATTACH/DETACH PARTITION need not cope with adding or removing the
usual dependencies.

To fix the cosmetic problem, distinguish between primary and secondary
partition dependency entries in pg_depend, by giving them different
deptypes.  (They behave identically except for having different
priorities for being cited in error messages.)  This means that the
former 'I' dependency type is replaced with new 'P' and 'S' types.

This also fixes a longstanding bug that after handling an internal
dependency by recursing to the owning object, findDependentObjects
did not verify that the current target was now scheduled for deletion,
and did not apply the current recursion level's objflags to it.
Perhaps that should be back-patched; but in the back branches it
would only matter if some concurrent transaction had removed the
internal-linkage pg_depend entry before the recursive call found it,
or the recursive call somehow failed to find it, both of which seem
unlikely.

Catversion bump because the contents of pg_depend change for
partitioning relationships.

Patch HEAD only.  It's annoying that we're not fixing #2 in v11,
but there seems no practical way to do so given that the problem
is exactly a poor choice of what entries to put in pg_depend.
We can't really fix that while staying compatible with what's
in pg_depend in existing v11 installations.

Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
2019-02-11 14:41:17 -05:00
Peter Eisentraut 256fc004af Adjust gratuitously different error message wording 2019-02-11 14:01:05 +01:00
Alvaro Herrera cb90de1aac Fix trigger drop procedure
After commit 123cc697a8, we remove redundant FK action triggers during
partition ATTACH by merely deleting the catalog tuple, but that's wrong:
it should use performDeletion() instead.  Repair, and make the comments
more explicit.

Per code review from Tom Lane.

Discussion: https://postgr.es/m/18885.1549642539@sss.pgh.pa.us
2019-02-10 10:00:11 -03:00
Tom Lane 1fb57af920 Create the infrastructure for planner support functions.
Rename/repurpose pg_proc.protransform as "prosupport".  The idea is
still that it names an internal function that provides knowledge to
the planner about the behavior of the function it's attached to;
but redesign the API specification so that it's not limited to doing
just one thing, but can support an extensible set of requests.

The original purpose of simplifying a function call is handled by
the first request type to be invented, SupportRequestSimplify.
Adjust all the existing transform functions to handle this API,
and rename them fron "xxx_transform" to "xxx_support" to reflect
the potential generalization of what they do.  (Since we never
previously provided any way for extensions to add transform functions,
this change doesn't create an API break for them.)

Also add DDL and pg_dump support for attaching a support function to a
user-defined function.  Unfortunately, DDL access has to be restricted
to superusers, at least for now; but seeing that support functions
will pretty much have to be written in C, that limitation is just
theoretical.  (This support is untested in this patch, but a follow-on
patch will add cases that exercise it.)

Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
2019-02-09 18:08:48 -05:00
Andres Freund 171e0418b0 Fix heap_getattr() handling of fast defaults.
Previously heap_getattr() returned NULL for attributes with a fast
default value (c.f. 16828d5c02), as it had no handling whatsoever
for that case.

A previous fix, 7636e5c60f, attempted to fix issues caused by this
oversight, but just expanding OLD tuples for triggers doesn't actually
solve the underlying issue.

One known consequence of this bug is that the check for HOT updates
can return the wrong result, when a previously fast-default'ed column
is set to NULL. Which in turn means that an index over a column with
fast default'ed columns might be corrupt if the underlying column(s)
allow NULLs.

Fix by handling fast default columns in heap_getattr(), remove now
superfluous expansion in GetTupleForTrigger().

Author: Andres Freund
Discussion: https://postgr.es/m/20190201162404.onngi77f26baem4g@alap3.anarazel.de
Backpatch: 11, where fast defaults were introduced
2019-02-06 01:09:32 -08:00
Tom Lane 5f5c014590 Allow RECORD and RECORD[] to be specified in function coldeflists.
We can't allow these pseudo-types to be used as table column types,
because storing an anonymous record value in a table would result
in data that couldn't be understood by other sessions.  However,
it seems like there's no harm in allowing the case in a column
definition list that's specifying what a function-returning-record
returns.  The data involved is all local to the current session,
so we should be just as able to resolve its actual tuple type as
we are for the function-returning-record's top-level tuple output.

Elvis Pranskevichus, with cosmetic changes by me

Discussion: https://postgr.es/m/11038447.kQ5A9Uj5xi@hammer.magicstack.net
2019-01-30 19:25:33 -05:00
Tom Lane f09346a9c6 Refactor planner's header files.
Create a new header optimizer/optimizer.h, which exposes just the
planner functions that can be used "at arm's length", without need
to access Paths or the other planner-internal data structures defined
in nodes/relation.h.  This is intended to provide the whole planner
API seen by most of the rest of the system; although FDWs still need
to use additional stuff, and more thought is also needed about just
what selfuncs.c should rely on.

The main point of doing this now is to limit the amount of new
#include baggage that will be needed by "planner support functions",
which I expect to introduce later, and which will be in relevant
datatype modules rather than anywhere near the planner.

This commit just moves relevant declarations into optimizer.h from
other header files (a couple of which go away because everything
got moved), and adjusts #include lists to match.  There's further
cleanup that could be done if we want to decide that some stuff
being exposed by optimizer.h doesn't belong in the planner at all,
but I'll leave that for another day.

Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
2019-01-29 15:48:51 -05:00
Tom Lane a1b8c41e99 Make some small planner API cleanups.
Move a few very simple node-creation and node-type-testing functions
from the planner's clauses.c to nodes/makefuncs and nodes/nodeFuncs.
There's nothing planner-specific about them, as evidenced by the
number of other places that were using them.

While at it, rename and_clause() etc to is_andclause() etc, to clarify
that they are node-type-testing functions not node-creation functions.
And use "static inline" implementations for the shortest ones.

Also, modify flatten_join_alias_vars() and some subsidiary functions
to take a Query not a PlannerInfo to define the join structure that
Vars should be translated according to.  They were only using the
"parse" field of the PlannerInfo anyway, so this just requires removing
one level of indirection.  The advantage is that now parse_agg.c can
use flatten_join_alias_vars() without the horrid kluge of creating an
incomplete PlannerInfo, which will allow that file to be decoupled from
relation.h in a subsequent patch.

Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
2019-01-29 15:26:44 -05:00
Tomas Vondra 36a1281f86 Separate per-batch and per-tuple memory contexts in COPY
In batching mode, COPY was using the same (per-tuple) memory context for
allocations with longer lifetime. This was confusing but harmless, until
commit 31f3817402 added COPY FROM ... WHERE feature, introducing a risk
of memory leak.

The "per-tuple" memory context was reset only when starting new batch,
but as the rows may be filtered out by the WHERE clauses, that may not
happen at all.  The WHERE clause however has to be evaluated for all
rows, before filtering them out.

This commit separates the per-tuple and per-batch contexts, removing the
ambiguity.  Expressions (both defaults and WHERE clause) are evaluated
in the per-tuple context, while tuples are formed in the batch context.
This allows resetting the contexts at appropriate times.

The main complexity is related to partitioning, in which case we need to
reset the batch context after forming the tuple (which happens before
routing to leaf partition).  Instead of switching between two contexts
as before, we simply copy the last tuple aside, reset the context and
then copy the tuple back.  The performance impact is negligible, and
juggling with two contexts is not free either.

Discussion: https://www.postgresql.org/message-id/flat/CALAY4q_DdpWDuB5-Zyi-oTtO2uSk8pmy+dupiRe3AvAc++1imA@mail.gmail.com
2019-01-29 00:00:47 +01:00
Andres Freund a9c35cf85c Change function call information to be variable length.
Before this change FunctionCallInfoData, the struct arguments etc for
V1 function calls are stored in, always had space for
FUNC_MAX_ARGS/100 arguments, storing datums and their nullness in two
arrays.  For nearly every function call 100 arguments is far more than
needed, therefore wasting memory. Arg and argnull being two separate
arrays also guarantees that to access a single argument, two
cachelines have to be touched.

Change the layout so there's a single variable-length array with pairs
of value / isnull. That drastically reduces memory consumption for
most function calls (on x86-64 a two argument function now uses
64bytes, previously 936 bytes), and makes it very likely that argument
value and its nullness are on the same cacheline.

Arguments are stored in a new NullableDatum struct, which, due to
padding, needs more memory per argument than before. But as usually
far fewer arguments are stored, and individual arguments are cheaper
to access, that's still a clear win.  It's likely that there's other
places where conversion to NullableDatum arrays would make sense,
e.g. TupleTableSlots, but that's for another commit.

Because the function call information is now variable-length
allocations have to take the number of arguments into account. For
heap allocations that can be done with SizeForFunctionCallInfoData(),
for on-stack allocations there's a new LOCAL_FCINFO(name, nargs) macro
that helps to allocate an appropriately sized and aligned variable.

Some places with stack allocation function call information don't know
the number of arguments at compile time, and currently variably sized
stack allocations aren't allowed in postgres. Therefore allow for
FUNC_MAX_ARGS space in these cases. They're not that common, so for
now that seems acceptable.

Because of the need to allocate FunctionCallInfo of the appropriate
size, older extensions may need to update their code. To avoid subtle
breakages, the FunctionCallInfoData struct has been renamed to
FunctionCallInfoBaseData. Most code only references FunctionCallInfo,
so that shouldn't cause much collateral damage.

This change is also a prerequisite for more efficient expression JIT
compilation (by allocating the function call information on the stack,
allowing LLVM to optimize it away); previously the size of the call
information caused problems inside LLVM's optimizer.

Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20180605172952.x34m5uz6ju6enaem@alap3.anarazel.de
2019-01-26 14:17:52 -08:00
Michael Paquier c9b75c5838 Simplify restriction handling of two-phase commit for temporary objects
There were two flags used to track the access to temporary tables and
to the temporary namespace of a session which are used to restrict
PREPARE TRANSACTION, however the first control flag is a concept
included in the second.  This removes the flag for temporary table
tracking, keeping around only the one at namespace level.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20190118053126.GH1883@paquier.xyz
2019-01-26 10:45:23 +09:00
Heikki Linnakangas a5be6e9a1d Fix comments that claimed that mblen() only looks at first byte.
GB18030's mblen() function looks at the first and the second byte of the
multibyte character, to determine its length. copy.c had made the
assumption that mblen() only looks at the first byte, but it turns out to
work out fine, because of the way the GB18030 encoding works. COPY will
see a 4-byte encoded character as two 2-byte encoded characters, which is
enough for COPY's purposes. It cannot mix those up with delimiter or
escaping characters, because only single-byte ASCII characters are
supported as delimiters or escape characters.

Discussion: https://www.postgresql.org/message-id/7704d099-9643-2a55-fb0e-becd64400dcb%40iki.fi
2019-01-25 14:54:38 +02:00
Peter Eisentraut 7c079d7417 Allow generalized expression syntax for partition bounds
Previously, only literals were allowed.  This change allows general
expressions, including functions calls, which are evaluated at the
time the DDL command is executed.

Besides offering some more functionality, it simplifies the parser
structures and removes some inconsistencies in how the literals were
handled.

Author: Kyotaro Horiguchi, Tom Lane, Amit Langote
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/9f88b5e0-6da2-5227-20d0-0d7012beaa1c@lab.ntt.co.jp/
2019-01-25 11:28:49 +01:00