Commit Graph

4117 Commits

Author SHA1 Message Date
Robert Haas 7cdcc747a9 Update neglected comment.
Commit d986d4e87f renamed a variable
but neglected to update the corresponding comment.

Amit Langote
2019-12-19 09:24:44 -05:00
Michael Paquier e1551f96e6 Refactor attribute mappings used in logical tuple conversion
Tuple conversion support in tupconvert.c is able to convert rowtypes
between two relations, inner and outer, which are logically equivalent
but have a different ordering or even dropped columns (used mainly for
inheritance tree and partitions).  This makes use of attribute mappings,
which are simple arrays made of AttrNumber elements with a length
matching the number of attributes of the outer relation.  The length of
the attribute mapping has been treated as completely independent of the
mapping itself until now, making it easy to pass down an incorrect
mapping length.

This commit refactors the code related to attribute mappings and moves
it into an independent facility called attmap.c, extracted from
tupconvert.c.  This merges the attribute mapping with its length,
avoiding to try to guess what is the length of a mapping to use as this
is computed once, when the map is built.

This will avoid mistakes like what has been fixed in dc816e58, which has
used an incorrect mapping length by matching it with the number of
attributes of an inner relation (a child partition) instead of an outer
relation (a partitioned table).

Author: Michael Paquier
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/20191121042556.GD153437@paquier.xyz
2019-12-18 16:23:02 +09:00
Tom Lane b925a00f4e Fix "force_parallel_mode = regress" to work with ANALYZE + VERBOSE.
force_parallel_mode = regress is supposed to force use of a Gather
node without having any impact on EXPLAIN output.  But it failed to
accomplish that if both ANALYZE and VERBOSE are given, because that
enables per-worker output data that you wouldn't see if the Gather
hadn't been inserted.  Improve the logic so that we suppress the
per-worker data too.

This allows putting the new test case added by commit 5935917ce
back into the originally intended form (cf. 776a2c887, 22864f6e0).
We can also get rid of a kluge in subselect.sql, which previously
had to clean up after force_parallel_mode's failure to do what it
said on the tin.

Discussion: https://postgr.es/m/18445.1576177309@sss.pgh.pa.us
2019-12-16 20:14:35 -05:00
Tom Lane 6ef77cf46e Further adjust EXPLAIN's choices of table alias names.
This patch causes EXPLAIN to always assign a separate table alias to the
parent RTE of an append relation (inheritance set); before, such RTEs
were ignored if not actually scanned by the plan.  Since the child RTEs
now always have that same alias to start with (cf. commit 55a1954da),
the net effect is that the parent RTE usually gets the alias used or
implied by the query text, and the children all get that alias with "_N"
appended.  (The exception to "usually" is if there are duplicate aliases
in different subtrees of the original query; then some of those original
RTEs will also have "_N" appended.)

This results in more uniform output for partitioned-table plans than
we had before: the partitioned table itself gets the original alias,
and all child tables have aliases with "_N", rather than the previous
behavior where one of the children would get an alias without "_N".

The reason for giving the parent RTE an alias, even if it isn't scanned
by the plan, is that we now use the parent's alias to qualify Vars that
refer to an appendrel output column and appear above the Append or
MergeAppend that computes the appendrel.  But below the append, Vars
refer to some one of the child relations, and are displayed that way.
This seems clearer than the old behavior where a Var that could carry
values from any child relation was displayed as if it referred to only
one of them.

While at it, change ruleutils.c so that the code paths used by EXPLAIN
deal in Plan trees not PlanState trees.  This effectively reverts a
decision made in commit 1cc29fe7c, which seemed like a good idea at
the time to make ruleutils.c consistent with explain.c.  However,
it's problematic because we'd really like to allow executor startup
pruning to remove all the children of an append node when possible,
leaving no child PlanState to resolve Vars against.  (That's not done
here, but will be in the next patch.)  This requires different handling
of subplans and initplans than before, but is otherwise a pretty
straightforward change.

Discussion: https://postgr.es/m/001001d4f44b$2a2cca50$7e865ef0$@lab.ntt.co.jp
2019-12-11 17:05:18 -05:00
Peter Eisentraut 105eb360f2 Remove ATPrepSetStatistics
It was once possible to do ALTER TABLE ... SET STATISTICS on system
tables without allow_sytem_table_mods.  This was changed apparently by
accident between PostgreSQL 9.1 and 9.2, but a code comment still
claimed this was possible.  Without that functionality, having a
separate ATPrepSetStatistics() is useless, so use the generic
ATSimplePermissions() instead and move the remaining custom code into
ATExecSetStatistics().

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/cc8d2648-a0ec-7a86-13e5-db473484e19e%402ndquadrant.com
2019-12-11 09:04:04 +01:00
Etsuro Fujita 5a20b0219e Fix handling of multiple AFTER ROW triggers on a foreign table.
AfterTriggerExecute() retrieves a fresh tuple or pair of tuples from a
tuplestore and then stores the tuple(s) in the passed-in slot(s) if
AFTER_TRIGGER_FDW_FETCH, while it uses the most-recently-retrieved
tuple(s) stored in the slot(s) if AFTER_TRIGGER_FDW_REUSE.  This was
done correctly before 12, but commit ff11e7f4b broke it by mistakenly
clearing the tuple(s) stored in the slot(s) in that function, leading to
an assertion failure as reported in bug #16139 from Alexander Lakhin.

Also, fix some other issues with the aforementioned commit in passing:

* For tg_newslot, which is a slot added to the TriggerData struct by the
  commit to store new updated tuples, it didn't ensure the slot was NULL
  if there was no such tuple.
* The commit failed to update the documentation about the trigger
  interface.

Author: Etsuro Fujita
Backpatch-through: 12
Discussion: https://postgr.es/m/16139-94f9ccf0db6119ec%40postgresql.org
2019-12-10 18:00:30 +09:00
Peter Eisentraut d4feadeca1 Add error position to an error message
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/6e7aa4a1-be6a-1a75-b1f9-83a678e5184a@2ndquadrant.com
2019-11-29 09:10:17 +01:00
Tom Lane d3aa114ac4 Doc: improve discussion of race conditions involved in LISTEN.
The user docs didn't really explain how to use LISTEN safely,
so clarify that.  Also clean up some fuzzy-headed explanations
in comments.  No code changes.

Discussion: https://postgr.es/m/3ac7f397-4d5f-be8e-f354-440020675694@gmail.com
2019-11-24 18:03:39 -05:00
Tom Lane 7900269724 Stabilize NOTIFY behavior by transmitting notifies before ReadyForQuery.
This patch ensures that, if any notify messages were received during
a just-finished transaction, they get sent to the frontend just before
not just after the ReadyForQuery message.  With libpq and other client
libraries that act similarly, this guarantees that the client will see
the notify messages as available as soon as it thinks the transaction
is done.

This probably makes no difference in practice, since in realistic
use-cases the application would have to cope with asynchronous
arrival of notify events anyhow.  However, it makes it a lot easier
to build cross-session-notify test cases with stable behavior.
I'm a bit surprised now that we've not seen any buildfarm instability
with the test cases added by commit b10f40bf0.  Tests that I intend
to add in an upcoming bug fix are definitely unstable without this.

Back-patch to 9.6, which is as far back as we can do NOTIFY testing
with the isolationtester infrastructure.

Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
2019-11-24 14:42:59 -05:00
Tom Lane 8b7ae5a82d Stabilize the results of pg_notification_queue_usage().
This function wasn't touched in commit 51004c717, but that turns out
to be a bad idea, because its results now include any dead space
that exists in the NOTIFY queue on account of our being lazy about
advancing the queue tail.  Notably, the isolation tests now fail
if run twice without a server restart between, because async-notify's
first test of the function will already show a positive value.
It seems likely that end users would be equally unhappy about the
result's instability.  To fix, just make the function call
asyncQueueAdvanceTail before computing its result.  That should end
in producing the same value as before, and it's hard to believe that
there's any practical use-case where pg_notification_queue_usage()
is called so often as to create a performance degradation, especially
compared to what we did before.

Out of paranoia, also mark this function parallel-restricted (it
was volatile, but parallel-safe by default, before).  Although the
code seems to work fine when run in a parallel worker, that's outside
the design scope of async.c, and it's a bit scary to have intentional
side-effects happening in a parallel worker.  There seems no plausible
use-case where it'd be important to try to parallelize this, so let's
not take any risk of introducing new bugs.

In passing, re-pgindent async.c and run reformat-dat-files on
pg_proc.dat, just because I'm a neatnik.

Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
2019-11-24 14:09:33 -05:00
Joe Conway f7a2002e82 Add object TRUNCATE hook
All operations with acl permissions checks should have a corresponding hook
so that, for example, mandatory access control (MAC) may be enforced by an
extension. The command TRUNCATE is missing this hook, so add it. Patch by
Yuli Khodorkovskiy with some editorialization by me. Based on the discussion
not back-patched. A separate patch will exercise the hook in the sepgsql
extension.

Author: Yuli Khodorkovskiy
Reviewed-by: Joe Conway
Discussion: https://postgr.es/m/CAFL5wJcomybj1Xdw7qWmPJRpGuFukKgNrDb6uVBaCMgYS9dkaA%40mail.gmail.com
2019-11-23 10:39:20 -05:00
Peter Eisentraut 2e4db241bf Remove configure --disable-float4-byval
This build option was only useful to maintain compatibility for
version-0 functions, but those are no longer supported, so this option
can be removed.

float4 is now always pass-by-value; the pass-by-reference code path is
completely removed.

Discussion: https://www.postgresql.org/message-id/flat/f3e1e576-2749-bbd7-2d57-3f9dcf75255a@2ndquadrant.com
2019-11-21 18:29:21 +01:00
Fujii Masao e6d8069522 Make DROP DATABASE command generate less WAL records.
Previously DROP DATABASE generated as many XLOG_DBASE_DROP WAL records
as the number of tablespaces that the database to drop uses. This caused
the scans of shared_buffers as many times as the number of the tablespaces
during recovery because WAL replay of one XLOG_DBASE_DROP record needs
that full scan. This could make the recovery time longer especially
when shared_buffers is large.

This commit changes DROP DATABASE so that it generates only one
XLOG_DBASE_DROP record, and registers the information of all the tablespaces
into it. Then, WAL replay of XLOG_DBASE_DROP record needs full scan of
shared_buffers only once, and which may improve the recovery performance.

Author: Fujii Masao
Reviewed-by: Kirk Jamison, Simon Riggs
Discussion: https://postgr.es/m/CAHGQGwF8YwNH0ZaL+2wjZPkj+ji9UhC+Z4ScnG97WKtVY5L9iw@mail.gmail.com
2019-11-21 21:10:37 +09:00
Fujii Masao 30840c92ac Allow ALTER VIEW command to rename the column in the view.
ALTER TABLE RENAME COLUMN command always can be used to rename the column
in the view, but it's reasonable to add that syntax to ALTER VIEW too.

Author: Fujii Masao
Reviewed-by: Ibrar Ahmed, Yu Kimura
Discussion: https://postgr.es/m/CAHGQGwHoQMD3b-MqTLcp1MgdhCpOKU7QNRwjFooT4_d+ti5v6g@mail.gmail.com
2019-11-21 19:55:13 +09:00
Michael Paquier 1bbd608fda Split handling of reloptions for partitioned tables
Partitioned tables do not have relation options yet, but, similarly to
what's done for views which have their own parsing table, it could make
sense to introduce new parameters for some of the existing default ones
like fillfactor, autovacuum, etc.  Splitting things has the advantage to
make the information stored in rd_options include only the necessary
information, reducing the amount of memory used for a relcache entry
with partitioned tables if new reloptions are introduced at this level.

Author:  Nikolay Shaplov
Reviewed-by: Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/1627387.Qykg9O6zpu@x200m
2019-11-14 12:34:28 +09:00
Tom Lane 0cafdd03a8 Fix silly initializations (cosmetic only).
Initializing a pointer to "false" isn't per project style,
and reportedly some compilers warn about it (though I've
not seen any such warnings in the buildfarm).

Seems to have come in with commit ff11e7f4b, so back-patch
to v12 where that was added.

Didier Gautheron

Discussion: https://postgr.es/m/CAJRYxu+XQuM0qnSqt1Ujztu6fBPzMMAT3VEn6W32rgKG6A2Fsw@mail.gmail.com
2019-11-13 15:26:54 -05:00
Amit Kapila 1379fd537f Introduce the 'force' option for the Drop Database command.
This new option terminates the other sessions connected to the target
database and then drop it.  To terminate other sessions, the current user
must have desired permissions (same as pg_terminate_backend()).  We don't
allow to terminate the sessions if prepared transactions, active logical
replication slots or subscriptions are present in the target database.

Author: Pavel Stehule with changes by me
Reviewed-by: Dilip Kumar, Vignesh C, Ibrar Ahmed, Anthony Nowocien,
Ryan Lambert and Amit Kapila
Discussion: https://postgr.es/m/CAP_rwwmLJJbn70vLOZFpxGw3XD7nLB_7+NKz46H5EOO2k5H7OQ@mail.gmail.com
2019-11-13 08:25:33 +05:30
Alvaro Herrera dcb7d3cafa Have LookupFuncName accept NULL argtypes for 0 args
Prior to this change, it requires to be passed a valid pointer just to
be able to pass it to a zero-byte memcmp, per 0a52d378b0.  Given the
strange resulting code in callsites, it seems better to test for the
case specifically and remove the requirement.

Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/MN2PR18MB2927F24692485D754794F01BE3740@MN2PR18MB2927.namprd18.prod.outlook.com
Discussion: https://postgr.es/m/MN2PR18MB2927F6873DF2774A505AC298E3740@MN2PR18MB2927.namprd18.prod.outlook.com
2019-11-12 17:06:58 -03:00
Amit Kapila 14aec03502 Make the order of the header file includes consistent in backend modules.
Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order
of header file inclusion consistent for backend modules.

In the passing, removed a couple of duplicate inclusions.

Author: Vignesh C
Reviewed-by: Kuntal Ghosh and Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-12 08:30:16 +05:30
Amit Kapila 9fab25c6cd Rearrange dropdb() to avoid errors after allowing other sessions to exit.
During Drop Database, it is better to error out before allowing other
sessions to exit and forcefully terminating autovacuum workers.  All the
other errors except for checking subscriptions are already done before.

Author: Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1+qhLkCYG2oy9xug9ur_j=G2wQNRYAyd+-kZfZ1z42pLw@mail.gmail.com
2019-11-11 07:42:45 +05:30
Alvaro Herrera b4bcc6bfdf Fix SET CONSTRAINTS .. DEFERRED on partitioned tables
SET CONSTRAINTS ... DEFERRED failed on partitioned tables, because of a
sanity check that ensures that the affected constraints have triggers.
On partitioned tables, the triggers are in the leaf partitions, not in
the partitioned relations themselves, so the sanity check fails.
Removing the sanity check solves the problem, because the code needed to
support the case is already there.

Backpatch to 11.

Note: deferred unique constraints are not affected by this bug, because
they do have triggers in the parent partitioned table.  I did not add a
test for this scenario.

Discussion: https://postgr.es/m/20191105212915.GA11324@alvherre.pgsql
2019-11-07 13:59:24 -03:00
Andres Freund 01368e5d9d Split all OBJS style lines in makefiles into one-line-per-entry style.
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.

By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.

Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
2019-11-05 14:41:07 -08:00
Tom Lane a30531c5c8 Fix "unexpected relkind" error when denying permissions on toast tables.
get_relkind_objtype, and hence get_object_type, failed when applied to a
toast table.  This is not a good thing, because it prevents reporting of
perfectly legitimate permissions errors.  (At present, these functions
are in fact *only* used to determine the ObjectType argument for
acl_error() calls.)  It seems best to have them fall back to returning
OBJECT_TABLE in every case where they can't determine an object type
for a pg_class entry, so do that.

In passing, make some edits to alter.c to make it more obvious that
those calls of get_object_type() are used only for error reporting.
This might save a few cycles in the non-error code path, too.

Back-patch to v11 where this issue originated.

John Hsu, Michael Paquier, Tom Lane

Discussion: https://postgr.es/m/C652D3DF-2B0C-4128-9420-FB5379F6B1E4@amazon.com
2019-11-05 13:40:37 -05:00
Michael Paquier dc816e5815 Fix failure when creating cloned indexes for a partition
When using CREATE TABLE for a new partition, the partitioned indexes of
the parent are created automatically in a fashion similar to LIKE
INDEXES.  The new partition and its parent use a mapping for attribute
numbers for this operation, and while the mapping was correctly built,
its length was defined as the number of attributes of the newly-created
child, and not the parent.  If the parent includes dropped columns, this
could cause failures.

This is wrong since 8b08f7d which has introduced the concept of
partitioned indexes, so backpatch down to 11.

Reported-by: Wyatt Alt
Author: Michael Paquier
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/CAGem3qCcRmhbs4jYMkenYNfP2kEusDXvTfw-q+eOhM0zTceG-g@mail.gmail.com
Backpatch-through: 11
2019-11-02 14:16:04 +09:00
Peter Eisentraut 604bd36711 PG_FINALLY
This gives an alternative way of catching exceptions, for the common
case where the cleanup code is the same in the error and non-error
cases.  So instead of

    PG_TRY();
    {
        ... code that might throw ereport(ERROR) ...
    }
    PG_CATCH();
    {
        cleanup();
	PG_RE_THROW();
    }
    PG_END_TRY();
    cleanup();

one can write

    PG_TRY();
    {
        ... code that might throw ereport(ERROR) ...
    }
    PG_FINALLY();
    {
        cleanup();
    }
    PG_END_TRY();

Discussion: https://www.postgresql.org/message-id/flat/95a822c3-728b-af0e-d7e5-71890507ae0c%402ndquadrant.com
2019-11-01 11:18:03 +01:00
Michael Paquier 8270a0d9a9 Handle interrupts within a transaction context in REINDEX CONCURRENTLY
Phases 2 (building the new index) and 3 (validating the new index)
checked for interrupts outside a transaction context, having as
consequence to not release session-level locks taken on the parent
relation and the old and new indexes processed.  This could for example
be triggered with statement_timeout and a bad timing, and would issue
confusing error messages when shutting down the session still holding
the locks (note that an assertion failure would be triggered first), on
top of more issues with concurrent sessions trying to take a lock that
would interfere with the SHARE UPDATE EXCLUSIVE locks hold here.

This moves all the interruption checks inside a transaction context.
Note that I have manually tested all interruptions to make sure that
invalid indexes can be cleaned up properly.  Partition indexes still
have issues on their own with some missing dependency handling, which
will be dealt with in a follow-up patch.

Reported-by: Justin Pryzby
Author: Michael Paquier
Discussion: https://postgr.es/m/20191013025145.GC4475@telsasoft.com
Backpatch-through: 12
2019-10-25 10:20:08 +09:00
Michael Paquier 5d3500da72 Acquire properly session-level lock on new index in REINDEX CONCURRENTLY
In the first transaction run for REINDEX CONCURRENTLY, a thinko in the
existing logic caused two session locks to be taken on the old index,
causing the session lock on the newly-created index to be missed.  This
made possible concurrent DDL commands (like ALTER INDEX) on the new
index while REINDEX CONCURRENTLY was processing from the point where the
first internal transaction committed.

This issue has been discovered while digging into another bug.

Author: Michael Paquier
Discussion: https://postgr.es/m/20191021074323.GB1869@paquier.xyz
Backpatch-through: 12
2019-10-23 15:04:48 +09:00
Michael Paquier f25968c496 Remove last traces of heap_open/close in the tree
Since pluggable storage has been introduced, those two routines have
been replaced by table_open/close, with some compatibility macros still
present to allow extensions to compile correctly with v12.

Some code paths using the old routines still remained, so replace them.
Based on the discussion done, the consensus reached is that it is better
to remove those compatibility macros so as nothing new uses the old
routines, so remove also the compatibility macros.

Discussion: https://postgr.es/m/20191017014706.GF5605@paquier.xyz
2019-10-19 11:18:15 +09:00
Alvaro Herrera 89403ed228 Fix typo
Apparently while this code was being developed,
ReindexRelationConcurrently operated on multiple relations.  The version
that was ultimately pushed doesn't, so this comment's use of plural is
inaccurate.
2019-10-18 14:49:39 +02:00
Thomas Munro 6bda2af039 Fix bug that could try to freeze running multixacts.
Commits 801c2dc7 and 801c2dc7 made it possible for vacuum to
try to freeze a multixact that is still running.  That was
prevented by a check, but raised an error.  Repair.

Back-patch all the way.

Author: Nathan Bossart, Jeremy Schneider
Reported-by: Jeremy Schneider
Reviewed-by: Jim Nasby, Thomas Munro
Discussion: https://postgr.es/m/DAFB8AFF-2F05-4E33-AD7F-FF8B0F760C17%40amazon.com
2019-10-17 09:59:21 +13:00
Alvaro Herrera 0d21f919eb Fix crash when reporting CREATE INDEX progress
A race condition can make us try to dereference a NULL pointer to the
PGPROC struct of a process that's already finished.  That results in
crashes during REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY.

This was introduced in ab0dfc961b, so backpatch to pg12.

Reported by: Justin Pryzby
Reviewed-by: Michaël Paquier
Discussion: https://postgr.es/m/20191012004446.GT10470@telsasoft.com
2019-10-16 14:51:34 +02:00
Michael Paquier 1df5875d39 Fix dependency handling of column drop with partitioned tables
When dropping a column on a partitioned table which has one or more
partitioned indexes, the operation was failing as dependencies with
partitioned indexes using the column dropped were not getting removed in
a way consistent with the columns involved across all the relations part
of an inheritance tree.

This commit refactors the code executing column drop so as all the
columns from an inheritance tree to remove are gathered first, and
dropped all at the end.  This way, we let the dependency machinery sort
out by itself the deletion of all the columns with the partitioned
indexes across a partition tree.

This issue has been introduced by 1d92a0c, so backpatch down to
REL_12_STABLE.

Author: Amit Langote, Michael Paquier
Reviewed-by: Álvaro Herrera, Ashutosh Sharma
Discussion: https://postgr.es/m/CA+HiwqE9kuBsZ3b5pob2-cvE8ofzPWs-og+g8bKKGnu6b4-yTQ@mail.gmail.com
Backpatch-through: 12
2019-10-13 17:51:55 +09:00
Andres Freund 93765bd956 Fix table rewrites that include a column without a default.
In c2fe139c20 I made ATRewriteTable() use tuple slots. Unfortunately
I did not notice that columns can be added in a rewrite that do not
have a default, when another column is added/altered requiring one.

Initialize columns to NULL again, and add tests.

Bug: #16038
Reported-By: anonymous
Author: Andres Freund
Discussion: https://postgr.es/m/16038-5c974541f2bf6749@postgresql.org
Backpatch: 12, where the bug was introduced in c2fe139c20
2019-10-09 22:00:50 -07:00
Andres Freund d986d4e87f Fix crash caused by EPQ happening with a before update trigger present.
When ExecBRUpdateTriggers()'s GetTupleForTrigger() follows an EPQ
chain the former needs to run the result tuple through the junkfilter
again, and update the slot containing the new version of the tuple to
contain that new version. The input tuple may already be in the
junkfilter's output slot, which used to be OK - we don't need the
previous version anymore. Unfortunately ff11e7f4b9 started to use
ExecCopySlot() to update newslot, and ExecCopySlot() doesn't support
copying a slot into itself, leading to a slot in a corrupt
state, which then can cause crashes or other symptoms.

Fix this by skipping the ExecCopySlot() when copying into itself.

While we could have easily made ExecCopySlot() handle that case, it
seems better to add an assert forbidding doing so instead. As the goal
of copying might be to make the contents of one slot independent from
another, it seems failure prone to handle doing so silently.

A follow-up commit will add tests for the obviously under-covered
combination of EPQ and triggers. Done as a separate commit as it might
make sense to backpatch them further than this bug.

Also remove confusion with confusing variable names for slots in
ExecBRDeleteTriggers() and ExecBRUpdateTriggers().

Bug: #16036
Reported-By: Антон Власов
Author: Andres Freund
Discussion: https://postgr.es/m/16036-28184c90d952fb7f@postgresql.org
Backpatch: 12-, where ff11e7f4b9 was merged
2019-10-04 13:50:49 -07:00
Robert Haas 967e276e9f Remove AtSubStart_Notify.
Allocate notify-related state lazily instead. This makes trivial
subtransactions noticeably faster.

Patch by me, reviewed and tested by Dilip Kumar, Kyotaro Horiguchi,
and Jeevan Ladhe.

Discussion: https://postgr.es/m/CA+TgmobE1J22S1eC-6N-je9LgrcwZypkwp+zH6JXo9mc=4Nk3A@mail.gmail.com
2019-10-04 08:19:25 -04:00
Andres Freund 3f6b3be39c Silence -Wmaybe-uninitialized compiler warnings in dbcommands.c.
When compiling postgres using gcc -O3, there are false-positive
warnings about the now initialized variables. Silence them.

Author: Peter Eisentraut, Andres Freund
Discussion: https://postgr.es/m/15fb2350-b8b8-e188-278f-0b34fdee5210@2ndquadrant.com
2019-09-27 14:14:30 -07:00
Alvaro Herrera 773df883e8 Support reloptions of enum type
All our current in core relation options of type string (not many,
admittedly) behave in reality like enums.  But after seeing an
implementation for enum reloptions, it's clear that strings are messier,
so introduce the new reloption type.  Switch all string options to be
enums instead.

Fortunately we have a recently introduced test module for reloptions, so
we don't lose coverage of string reloptions, which may still be used by
third-party modules.

Authors: Nikolay Shaplov, Álvaro Herrera
Reviewed-by: Nikita Glukhov, Aleksandr Parfenov
Discussion: https://postgr.es/m/43332102.S2V5pIjXRx@x200m
2019-09-25 15:56:52 -03:00
Tom Lane 51004c7172 Make some efficiency improvements in LISTEN/NOTIFY.
Move the responsibility for advancing the NOTIFY queue tail pointer
from the listener(s) to the notification sender, and only have the
sender do it once every few queue pages, rather than after every batch
of notifications as at present.  This reduces the number of times we
execute asyncQueueAdvanceTail, and reduces contention when there are
multiple listeners (since that function requires exclusive lock).
This change relies on the observation that we don't really need the tail
pointer to be exactly up-to-date.  It's certainly not necessary to
attempt to release disk space more often than once per SLRU segment.
The only other usage of the tail pointer is that an incoming listener,
if it's the only listener in its database, will need to scan the queue
forward from the tail; but that's surely a less performance-critical
path than routine sending and receiving of notifies.  We compromise by
advancing the tail pointer after every 4 pages of output, so that it
shouldn't get more than a few pages behind.

Also, when sending signals to other backends after adding notify
message(s) to the queue, recognize that only backends in our own
database are going to care about those messages, so only such
backends really need to be awakened promptly.  Backends in other
databases should get kicked if they're well behind on reading the
queue, else they'll hold back the global tail pointer; but wakening
them for every single message is pointless.  This change can
substantially reduce signal traffic if listeners are spread among
many databases.  It won't help for the common case of only a single
active database, but the extra check costs very little.

Martijn van Oosterhout, with some adjustments by me

Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
Discussion: https://postgr.es/m/CADWG95uFj8rLM52Er80JnhRsTbb_AqPP1ANHS8XQRGbqLrU+jA@mail.gmail.com
2019-09-22 11:46:29 -04:00
Alvaro Herrera d1b0007639 Fix progress report of REINDEX INDEX
I (Álvaro) broke that in commit 6212276e43 -- forgot to set the
necessary flag.  Repair.

Author: Amit Langote
Discussion: https://postgr.es/m/CA+HiwqEaM2tV5awKhP1vSbgjQe_uXVU15Oi4sTgwgempwMiT8g@mail.gmail.com
2019-09-20 12:56:00 -03:00
Alvaro Herrera 6212276e43 Fix progress reporting of CLUSTER / VACUUM FULL
The progress state was being clobbered once the first index completed
being rebuilt, causing the final phases of the operation not show
anything in the progress view.  This was inadvertently broken in
03f9e5cba0, which added progress tracking for REINDEX.

(The reason this bugfix is this small is that I had already noticed this
problem when writing monitoring for CREATE INDEX, and had already worked
around it, as can be seen in discussion starting at
https://postgr.es/m/20190329150218.GA25010@alvherre.pgsql Fixing the
problem is just a matter of fixing one place touched by the REINDEX
monitoring.)

Reported by: Álvaro Herrera
Author: Álvaro Herrera
Discussion: https://postgr.es/m/20190801184333.GA21369@alvherre.pgsql
2019-09-13 14:54:26 -03:00
Tomas Vondra d06215d03b Allow setting statistics target for extended statistics
When building statistics, we need to decide how many rows to sample and
how accurate the resulting statistics should be. Until now, it was not
possible to explicitly define statistics target for extended statistics
objects, the value was always computed from the per-attribute targets
with a fallback to the system-wide default statistics target.

That's a bit inconvenient, as it ties together the statistics target set
for per-column and extended statistics. In some cases it may be useful
to require larger sample / higher accuracy for extended statics (or the
other way around), but with this approach that's not possible.

So this commit introduces a new command, allowing to specify statistics
target for individual extended statistics objects, overriding the value
derived from per-attribute targets (and the system default).

  ALTER STATISTICS stat_name SET STATISTICS target_value;

When determining statistics target for an extended statistics object we
first look at this explicitly set value. When this value is -1, we fall
back to the old formula, looking at the per-attribute targets first and
then the system default. This means the behavior is backwards compatible
with older PostgreSQL releases.

Author: Tomas Vondra
Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development
Reviewed-by: Kirk Jamison, Dean Rasheed
2019-09-11 00:25:51 +02:00
Tom Lane bca6e64354 Reduce overhead of scanning the backend[] array in LISTEN/NOTIFY.
Up to now, async.c scanned its whole array of per-backend state
whenever it needed to find listening backends.  That's expensive
if MaxBackends is large, so extend the data structure with list
links that thread the active entries together.

A downside of this change is that asyncQueueUnregister (unregister
a listening backend at backend exit) now requires exclusive not shared
lock, and it can take awhile if there are many other listening
backends.  We could improve the latter issue by using a doubly- not
singly-linked list, but it's probably not worth the storage space;
typical usage patterns for LISTEN/NOTIFY have fairly long-lived
listeners.

In return for that, Exec_ListenPreCommit (initially register a
listening backend), SignalBackends, and asyncQueueAdvanceTail
get significantly faster when MaxBackends is much larger than
the number of listening backends.  If most of the potential
backend slots are listening, we don't win, but that's a case
where the actual interprocess-signal overhead is going to swamp
these considerations anyway.

Martijn van Oosterhout, hacked a bit more by me

Discussion: https://postgr.es/m/CADWG95vtRBFDdrx1JdT1_9nhOFw48KaeTev6F_LtDQAFVpSPhA@mail.gmail.com
2019-09-10 18:15:20 -04:00
Andres Freund 27cc7cd2bc Reorder EPQ work, to fix rowmark related bugs and improve efficiency.
In ad0bda5d24 I changed the EvalPlanQual machinery to store
substitution tuples in slot, instead of using plain HeapTuples. The
main motivation for that was that using HeapTuples will be inefficient
for future tableams.  But it turns out that that conversion was buggy
for non-locking rowmarks - the wrong tuple descriptor was used to
create the slot.

As a secondary issue 5db6df0c0 changed ExecLockRows() to begin EPQ
earlier, to allow to fetch the locked rows directly into the EPQ
slots, instead of having to copy tuples around. Unfortunately, as Tom
complained, that forces some expensive initialization to happen
earlier.

As a third issue, the test coverage for EPQ was clearly insufficient.

Fixing the first issue is unfortunately not trivial: Non-locked row
marks were fetched at the start of EPQ, and we don't have the type
information for the rowmarks available at that point. While we could
change that, it's not easy. It might be worthwhile to change that at
some point, but to fix this bug, it seems better to delay fetching
non-locking rowmarks when they're actually needed, rather than
eagerly. They're referenced at most once, and in cases where EPQ
fails, might never be referenced. Fetching them when needed also
increases locality a bit.

To be able to fetch rowmarks during execution, rather than
initialization, we need to be able to access the active EPQState, as
that contains necessary data. To do so move EPQ related data from
EState to EPQState, and, only for EStates creates as part of EPQ,
reference the associated EPQState from EState.

To fix the second issue, change EPQ initialization to allow use of
EvalPlanQualSlot() to be used before EvalPlanQualBegin() (but
obviously still requiring EvalPlanQualInit() to have been done).

As these changes made struct EState harder to understand, e.g. by
adding multiple EStates, significantly reorder the members, and add a
lot more comments.

Also add a few more EPQ tests, including one that fails for the first
issue above. More is needed.

Reported-By: yi huang
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion:
    https://postgr.es/m/CAHU7rYZo_C4ULsAx_LAj8az9zqgrD8WDd4hTegDTMM1LMqrBsg@mail.gmail.com
    https://postgr.es/m/24530.1562686693@sss.pgh.pa.us
Backpatch: 12-, where the EPQ changes were introduced
2019-09-09 05:14:11 -07:00
Tom Lane db43831899 Avoid using INFO elevel for what are fundamentally debug messages.
Commit 6f6b99d13 stuck an INFO message into the fast path for
checking partition constraints, for no very good reason except
that it made it easy for the regression tests to verify that
that path was taken.  Assorted later patches did likewise,
increasing the unsuppressable-chatter level from ALTER TABLE
even more.  This isn't good for the user experience, so let's
drop these messages down to DEBUG1 where they belong.  So as
not to have a loss of test coverage, create a TAP test that
runs the relevant queries with client_min_messages = DEBUG1
and greps for the expected messages.

This testing method is a bit brute-force --- in particular,
it duplicates the execution of a fair amount of the core
create_table and alter_table tests.  We experimented with
other solutions, but running any significant amount of
standard testing with client_min_messages = DEBUG1 seems
to have a lot of output-stability pitfalls, cf commits
bbb96c370 and 5655565c0.  Possibly at some point we'll look
into whether we can reduce the amount of test duplication.

Backpatch into v12, because some of these messages are new
in v12 and we don't really want to ship it that way.

Sergei Kornilov

Discussion: https://postgr.es/m/81911511895540@web58j.yandex.ru
Discussion: https://postgr.es/m/4859321552643736@myt5-02b80404fd9e.qloud-c.yandex.net
2019-09-07 19:03:11 -04:00
Robert Haas 8b94dab066 Split tuptoaster.c into three separate files.
detoast.c/h contain functions required to detoast a datum, partially
or completely, plus a few other utility functions for examining the
size of toasted datums.

toast_internals.c/h contain functions that are used internally to the
TOAST subsystem but which (mostly) do not need to be accessed from
outside.

heaptoast.c/h contains code that is intrinsically specific to the
heap AM, either because it operates on HeapTuples or is based on the
layout of a heap page.

detoast.c and toast_internals.c are placed in
src/backend/access/common rather than src/backend/access/heap.  At
present, both files still have dependencies on the heap, but that will
be improved in a future commit.

Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro,
Andres Freund, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
2019-09-05 13:15:10 -04:00
Alvaro Herrera fe66125974 Remove 'msg' parameter from convert_tuples_by_name
The message was included as a parameter when this function was added in
dcb2bda9b7, but I don't think it has ever served any useful purpose.
Let's stop spreading it pointlessly.

Reviewed by Amit Langote and Peter Eisentraut.

Discussion: https://postgr.es/m/20190806224728.GA17233@alvherre.pgsql
2019-09-03 14:47:29 -04:00
Michael Paquier c96581abe4 Fix inconsistencies and typos in the tree, take 11
This fixes various typos in docs and comments, and removes some orphaned
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
2019-08-19 16:21:39 +09:00
Tom Lane 4d4c66addf Disallow changing an inherited column's type if not all parents changed.
If a table inherits from multiple unrelated parents, we must disallow
changing the type of a column inherited from multiple such parents, else
it would be out of step with the other parents.  However, it's possible
for the column to ultimately be inherited from just one common ancestor,
in which case a change starting from that ancestor should still be
allowed.  (I would not be excited about preserving that option, were
it not that we have regression test cases exercising it already ...)

It's slightly annoying that this patch looks different from the logic
with the same end goal in renameatt(), and more annoying that it
requires an extra syscache lookup to make the test.  However, the
recursion logic is quite different in the two functions, and a
back-patched bug fix is no place to be trying to unify them.

Per report from Manuel Rigger.  Back-patch to 9.5.  The bug exists in
9.4 too (and doubtless much further back); but the way the recursion
is done in 9.4 is a good bit different, so that substantial refactoring
would be needed to fix it in 9.4.  I'm disinclined to do that, or risk
introducing new bugs, for a bug that has escaped notice for this long.

Discussion: https://postgr.es/m/CA+u7OA4qogDv9rz1HAb-ADxttXYPqQdUdPY_yd4kCzywNxRQXA@mail.gmail.com
2019-08-18 17:11:57 -04:00
Tom Lane bb5ae8f6c4 Use a hash table to de-duplicate NOTIFY events faster.
Previously, async.c got rid of duplicate notifications by scanning
the list of pending events to compare each one to the proposed new
event.  This works okay for very small numbers of distinct events,
but degrades as O(N^2) for many events.  We can improve matters by
using a hash table to probe for duplicates.  So as not to add a
lot of overhead for the simple cases that the code did handle well
before, create the hash table only once a (sub)transaction has
queued more than 16 distinct notify events.

A downside is that we now have to do per-event work to propagate
a successful subtransaction's notify events up to its parent.
(But this isn't significant unless the subtransaction had many
events, in which case the O(N^2) behavior would have been in
play already, so we still come out ahead.)

We can make some lemonade out of this lemon, though: since we must
examine each event anyway, it's now possible to de-duplicate events
fully, rather than skipping that for events merged up from
subtransactions.  Hence, remove the old weasel wording in notify.sgml
about whether de-duplication happens or not, and adjust the test
case in async-notify.spec that exhibited the old behavior.

While at it, rearrange the definition of struct Notification to make
it more compact and require just one palloc per event, rather than
two or three.  This saves space when there are a lot of events,
in fact more than enough to buy back the space needed for the hash
table.

Patch by me, based on discussions around a different patch
submitted by Filip Rembiałkowski.

Discussion: https://postgr.es/m/17822.1564186806@sss.pgh.pa.us
2019-08-15 12:22:12 -04:00
Michael Paquier 66bde49d96 Fix inconsistencies and typos in the tree, take 10
This addresses some issues with unnecessary code comments, fixes various
typos in docs and comments, and removes some orphaned structures and
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com
2019-08-13 13:53:41 +09:00
Tom Lane 5ee190f8ec Rationalize use of list_concat + list_copy combinations.
In the wake of commit 1cff1b95a, the result of list_concat no longer
shares the ListCells of the second input.  Therefore, we can replace
"list_concat(x, list_copy(y))" with just "list_concat(x, y)".

To improve call sites that were list_copy'ing the first argument,
or both arguments, invent "list_concat_copy()" which produces a new
list sharing no ListCells with either input.  (This is a bit faster
than "list_concat(list_copy(x), y)" because it makes the result list
the right size to start with.)

In call sites that were not list_copy'ing the second argument, the new
semantics mean that we are usually leaking the second List's storage,
since typically there is no remaining pointer to it.  We considered
inventing another list_copy variant that would list_free the second
input, but concluded that for most call sites it isn't worth worrying
about, given the relative compactness of the new List representation.
(Note that in cases where such leakage would happen, the old code
already leaked the second List's header; so we're only discussing
the size of the leak not whether there is one.  I did adjust two or
three places that had been troubling to free that header so that
they manually free the whole second List.)

Patch by me; thanks to David Rowley for review.

Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-08-12 11:20:18 -04:00
Tom Lane cabe0f298e Fix "ANALYZE t, t" inside a transaction block.
This failed with either "tuple already updated by self" or "duplicate
key value violates unique constraint", depending on whether the table
had previously been analyzed or not.  The reason is that ANALYZE tried
to insert or update the same pg_statistic rows twice, and there was no
CommandCounterIncrement between.  So add one.  The same case works fine
outside a transaction block, because then there's a whole transaction
boundary between, as a consequence of the way VACUUM works.

This issue has been latent all along, but the problem was unreachable
before commit 11d8d72c2 added the ability to specify multiple tables
in ANALYZE.  We could, perhaps, alternatively fix it by adding code to
de-duplicate the list of VacuumRelations --- but that would add a
lot of overhead to work around dumb commands, so it's not attractive.

Per bug #15946 from Yaroslav Schekin.  Back-patch to v11.

(Note: in v11 I also back-patched the test added by commit 23224563d;
otherwise the problem doesn't manifest in the test I added, because
"vactst" is empty when the tests for multiple ANALYZE targets are
reached.  That seems like not a very good thing anyway, so I did this
rather than rethinking the choice of test case.)

Discussion: https://postgr.es/m/15946-5c7570a2884a26cf@postgresql.org
2019-08-10 11:30:11 -04:00
Michael Paquier 8548ddc61b Fix inconsistencies and typos in the tree, take 9
This addresses more issues with code comments, variable names and
unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com
2019-08-05 12:14:58 +09:00
Andres Freund 870b1d6800 Remove superfluous newlines in function prototypes.
These were introduced by pgindent due to fixe to broken
indentation (c.f. 8255c7a5ee). Previously the mis-indentation of
function prototypes was creatively used to reduce indentation in a few
places.

As that formatting only exists in master and REL_12_STABLE, it seems
better to fix it in both, rather than having some odd indentation in
v12 that somebody might copy for future patches or such.

Author: Andres Freund
Discussion: https://postgr.es/m/20190728013754.jwcbe5nfyt3533vx@alap3.anarazel.de
Backpatch: 12-
2019-07-31 00:05:21 -07:00
Tomas Vondra 14ef15a222 Don't build extended statistics on inheritance trees
When performing ANALYZE on inheritance trees, we collect two samples for
each relation - one for the relation alone, and one for the inheritance
subtree (relation and its child relations). And then we build statistics
on each sample, so for each relation we get two sets of statistics.

For regular (per-column) statistics this works fine, because the catalog
includes a flag differentiating statistics built from those two samples.
But we don't have such flag in the extended statistics catalogs, and we
ended up updating the same row twice, triggering this error:

  ERROR:  tuple already updated by self

The simplest solution is to disable extended statistics on inheritance
trees, which is what this commit is doing. In the future we may need to
do something similar to per-column statistics, but that requires adding a
flag to the catalog - and that's not backpatchable. Moreover, the current
selectivity estimation code only works with individual relations, so
building statistics on inheritance trees would be pointless anyway.

Author: Tomas Vondra
Backpatch-to: 10-
Discussion: https://postgr.es/m/20190618231233.GA27470@telsasoft.com
Reported-by: Justin Pryzby
2019-07-30 19:47:33 +02:00
Michael Paquier eb43f3d193 Fix inconsistencies and typos in the tree
This is numbered take 8, and addresses again a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/b137b5eb-9c95-9c2f-586e-38aba7d59788@gmail.com
2019-07-29 12:28:30 +09:00
Michael Paquier 7cce159349 Fix handling of expressions and predicates in REINDEX CONCURRENTLY
When copying the definition of an index rebuilt concurrently for the new
entry, the index information was taken directly from the old index using
the relation cache.  In this case, predicates and expressions have
some post-processing to prepare things for the planner, which loses some
information including the collations added in any of them.

This inconsistency can cause issues when attempting for example a table
rewrite, and makes the new indexes rebuilt concurrently inconsistent
with the old entries.

In order to fix the problem, fetch expressions and predicates directly
from the catalog of the old entry, and fill in IndexInfo for the new
index with that.  This makes the process more consistent with
DefineIndex(), and the code is refactored with the addition of a routine
to create an IndexInfo node.

Reported-by: Manuel Rigger
Author: Michael Paquier
Discussion: https://postgr.es/m/CA+u7OA5Hp0ra235F3czPom_FyAd-3+XwSJmX95r1+sRPOJc9VQ@mail.gmail.com
Backpatch-through: 12
2019-07-29 09:58:49 +09:00
Alvaro Herrera 5562272a42 Check that partitions are not in use when dropping constraints
If the user creates a deferred constraint in a partition, and in a
transaction they cause the constraint's trigger execution to be deferred
until commit time *and* drop the constraint, then when commit time comes
the queued trigger will fail to run because the trigger object will have
been dropped.

This is explained because when a constraint gets dropped in a
partitioned table, the recursion to drop the ones in partitions is done
by the dependency mechanism, not by ALTER TABLE traversing the recursion
tree as in all other cases.  In the non-partitioned case, this problem
is avoided by checking that the table is not "in use" by alter-table;
other alter-table subcommands that recurse to partitions do that check
for each partition.  But the dependency mechanism doesn't have a way to
do that.  Fix the problem by applying the same check to all partitions
during ALTER TABLE's "prep" phase, which correctly raises the necessary
error.

Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Discussion: https://postgr.es/m/CAKcux6nZiO9-eEpr1ZD84bT1mBoVmeZkfont8iSpcmYrjhGWgA@mail.gmail.com
2019-07-23 17:22:15 -04:00
Peter Eisentraut 06140c201b Add CREATE DATABASE LOCALE option
This sets both LC_COLLATE and LC_CTYPE with one option.  Similar
behavior is already supported in initdb, CREATE COLLATION, and
createdb.

Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
Discussion: https://www.postgresql.org/message-id/flat/d9d5043a-dc70-da8a-0166-1e218e6e34d4%402ndquadrant.com
2019-07-23 14:47:24 +02:00
Tom Lane a0555ddab9 Install dependencies to prevent dropping partition key columns.
The logic in ATExecDropColumn that rejects dropping partition key
columns is quite an inadequate defense, because it doesn't execute
in cases where a column needs to be dropped due to cascade from
something that only the column, not the whole partitioned table,
depends on.  That leaves us with a badly broken partitioned table;
even an attempt to load its relcache entry will fail.

We really need to have explicit pg_depend entries that show that the
column can't be dropped without dropping the whole table.  Hence,
add those entries.  In v12 and HEAD, bump catversion to ensure that
partitioned tables will have such entries.  We can't do that in
released branches of course, so in v10 and v11 this patch affords
protection only to partitioned tables created after the patch is
installed.  Given the lack of field complaints (this bug was found
by fuzz-testing not by end users), that's probably good enough.

In passing, fix ATExecDropColumn and ATPrepAlterColumnType
messages to be more specific about which partition key column
they're complaining about.

Per report from Manuel Rigger.  Back-patch to v10 where partitioned
tables were added.

Discussion: https://postgr.es/m/CA+u7OA4JKCPFrdrAbOs7XBiCyD61XJxeNav4LefkSmBLQ-Vobg@mail.gmail.com
Discussion: https://postgr.es/m/31920.1562526703@sss.pgh.pa.us
2019-07-22 14:55:40 -04:00
Peter Eisentraut 19781729f7 Make identity sequence management more robust
Some code could get confused when certain catalog state involving both
identity and serial sequences was present, perhaps during an attempt
to upgrade the latter to the former.  Specifically, dropping the
default of a serial column maintains the ownership of the sequence by
the column, and so it would then be possible to afterwards make the
column an identity column that would now own two sequences.  This
causes the code that looks up the identity sequence to error out,
making the new identity column inoperable until the ownership of the
previous sequence is released.

To fix this, make the identity sequence lookup only consider sequences
with the appropriate dependency type for an identity sequence, so it
only ever finds one (unless something else is broken).  In the above
example, the old serial sequence would then be ignored.  Reorganize
the various owned-sequence-lookup functions a bit to make this
clearer.

Reported-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://www.postgresql.org/message-id/flat/470c54fc8590be4de0f41b0d295fd6390d5e8a6c.camel@cybertec.at
2019-07-22 12:07:10 +02:00
Michael Paquier 23bccc823d Fix inconsistencies and typos in the tree
This is numbered take 7, and addresses a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/dff75442-2468-f74f-568c-6006e141062f@gmail.com
2019-07-22 10:01:50 +09:00
Tomas Vondra a63378a03e Use column collation for extended statistics
The current extended statistics code was a bit confused which collation
to use.  When building the statistics, the collations defined as default
for the data types were used (since commit 5e0928005).  The MCV code was
however using the column collations for MCV serialization, and then
DEFAULT_COLLATION_OID when computing estimates. So overall the code was
using all three possible options, inconsistently.

This uses the column colation everywhere - this makes it consistent with
what 5e0928005 did for regular stats.  We however do not track the
collations in a catalog, because we can derive them from column-level
information.  This may need to change in the future, e.g. after allowing
statistics on expressions.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
Backpatch-to: 12
2019-07-20 16:37:37 +02:00
Tom Lane d97b714a21 Avoid using lcons and list_delete_first where it's easy to do so.
Formerly, lcons was about the same speed as lappend, but with the new
List implementation, that's not so; with a long List, data movement
imposes an O(N) cost on lcons and list_delete_first, but not lappend.

Hence, invent list_delete_last with semantics parallel to
list_delete_first (but O(1) cost), and change various places to use
lappend and list_delete_last where this can be done without much
violence to the code logic.

There are quite a few places that construct result lists using lcons not
lappend.  Some have semantic rationales for that; I added comments about
it to a couple that didn't have them already.  In many such places though,
I think the coding is that way only because back in the dark ages lcons
was faster than lappend.  Hence, switch to lappend where this can be done
without causing semantic changes.

In ExecInitExprRec(), this results in aggregates and window functions that
are in the same plan node being executed in a different order than before.
Generally, the executions of such functions ought to be independent of
each other, so this shouldn't result in visibly different query results.
But if you push it, as one regression test case does, you can show that
the order is different.  The new order seems saner; it's closer to
the order of the functions in the query text.  And we never documented
or promised anything about this, anyway.

Also, in gistfinishsplit(), don't bother building a reverse-order list;
it's easy now to iterate backwards through the original list.

It'd be possible to go further towards removing uses of lcons and
list_delete_first, but it'd require more extensive logic changes,
and I'm not convinced it's worth it.  Most of the remaining uses
deal with queues that probably never get long enough to be worth
sweating over.  (Actually, I doubt that any of the changes in this
patch will have measurable performance effects either.  But better
to have good examples than bad ones in the code base.)

Patch by me, thanks to David Rowley and Daniel Gustafsson for review.

Discussion: https://postgr.es/m/21272.1563318411@sss.pgh.pa.us
2019-07-17 11:15:34 -04:00
Tom Lane 3093eb2b83 Fix thinko in construction of old_conpfeqop list.
This should lappend the OIDs, not lcons them; the existing code produced
a list in reversed order.  This is harmless for single-key FKs or FKs
where all the key columns are of the same type, which probably explains
how it went unnoticed.  But if those conditions are not met,
ATAddForeignKeyConstraint would make the wrong decision about whether an
existing FK needs to be revalidated.  I think it would almost always err
in the safe direction by revalidating a constraint that didn't need it.
You could imagine scenarios where the pfeqop check was fooled by
swapping the types of two FK columns in one ALTER TABLE, but that case
would probably be rejected by other tests, so it might be impossible to
get to the worst-case scenario where an FK should be revalidated and
isn't.  (And even then, it's likely to be fine, unless there are weird
inconsistencies in the equality behavior of the replacement types.)
However, this is a performance bug at least.

Noted while poking around to see whether lcons calls could be converted
to lappend.

This bug is old, dating to commit cb3a7c2b9, so back-patch to all
supported branches.
2019-07-16 18:17:47 -04:00
Tom Lane 4c3d05d875 Remove dead code.
These memory context switches are useless in the wake of commit
1cff1b95a.  Noted by Jesper Pedersen.

Discussion: https://postgr.es/m/f078ce63-9e04-0f3e-d200-d7ee66279abe@redhat.com
2019-07-15 23:27:13 -04:00
Tom Lane 1cff1b95ab Represent Lists as expansible arrays, not chains of cons-cells.
Originally, Postgres Lists were a more or less exact reimplementation of
Lisp lists, which consist of chains of separately-allocated cons cells,
each having a value and a next-cell link.  We'd hacked that once before
(commit d0b4399d8) to add a separate List header, but the data was still
in cons cells.  That makes some operations -- notably list_nth() -- O(N),
and it's bulky because of the next-cell pointers and per-cell palloc
overhead, and it's very cache-unfriendly if the cons cells end up
scattered around rather than being adjacent.

In this rewrite, we still have List headers, but the data is in a
resizable array of values, with no next-cell links.  Now we need at
most two palloc's per List, and often only one, since we can allocate
some values in the same palloc call as the List header.  (Of course,
extending an existing List may require repalloc's to enlarge the array.
But this involves just O(log N) allocations not O(N).)

Of course this is not without downsides.  The key difficulty is that
addition or deletion of a list entry may now cause other entries to
move, which it did not before.

For example, that breaks foreach() and sister macros, which historically
used a pointer to the current cons-cell as loop state.  We can repair
those macros transparently by making their actual loop state be an
integer list index; the exposed "ListCell *" pointer is no longer state
carried across loop iterations, but is just a derived value.  (In
practice, modern compilers can optimize things back to having just one
loop state value, at least for simple cases with inline loop bodies.)
In principle, this is a semantics change for cases where the loop body
inserts or deletes list entries ahead of the current loop index; but
I found no such cases in the Postgres code.

The change is not at all transparent for code that doesn't use foreach()
but chases lists "by hand" using lnext().  The largest share of such
code in the backend is in loops that were maintaining "prev" and "next"
variables in addition to the current-cell pointer, in order to delete
list cells efficiently using list_delete_cell().  However, we no longer
need a previous-cell pointer to delete a list cell efficiently.  Keeping
a next-cell pointer doesn't work, as explained above, but we can improve
matters by changing such code to use a regular foreach() loop and then
using the new macro foreach_delete_current() to delete the current cell.
(This macro knows how to update the associated foreach loop's state so
that no cells will be missed in the traversal.)

There remains a nontrivial risk of code assuming that a ListCell *
pointer will remain good over an operation that could now move the list
contents.  To help catch such errors, list.c can be compiled with a new
define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents
whenever that could possibly happen.  This makes list operations
significantly more expensive so it's not normally turned on (though it
is on by default if USE_VALGRIND is on).

There are two notable API differences from the previous code:

* lnext() now requires the List's header pointer in addition to the
current cell's address.

* list_delete_cell() no longer requires a previous-cell argument.

These changes are somewhat unfortunate, but on the other hand code using
either function needs inspection to see if it is assuming anything
it shouldn't, so it's not all bad.

Programmers should be aware of these significant performance changes:

* list_nth() and related functions are now O(1); so there's no
major access-speed difference between a list and an array.

* Inserting or deleting a list element now takes time proportional to
the distance to the end of the list, due to moving the array elements.
(However, it typically *doesn't* require palloc or pfree, so except in
long lists it's probably still faster than before.)  Notably, lcons()
used to be about the same cost as lappend(), but that's no longer true
if the list is long.  Code that uses lcons() and list_delete_first()
to maintain a stack might usefully be rewritten to push and pop at the
end of the list rather than the beginning.

* There are now list_insert_nth...() and list_delete_nth...() functions
that add or remove a list cell identified by index.  These have the
data-movement penalty explained above, but there's no search penalty.

* list_concat() and variants now copy the second list's data into
storage belonging to the first list, so there is no longer any
sharing of cells between the input lists.  The second argument is
now declared "const List *" to reflect that it isn't changed.

This patch just does the minimum needed to get the new implementation
in place and fix bugs exposed by the regression tests.  As suggested
by the foregoing, there's a fair amount of followup work remaining to
do.

Also, the ENABLE_LIST_COMPAT macros are finally removed in this
commit.  Code using those should have been gone a dozen years ago.

Patch by me; thanks to David Rowley, Jesper Pedersen, and others
for review.

Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-07-15 13:41:58 -04:00
Thomas Munro b31fbe852c Warn if wal_level is too low when creating a publication.
Provide a hint to users that they need to increase wal_level before
subscriptions can work.

Author: Lucas Viecelli, with some adjustments by Thomas Munro
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAPjy-57rn5Y9g4e5u--eSOP-7P4QrE9uOZmT2ZcUebF8qxsYhg%40mail.gmail.com
2019-07-13 10:35:34 +12:00
Tom Lane b5810de3f4 Reduce memory consumption for multi-statement query strings.
Previously, exec_simple_query always ran parse analysis, rewrite, and
planning in MessageContext, allowing all the data generated thereby
to persist until the end of processing of the whole query string.
That's fine for single-command strings, but if a client sends many
commands in a single simple-Query message, this strategy could result
in annoying memory bloat, as complained of by Andreas Seltenreich.

To fix, create a child context to do this work in, and reclaim it
after each command.  But we only do so for parsetrees that are not
last in their query string.  That avoids adding any memory management
overhead for the typical case of a single-command string.  Memory
allocated for the last parsetree would be freed immediately after
finishing the command string anyway.

Similarly, adjust extension.c's execute_sql_string() to reclaim memory
after each command.  In that usage, multi-command strings are the norm,
so it's a bit surprising that no one has yet complained of bloat ---
especially since the bloat extended to whatever data ProcessUtility
execution might leak.

Amit Langote, reviewed by Julien Rouhaud

Discussion: https://postgr.es/m/87ftp6l2qr.fsf@credativ.de
2019-07-10 14:32:38 -04:00
David Rowley f7c830f1ab Fix missing calls to table_finish_bulk_insert during COPY, take 2
86b85044e abstracted calls to heap functions in COPY FROM to support a
generic table AM.  However, when performing a copy into a partitioned
table, this commit neglected to call table_finish_bulk_insert for each
partition.  Before 86b85044e, when we always called the heap functions,
there was no need to call heapam_finish_bulk_insert for partitions since
it only did any work when performing a copy without WAL.  For partitioned
tables, this was unsupported anyway, so there was no issue.  With
pluggable storage, we can't make any assumptions about what the table AM
might want to do in its equivalent function, so we'd better ensure we
always call table_finish_bulk_insert each partition that's received a row.

For now, we make the table_finish_bulk_insert call whenever we evict a
CopyMultiInsertBuffer out of the CopyMultiInsertInfo.  This does mean
that it's possible that we call table_finish_bulk_insert multiple times
per partition, which is not a problem other than being an inefficiency.
Improving this requires a more invasive patch, so let's leave that for
another day.

This also changes things so that we no longer needlessly call
table_finish_bulk_insert when performing a COPY FROM for a non-partitioned
table when not using multi-inserts.

Reported-by: Robert Haas
Backpatch-through: 12
Discussion: https://postgr.es/m/CA+TgmoYK=6BpxiJ0tN-p9wtH0BTAfbdxzHhwou0mdud4+BkYuQ@mail.gmail.com
2019-07-10 16:03:04 +12:00
Alvaro Herrera 2c84ea6cf9 Propagate trigger arguments to partitions
We were creating the cloned triggers with an empty list of arguments,
losing the ones that had been specified by the user when creating the
trigger in the partitioned table.  Repair.

This was forgotten in commit 86f575948c.

Author: Patrick McHardy
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/20190709130027.amr2cavjvo7rdvac@access1.trash.net
Discussion: https://postgr.es/m/15752-123bc90287986de4@postgresql.org
2019-07-09 17:16:36 -04:00
Tom Lane 0ab1a2e39b Remove dead encoding-conversion functions.
The code for conversions SQL_ASCII <-> MULE_INTERNAL and
SQL_ASCII <-> UTF8 was unreachable, because we long ago changed
the wrapper functions pg_do_encoding_conversion() et al so that
they have hard-wired behaviors for conversions involving SQL_ASCII.
(At least some of those fast paths date back to 2002, though it
looks like we may not have been totally consistent about this until
later.)  Given the lack of complaints, nobody is dissatisfied with
this state of affairs.  Hence, let's just remove the unreachable code.

Also, change CREATE CONVERSION so that it rejects attempts to
define such conversions.  Since we consider that SQL_ASCII represents
lack of knowledge about the encoding in use, such a conversion would
be semantically dubious even if it were reachable.

Adjust a couple of regression test cases that had randomly decided
to rely on these conversion functions rather than any other ones.

Discussion: https://postgr.es/m/41163.1559156593@sss.pgh.pa.us
2019-07-05 14:17:27 -04:00
David Rowley 8abc13a889 Use appendStringInfoString and appendPQExpBufferStr where possible
This changes various places where appendPQExpBuffer was used in places
where it was possible to use appendPQExpBufferStr, and likewise for
appendStringInfo and appendStringInfoString.  This is really just a
stylistic improvement, but there are also small performance gains to be
had from doing this.

Discussion: http://postgr.es/m/CAKJS1f9P=M-3ULmPvr8iCno8yvfDViHibJjpriHU8+SXUgeZ=w@mail.gmail.com
2019-07-04 13:01:13 +12:00
Tom Lane 9e1c9f9594 pgindent run prior to branching v12.
pgperltidy and reformat-dat-files too, though the latter didn't
find anything to change.
2019-07-01 12:37:52 -04:00
David Rowley f5db56fc4d Revert fix missing call to table_finish_bulk_insert during COPY
This reverts commits 4de60244e and b2d69806d. Further thought is
required to make this work properly.
2019-07-02 03:44:56 +12:00
David Rowley b2d69806d8 Remove surplus call to table_finish_bulk_insert
4de60244e added the call to table_finish_bulk_insert to the
CopyMultiInsertBufferCleanup function.  We use a CopyMultiInsertBuffer even
for non-partitioned tables, so having the cleanup do that meant we would
call table_finsh_bulk_insert twice when performing COPY FROM with
a non-partitioned table.

Here we can just remove the direct call in CopyFrom and let
CopyMultiInsertBufferCleanup handle the call instead.
2019-07-02 03:07:15 +12:00
David Rowley 4de60244e2 Fix missing call to table_finish_bulk_insert during COPY
86b85044e abstracted calls to heap functions in COPY FROM to support a
generic table AM.  However, when performing a copy into a partitioned
table, this commit neglected to call table_finish_bulk_insert for each
partition.  Before 86b85044e, when we always called the heap functions,
there was no need to call heapam_finish_bulk_insert for partitions since
it only did any work when performing a copy without WAL. For partitioned
tables, this was unsupported anyway, so there was no issue. With pluggable
storage, we can't make any assumptions about what the table AM might want
to do in its equivalent function, so we'd better ensure we always call
table_finish_bulk_insert each partition that's received a row.

For now, we make the table_finish_bulk_insert call whenever we evict a
CopyMultiInsertBuffer out of the CopyMultiInsertInfo.  This does mean
that it's possible that we call table_finish_bulk_insert multiple times
per partition, which is not a problem other than being an inefficiency.
Improving this requires a more invasive patch, so let's leave that for
another day.

In passing, move the table_finish_bulk_insert for the target of the COPY
command so that it's only called when we're actually performing bulk
inserts.  We don't need to call this when inserting 1 row at a time.

Reported-by: Robert Haas
Discussion: https://postgr.es/m/CA+TgmoYK=6BpxiJ0tN-p9wtH0BTAfbdxzHhwou0mdud4+BkYuQ@mail.gmail.com
2019-07-02 01:23:26 +12:00
Tom Lane 54100f5c60 Add an enforcement mechanism for global object names in regression tests.
In commit 18555b132 we tentatively established a rule that regression
tests should use names containing "regression" for databases, and names
starting with "regress_" for all other globally-visible object names, so
as to circumscribe the side-effects that "make installcheck" could have
on an existing installation.

This commit adds a simple enforcement mechanism for that rule: if the code
is compiled with ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS defined, it
will emit a warning (not an error) whenever a database, role, tablespace,
subscription, or replication origin name is created that doesn't obey the
rule.  Running one or more buildfarm members with that symbol defined
should be enough to catch new violations, at least in the regular
regression tests.  Most TAP tests wouldn't notice such warnings, but
that's actually fine because TAP tests don't execute against an existing
server anyway.

Since it's already the case that running src/test/modules/ tests in
installcheck mode is deprecated, we can use that as a home for tests
that seem unsafe to run against an existing server, such as tests that
might have side-effects on existing roles.  Document that (though this
commit doesn't in itself make it any less safe than before).

Update regress.sgml to define these restrictions more clearly, and
to clean up assorted lack-of-up-to-date-ness in its descriptions of
the available regression tests.

Discussion: https://postgr.es/m/16638.1468620817@sss.pgh.pa.us
2019-06-29 11:34:00 -04:00
Alvaro Herrera 23cccb17fe Fix for dropped columns in a partitioned table's default partition
We forgot to map column numbers to/from the default partition for
various operations, leading to valid cases failing with spurious
errors, such as
ERROR:  attribute N of type some_partition has been dropped

It was also possible that the search for conflicting rows in the default
partition when attaching another partition would fail to detect some.
Secondarily, it was also possible that such a search should be skipped
(because the constraint was implied) but wasn't.

Fix all this by mapping column numbers when necessary.

Reported by: Daniel Wilches
Author: Amit Langote
Discussion: https://postgr.es/m/15873-8c61945d6b3ef87c@postgresql.org
2019-06-28 14:51:08 -04:00
Tomas Vondra 69fd82fedd Update reference to sampling algorithm in analyze.c
Commit 83e176ec1 moved row sampling functions from analyze.c to
utils/misc/sampling.c, but failed to update comment referring to
the sampling algorithm from Jeff Vitter's paper. Correct the
comment by pointing to utils/misc/sampling.c.

Author: Etsuro Fujita
Discussion: https://postgr.es/m/CAPmGK154gp%2BQd%3DcorQOv%2BPmbyVyZBjp_%2Bhb766UJeD1e_ie6XQ%40mail.gmail.com
2019-06-27 18:01:54 +02:00
Alvaro Herrera 050098b14e Fix use-after-free introduced in 55ed3defc9
Evidenced by failure under RELCACHE_FORCE_RELEASE (buildfarm member
prion).

Author: Amit Langote
Discussion: https://postgr.es/m/CA+HiwqGV=k_Eh4jBiQw66ivvdG+EUkrEYeHTYL1SvDj_YOYV0g@mail.gmail.com
2019-06-27 11:57:10 -04:00
Alvaro Herrera 55ed3defc9 Fix partitioned index creation with foreign partitions
When a partitioned tables contains foreign tables as partitions, it is
not possible to implement unique or primary key indexes -- but when
regular indexes are created, there is no reason to do anything other
than ignoring such partitions.  We were raising errors upon encountering
the foreign partitions, which is unfriendly and doesn't protect against
any actual problems.

Relax this restriction so that index creation is allowed on partitioned
tables containing foreign partitions, becoming a no-op on them.  (We may
later want to redefine this so that the FDW is told to create the
indexes on the foreign side.)  This applies to CREATE INDEX, as well as
ALTER TABLE / ATTACH PARTITION and CREATE TABLE / PARTITION OF.

Backpatch to 11, where indexes on partitioned tables were introduced.

Discussion: https://postgr.es/m/15724-d5a58fa9472eef4f@postgresql.org
Author: Álvaro Herrera
Reviewed-by: Amit Langote
2019-06-26 18:38:51 -04:00
Tom Lane ccfcc8fdbd Purely-cosmetic adjustments in tablecmds.c.
Move ATExecAlterColumnGenericOptions away from where it was unthinkingly
dropped, in the middle of a lot of ALTER COLUMN TYPE code.  I don't have
any high principles about where to put it instead, so let's just put it
after ALTER COLUMN TYPE and before ALTER OWNER, matching existing
decisions about how to order related code stanzas.

Also add the minimal function header comment that the original author
was too cool to bother with.

Along the way, upgrade header comments for nearby ALTER COLUMN TYPE
functions.

Discussion: https://postgr.es/m/14787.1561403130@sss.pgh.pa.us
2019-06-24 17:19:37 -04:00
Tom Lane f946a40914 Further fix ALTER COLUMN TYPE's handling of indexes and index constraints.
This patch reverts all the code changes of commit e76de8861, which turns
out to have been seriously misguided.  We can't wait till later to compute
the definition string for an index; we must capture that before applying
the data type change for any column it depends on, else ruleutils.c will
deliverr wrong/misleading results.  (This fine point was documented
nowhere, of course.)

I'd also managed to forget that ATExecAlterColumnType executes once per
ALTER COLUMN TYPE clause, not once per statement; which resulted in the
code being basically completely broken for any case in which multiple ALTER
COLUMN TYPE clauses are applied to a table having non-constraint indexes
that must be rebuilt.  Through very bad luck, none of the existing test
cases nor the ones added by e76de8861 caught that, but of course it was
soon found in the field.

The previous patch also had an implicit assumption that if a constraint's
index had a dependency on a table column, so would the constraint --- but
that isn't actually true, so it didn't fix such cases.

Instead of trying to delete unneeded index dependencies later, do the
is-there-a-constraint lookup immediately on seeing an index dependency,
and switch to remembering the constraint if so.  In the unusual case of
multiple column dependencies for a constraint index, this will result in
duplicate constraint lookups, but that's not that horrible compared to all
the other work that happens here.  Besides, such cases did not work at all
before, so it's hard to argue that they're performance-critical for anyone.

Per bug #15865 from Keith Fiske.  As before, back-patch to all supported
branches.

Discussion: https://postgr.es/m/15865-17940eacc8f8b081@postgresql.org
2019-06-24 16:43:21 -04:00
Michael Paquier 20e1cc898d Rework some error strings for REINDEX CONCURRENTLY with system catalogs
This makes the whole user experience more consistent when bumping into
failures, and more in line with the rewording done via 508300e.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20190514153252.GA22168@alvherre.pgsql
2019-06-20 13:28:12 +09:00
Michael Paquier 3412030205 Fix more typos and inconsistencies in the tree
Author: Alexander Lakhin
Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
2019-06-17 16:13:16 +09:00
Tomas Vondra 6cbfb784c3 Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic.  That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.

Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics.  That would be a surprising behavior.

Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly.  This changed with introduction
of MCV lists, which do include most common combinations of values.

The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.

Bumped CATVERSION due to changing system catalog definitions.

Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-16 01:20:31 +02:00
Michael Paquier f43608bda2 Fix typos and inconsistencies in code comments
Author: Alexander Lakhin
Discussion: https://postgr.es/m/dec6aae8-2d63-639f-4d50-20e229fb83e3@gmail.com
2019-06-14 09:34:34 +09:00
Tom Lane e76de88615 Fix ALTER COLUMN TYPE failure with a partial exclusion constraint.
ATExecAlterColumnType failed to consider the possibility that an index
that needs to be rebuilt might be a child of a constraint that needs to be
rebuilt.  We missed this so far because usually a constraint index doesn't
have a direct dependency on its table, just on the constraint object.
But if there's a WHERE clause, then dependency analysis of the WHERE
clause results in direct dependencies on the column(s) mentioned in WHERE.
This led to trying to drop and rebuild both the constraint and its
underlying index.

In v11/HEAD, we successfully drop both the index and the constraint,
and then try to rebuild both, and of course the second rebuild hits a
duplicate-index-name problem.  Before v11, it fails with obscure messages
about a missing relation OID, due to trying to drop the index twice.

This is essentially the same kind of problem noted in commit
20bef2c31: the possible dependency linkages are broader than what
ATExecAlterColumnType was designed for.  It was probably OK when
written, but it's certainly been broken since the introduction of
partial exclusion constraints.  Fix by adding an explicit check
for whether any of the indexes-to-be-rebuilt belong to any of the
constraints-to-be-rebuilt, and ignoring any that do.

In passing, fix a latent bug introduced by commit 8b08f7d48: in
get_constraint_index() we must "continue" not "break" when rejecting
a relation of a wrong relkind.  This is harmless today because we don't
expect that code path to be taken anyway; but if there ever were any
relations to be ignored, the existing coding would have an extremely
undesirable dependency on the order of pg_depend entries.

Also adjust a couple of obsolete comments.

Per bug #15835 from Yaroslav Schekin.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/15835-32d9b7a76c06a7a9@postgresql.org
2019-06-12 12:29:39 -04:00
Andres Freund fff2a7d7bd Don't access catalogs to validate GUCs when not connected to a DB.
Vignesh found this bug in the check function for
default_table_access_method's check hook, but that was just copied
from older GUCs. Investigation by Michael and me then found the bug in
further places.

When not connected to a database (e.g. in a walsender connection), we
cannot perform (most) GUC checks that need database access. Even when
only shared tables are needed, unless they're
nailed (c.f. RelationCacheInitializePhase2()), they cannot be accessed
without pg_class etc. being present.

Fix by extending the existing IsTransactionState() checks to also
check for MyDatabaseOid.

Reported-By: Vignesh C, Michael Paquier, Andres Freund
Author: Vignesh C, Andres Freund
Discussion: https://postgr.es/m/CALDaNm1KXK9gbZfY-p_peRFm_XrBh1OwQO1Kk6Gig0c0fVZ2uw%40mail.gmail.com
Backpatch: 9.4-
2019-06-10 23:34:50 -07:00
Amit Kapila 92c4abc736 Fix assorted inconsistencies.
There were a number of issues in the recent commits which include typos,
code and comments mismatch, leftover function declarations.  Fix them.

Reported-by: Alexander Lakhin
Author: Alexander Lakhin, Amit Kapila and Amit Langote
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/ef0c0232-0c1d-3a35-63d4-0ebd06e31387@gmail.com
2019-06-08 08:16:38 +05:30
Alvaro Herrera a36c84c3e4 Fix default_tablespace usage for partitioned tables
In commit 87259588d0 I (Álvaro) tried to rationalize the determination
of tablespace to use for partitioned tables, but failed to handle the
default_tablespace case.  Repair and add proper tests.

Author: Amit Langote, Rushabh Lathia
Reported-by: Rushabh Lathia
Reviewed-by: Amit Langote, Álvaro Herrera
Discussion: https://postgr.es/m/CAGPqQf0cYjm1=rjxk_6gU0SjUS70=yFUAdCJLwWzh9bhNJnyVg@mail.gmail.com
2019-06-07 00:44:17 -04:00
David Rowley e24a815c1c Fix confusing NOTICE text in REINDEX CONCURRENTLY
When performing REINDEX TABLE CONCURRENTLY, if all of the table's indexes
could not be reindexed, a NOTICE message claimed that the table had no
indexes.  This was confusing, so let's change the NOTICE text to something
less confusing.

In passing, also mention in the comment before ReindexRelationConcurrently
that materialized views are supported too and also explain what the return
value of the function means.

Author: Ashwin Agrawal
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CALfoeithHvi13p_VyR8kt9o6Pa7Z=Smi6Nfc2anHnQx5Lj8bTQ@mail.gmail.com
2019-06-05 21:05:41 +12:00
David Rowley 56b3b38382 Fix incorrect index behavior in COPY FROM with partitioned tables
86b85044e rewrote how COPY FROM works to allow multiple tuple buffers to
exist to once thus allowing multi-inserts to be used in more cases with
partitioned tables.  That commit neglected to update the estate's
es_result_relation_info when flushing the insert buffer to the partition
making it possible for the index tuples to be added into an index on the
wrong partition.

Fix this and also add an Assert in ExecInsertIndexTuples to help ensure
that we never make this mistake again.

Reported-by: Haruka Takatsuka
Author: Ashutosh Sharma
Discussion: https://postgr.es/m/15832-b1bf336a4ee246b5@postgresql.org
2019-06-05 18:28:38 +12:00
Michael Paquier f7e954ad1c Rework code using list_delete_cell() in MergeAttributes
When merging two attributes, we are sure that at least one remains.
However, when deleting one element in the attribute list we may finish
with an empty list returned as NIL by list_delete_cell(), but the code
failed to track that, which is not project-like.  Adjust the call so as
we check for an empty list, and make use of it in an assertion.

This has been introduced by e7b3349, when adding support for CREATE
TABLE OF.

Author: Mark Dilger
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/CAE-h2TpPDqSWgOvfvSziOaMngMPwW+QZcmPpY8hQ_KOJ2+3hXQ@mail.gmail.com
2019-06-05 15:01:14 +09:00
Peter Eisentraut c880096dc1 Add command column to pg_stat_progress_create_index
This allows determining which command is running, similar to
pg_stat_progress_cluster.

Discussion: https://www.postgresql.org/message-id/flat/f0e56b3b-74b7-6cbc-e207-a5ed6bee18dc%402ndquadrant.com
2019-06-04 09:29:02 +02:00
Tom Lane eaf0292c3b Fix unsafe memory management in CloneRowTriggersToPartition().
It's not really supported to call systable_getnext() in a different
memory context than systable_beginscan() was called in, and it's
*definitely* not safe to do so and then reset that context between
calls.  I'm not very clear on how this code survived
CLOBBER_CACHE_ALWAYS testing ... but Alexander Lakhin found a case
that would crash it pretty reliably.

Per bug #15828.  Fix, and backpatch to v11 where this code came in.

Discussion: https://postgr.es/m/15828-f6ddd7df4852f473@postgresql.org
2019-06-03 16:59:26 -04:00
Alvaro Herrera a100974751 Fix typo in message
I introduced the typo in source code in the course of 75445c1515.
Repair.
2019-05-28 17:36:14 -04:00
Amit Kapila 9679345f3c Fix typos.
Reported-by: Alexander Lakhin
Author: Alexander Lakhin
Reviewed-by: Amit Kapila and Tom Lane
Discussion: https://postgr.es/m/7208de98-add8-8537-91c0-f8b089e2928c@gmail.com
2019-05-26 18:28:18 +05:30
Andres Freund 73b8c3bd28 tableam: Rename wrapper functions to match callback names.
Some of the wrapper functions didn't match the callback names. Many of
them due to staying "consistent" with historic naming of the wrapped
functionality. We decided that for most cases it's more important to
be for tableam to be consistent going forward, than with the past.

The one exception is beginscan/endscan/...  because it'd have looked
odd to have systable_beginscan/endscan/... with a different naming
scheme, and changing the systable_* APIs would have caused way too
much churn (including breaking a lot of external users).

Author: Ashwin Agrawal, with some small additions by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CALfoeiugyrXZfX7n0ORCa4L-m834dzmaE8eFdbNR6PMpetU4Ww@mail.gmail.com
2019-05-23 16:32:36 -07:00
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Tom Lane be76af171c Initial pgindent run for v12.
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.

Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-22 12:55:34 -04:00
Andres Freund fb504c5e4b Remove outdated comment in copy.c. 2019-05-19 20:47:54 -07:00
Alvaro Herrera 75445c1515 More message style fixes
Discussion: https://postgr.es/m/20190515183005.GA26486@alvherre.pgsql
2019-05-16 19:14:31 -04:00
Michael Paquier 508300e2e1 Improve and fix some error handling for REINDEX INDEX/TABLE CONCURRENTLY
This improves the user experience when it comes to restrict several
flavors of REINDEX CONCURRENTLY.  First, for INDEX, remove a restriction
on shared relations as we already check after catalog relations.  Then,
for TABLE, add a proper error message when attempting to run the command
on system catalogs.  The code path of CREATE INDEX CONCURRENTLY already
complains about that, but if a REINDEX is issued then then the error
generated is confusing.

While on it, add more tests to check restrictions on catalog indexes and
on toast table/index for catalogs.  Some error messages are improved,
with wording suggestion coming from Tom Lane.

Reported-by: Tom Lane
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/23694.1556806002@sss.pgh.pa.us
2019-05-10 08:18:46 +09:00
Tom Lane 2d7d946cd3 Clean up the behavior and API of catalog.c's is-catalog-relation tests.
The right way for IsCatalogRelation/Class to behave is to return true
for OIDs less than FirstBootstrapObjectId (not FirstNormalObjectId),
without any of the ad-hoc fooling around with schema membership.

The previous code was wrong because (1) it claimed that
information_schema tables were not catalog relations but their toast
tables were, which is silly; and (2) if you dropped and recreated
information_schema, which is a supported operation, the behavior
changed.  That's even sillier.  With this definition, "catalog
relations" are exactly the ones traceable to the postgres.bki data,
which seems like what we want.

With this simplification, we don't actually need access to the pg_class
tuple to identify a catalog relation; we only need its OID.  Hence,
replace IsCatalogClass with "IsCatalogRelationOid(oid)".  But keep
IsCatalogRelation as a convenience function.

This allows fixing some arguably-wrong semantics in contrib/sepgsql and
ReindexRelationConcurrently, which were using an IsSystemNamespace test
where what they really should be using is IsCatalogRelationOid.  The
previous coding failed to protect toast tables of system catalogs, and
also was not on board with the general principle that user-created tables
do not become catalogs just by virtue of being renamed into pg_catalog.
We can also get rid of a messy hack in ReindexMultipleTables.

While we're at it, also rename IsSystemNamespace to IsCatalogNamespace,
because the previous name invited confusion with the more expansive
semantics used by IsSystemRelation/Class.

Also improve the comments in catalog.c.

There are a few remaining places in replication-related code that are
special-casing OIDs below FirstNormalObjectId.  I'm inclined to think
those are wrong too, and if there should be any special case it should
just extend to FirstBootstrapObjectId.  But first we need to debate
whether a FOR ALL TABLES publication should include information_schema.

Discussion: https://postgr.es/m/21697.1557092753@sss.pgh.pa.us
Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
2019-05-08 23:27:38 -04:00
Alvaro Herrera 61639816b8 Fix error messages
Some messages related to foreign servers were reporting the server name
without quotes, or not at all; our style is to have all names be quoted,
and the server name already appears quoted in a few other messages, so
just add quotes and make them all consistent.

Remove an extra "s" in other messages (typos introduced by myself in
f56f8f8da6).
2019-05-08 13:20:16 -04:00
Peter Eisentraut add85ead4a Fix table lock levels for REINDEX INDEX CONCURRENTLY
REINDEX CONCURRENTLY locks tables with ShareUpdateExclusiveLock rather
than the ShareLock used by a plain REINDEX.  However,
RangeVarCallbackForReindexIndex() was not updated for that and still
used the ShareLock only.  This would lead to lock upgrades later,
leading to possible deadlocks.

Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/20190430151735.wi52sxjvxsjvaxxt%40alap3.anarazel.de
2019-05-08 14:30:23 +02:00
Fujii Masao b84dbc8eb8 Add TRUNCATE parameter to VACUUM.
This commit adds new parameter to VACUUM command, TRUNCATE,
which specifies that VACUUM should attempt to truncate off
any empty pages at the end of the table and allow the disk space
for the truncated pages to be returned to the operating system.

This parameter, if specified, overrides the vacuum_truncate
reloption. If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Author: Fujii Masao
Reviewed-by: Julien Rouhaud, Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoD+qtrSDL=GSma4Wd3kLYLeRC0hPna-YAdkDeV4z156vg@mail.gmail.com
2019-05-08 02:10:33 +09:00
Tom Lane 9691aa72e2 Fix style violations in syscache lookups.
Project style is to check the success of SearchSysCacheN and friends
by applying HeapTupleIsValid to the result.  A tiny minority of calls
creatively did it differently.  Bring them into line with the rest.

This is just cosmetic, since HeapTupleIsValid is indeed just a null
check at the moment ... but that may not be true forever, and in any
case it puts a mental burden on readers who may wonder why these
call sites are not like the rest.

Back-patch to v11 just to keep the branches in sync.  (The bulk of these
errors seem to have originated in v11 or v12, though a few are old.)

Per searching to see if anyplace else had made the same error
repaired in 62148c352.
2019-05-05 13:10:07 -04:00
Tom Lane 62148c3520 Add check for syscache lookup failure in update_relispartition().
Omitted in commit 05b38c7e6 (though it looks like the original blame
belongs to 9e9befac4).  A failure is admittedly unlikely, but if it
did happen, SIGSEGV is not the approved method of reporting it.

Per Coverity.  Back-patch to v11 where the broken code originated.
2019-05-05 12:44:32 -04:00
Alvaro Herrera 9f8b717a80 Message style fixes 2019-04-30 10:33:37 -04:00
Andres Freund 5c1560606d Fix several recently introduced issues around handling new relation forks.
Most of these stem from d25f519107 "tableam: relation creation, VACUUM
FULL/CLUSTER, SET TABLESPACE.".

1) To pass data to the relation_set_new_filenode()
   RelationSetNewRelfilenode() was made to update RelationData.rd_rel
   directly. That's not OK however, as it makes the relcache entries
   temporarily inconsistent. Which among other scenarios is a problem
   if a REINDEX targets an index on pg_class - the
   CatalogTupleUpdate() in RelationSetNewRelfilenode().  Presumably
   that was introduced because other places in the code do so - while
   those aren't "good practice" they don't appear to be actively
   buggy (e.g. because system tables may not be targeted).

   I (Andres) should have caught this while reviewing and signficantly
   evolving the code in that commit, mea culpa.

   Fix that by instead passing in the new RelFileNode as separate
   argument to relation_set_new_filenode() and rely on the relcache to
   update the catalog entry. Also revert that the
   RelationMapUpdateMap() call was changed to immediate, and undo some
   other more unnecessary changes.

2) Document that the relation_set_new_filenode cannot rely on the
   whole relcache entry to be valid. It might be worthwhile to
   refactor the code to never have to rely on that, but given the way
   heap_create() is currently coded, that'd be a large change.

3) ATExecSetTableSpace() shouldn't do FlushRelationBuffers() itself. A
   table AM might not use shared buffers at all. Move to
   index_copy_data() and heapam_relation_copy_data().

4) heapam_relation_set_new_filenode() previously sometimes accessed
   rel->rd_rel->relpersistence rather than the `persistence`
   argument. Code movement mistake.

5) Previously heapam_relation_set_new_filenode() re-opened the smgr
   relation to create the init for, if necesary. Instead have
   RelationCreateStorage() return the SMgrRelation and use it to
   create the init fork.

6) Add a note about the danger of modifying the relcache directly to
   ATExecSetTableSpace() - it's currently not a bug because there's a
   check ERRORing for catalog tables.

Regression tests and assertion improvements that together trigger the
bug described in 1) will be added in a later commit, as there is a
related bug on all branches.

Reported-By: Michael Paquier
Diagnosed-By: Tom Lane and Andres Freund
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20190418011430.GA19133@paquier.xyz
2019-04-29 19:28:05 -07:00
Tom Lane c01eb619a8 Apply stopgap fix for bug #15672.
Fix DefineIndex so that it doesn't attempt to pass down a to-be-reused
index relfilenode to a child index creation, and fix TryReuseIndex
to not think that reuse is sensible for a partitioned index.

In v11, this fixes a problem where ALTER TABLE on a partitioned table
could assign the same relfilenode to several different child indexes,
causing very nasty catalog corruption --- in fact, attempting to DROP
the partitioned table then leads not only to a database crash, but to
inability to restart because the same crash will recur during WAL replay.

Either of these two changes would be enough to prevent the failure, but
since neither action could possibly be sane, let's put in both changes
for future-proofing.

In HEAD, no such bug manifests, but that's just an accidental consequence
of having changed the pg_class representation of partitioned indexes to
have relfilenode = 0.  Both of these changes still seem like smart
future-proofing.

This is only a stop-gap because the code for ALTER TABLE on a partitioned
table with a no-op type change still leaves a great deal to be desired.
As the added regression tests show, it gets things wrong for comments on
child indexes/constraints, and it is regenerating child indexes it doesn't
have to.  However, fixing those problems will take more work which may not
get back-patched into v11.  We need a fix for the corruption problem now.

Per bug #15672 from Jianing Yang.

Patch by me, regression test cases based on work by Amit Langote,
who also did a lot of the investigative work.

Discussion: https://postgr.es/m/15672-b9fa7db32698269f@postgresql.org
2019-04-26 17:18:07 -04:00
Alvaro Herrera 05b38c7e63 Fix partitioned index attachment
When an existing index in a partition is attached to a new index on
its parent, we forgot to set the "relispartition" flag correctly, which
meant that it was not possible to find the index in various operations,
such as adding a foreign key constraint that references that partitioned
table.  One of four places that was assigning the parent index was
forgetting to do that, so fix by shifting responsibility of updating the
flag to the routine that changes the parent.

Author: Amit Langote, Álvaro Herrera
Reported-by: Hubert "depesz" Lubaczewski
Discussion: https://postgr.es/m/CA+HiwqHMsRtRYRWYTWavKJ8x14AFsv7bmAV46mYwnfD3vy8goQ@mail.gmail.com
2019-04-25 11:22:29 -04:00
Alvaro Herrera 87259588d0 Fix tablespace inheritance for partitioned rels
Commit ca4103025d left a few loose ends.  The most important one
(broken pg_dump output) is already fixed by virtue of commit
3b23552ad8, but some things remained:

* When ALTER TABLE rewrites tables, the indexes must remain in the
  tablespace they were originally in.  This didn't work because
  index recreation during ALTER TABLE runs manufactured SQL (yuck),
  which runs afoul of default_tablespace in competition with the parent
  relation tablespace.  To fix, reset default_tablespace to the empty
  string temporarily, and add the TABLESPACE clause as appropriate.

* Setting a partitioned rel's tablespace to the database default is
  confusing; if it worked, it would direct the partitions to that
  tablespace regardless of default_tablespace.  But in reality it does
  not work, and making it work is a larger project.  Therefore, throw
  an error when this condition is detected, to alert the unwary.

Add some docs and tests, too.

Author: Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
2019-04-25 10:31:32 -04:00
Andres Freund fdc7efcc30 Allow pg_class xid & multixid horizons to not be set.
This allows table AMs that don't need these horizons. This was already
documented in the tableam relation_set_new_filenode callback, but an
assert prevented if from actually working (the test AM code contained
the change itself). Defang the asserts in the general code, and move
the stronger ones into heap AM.

Relatedly, after CLUSTER/VACUUM, we'd always assign a relfrozenxid /
relminmxid. Change the table_relation_copy_for_cluster() interface to
allow the AM to overwrite the horizons that get set on the pg_class
entry.  This'd also in the future allow AMs like heap to compute a
relfrozenxid during rewrite that's the table's actual minimum rather
than a pre-determined value.  Arguably it'd have been better to move
the whole computation / setting of those values into the callback, but
it seems likely that for other reasons it'd be better to be able to
use one value to vacuum/cluster multiple tables (e.g. a toast's
horizon shouldn't be different than the table's).

Reported-By: Heikki Linnakangas
Author: Andres Freund
Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi
2019-04-23 21:42:12 -07:00
Tom Lane f4a3fdfbdc Avoid order-of-execution problems with ALTER TABLE ADD PRIMARY KEY.
Up to now, DefineIndex() was responsible for adding attnotnull constraints
to the columns of a primary key, in any case where it hadn't been
convenient for transformIndexConstraint() to mark those columns as
is_not_null.  It (or rather its minion index_check_primary_key) did this
by executing an ALTER TABLE SET NOT NULL command for the target table.

The trouble with this solution is that if we're creating the index due
to ALTER TABLE ADD PRIMARY KEY, and the outer ALTER TABLE has additional
sub-commands, the inner ALTER TABLE's operations executed at the wrong
time with respect to the outer ALTER TABLE's operations.  In particular,
the inner ALTER would perform a validation scan at a point where the
table's storage might be inconsistent with its catalog entries.  (This is
on the hairy edge of being a security problem, but AFAICS it isn't one
because the inner scan would only be interested in the tuples' null
bitmaps.)  This can result in unexpected failures, such as the one seen
in bug #15580 from Allison Kaptur.

To fix, let's remove the attempt to do SET NOT NULL from DefineIndex(),
reducing index_check_primary_key's role to verifying that the columns are
already not null.  (It shouldn't ever see such a case, but it seems wise
to keep the check for safety.)  Instead, make transformIndexConstraint()
generate ALTER TABLE SET NOT NULL subcommands to be executed ahead of
the ADD PRIMARY KEY operation in every case where it can't force the
column to be created already-not-null.  This requires only minor surgery
in parse_utilcmd.c, and it makes for a much more satisfying spec for
transformIndexConstraint(): it's no longer having to take it on faith
that someone else will handle addition of NOT NULL constraints.

To make that work, we have to move the execution of AT_SetNotNull into
an ALTER pass that executes ahead of AT_PASS_ADD_INDEX.  I moved it to
AT_PASS_COL_ATTRS, and put that after AT_PASS_ADD_COL to avoid failure
when the column is being added in the same command.  This incidentally
fixes a bug in the only previous usage of AT_PASS_COL_ATTRS, for
AT_SetIdentity: it didn't work either for a newly-added column.

Playing around with this exposed a separate bug in ALTER TABLE ONLY ...
ADD PRIMARY KEY for partitioned tables.  The intent of the ONLY modifier
in that context is to prevent doing anything that would require holding
lock for a long time --- but the implied SET NOT NULL would recurse to
the child partitions, and do an expensive validation scan for any child
where the column(s) were not already NOT NULL.  To fix that, invent a
new ALTER subcommand AT_CheckNotNull that just insists that a child
column be already NOT NULL, and apply that, not AT_SetNotNull, when
recursing to children in this scenario.  This results in a slightly laxer
definition of ALTER TABLE ONLY ... SET NOT NULL for partitioned tables,
too: that command will now work as long as all children are already NOT
NULL, whereas before it just threw up its hands if there were any
partitions.

In passing, clean up the API of generateClonedIndexStmt(): remove a
useless argument, ensure that the output argument is not left undefined,
update the header comment.

A small side effect of this change is that no-such-column errors in ALTER
TABLE ADD PRIMARY KEY now produce a different message that includes the
table name, because they are now detected by the SET NOT NULL step which
has historically worded its error that way.  That seems fine to me, so
I didn't make any effort to avoid the wording change.

The basic bug #15580 is of very long standing, and these other bugs
aren't new in v12 either.  However, this is a pretty significant change
in the way ALTER TABLE ADD PRIMARY KEY works.  On balance it seems best
not to back-patch, at least not till we get some more confidence that
this patch has no new bugs.

Patch by me, but thanks to Jie Zhang for a preliminary version.

Discussion: https://postgr.es/m/15580-d1a6de5a3d65da51@postgresql.org
Discussion: https://postgr.es/m/1396E95157071C4EBBA51892C5368521017F2E6E63@G08CNEXMBPEKD02.g08.fujitsu.local
2019-04-23 12:25:27 -04:00
Andres Freund 88e6ad3054 Fix two memory leaks around force-storing tuples in slots.
As reported by Tom, when ExecStoreMinimalTuple() had to perform a
conversion to store the minimal tuple in the slot, it forgot to
respect the shouldFree flag, and leaked the tuple into the current
memory context if true.  Fix that by freeing the tuple in that case.

Looking at the relevant code made me (Andres) realize that not having
the shouldFree parameter to ExecForceStoreHeapTuple() was a bad
idea. Some callers had to locally implement the necessary logic, and
in one case it was missing, creating a potential per-group leak in
non-hashed aggregation.

The choice to not free the tuple in ExecComputeStoredGenerated() is
not pretty, but not introduced by this commit - I'll start a separate
discussion about it.

Reported-By: Tom Lane
Discussion: https://postgr.es/m/366.1555382816@sss.pgh.pa.us
2019-04-19 11:39:56 -07:00
Andres Freund 75e03eabea Fix potential use-after-free for BEFORE UPDATE row triggers on non-core AMs.
When such a trigger returns the old row version, it naturally get
stored in the slot for the trigger result. When a table AMs doesn't
store HeapTuples internally, ExecBRUpdateTriggers() frees the old row
version passed to triggers - but before this fix it might still be
referenced by the slot holding the new tuple.

Noticed when running the out-of-core zheap AM against the in-core
version of tableam.

Author: Andres Freund
2019-04-18 17:53:54 -07:00
Michael Paquier a6dcf9df4d Rework handling of invalid indexes with REINDEX CONCURRENTLY
Per discussion with others, allowing REINDEX INDEX CONCURRENTLY to work
for invalid indexes when working directly on them can have a lot of
value to unlock situations with invalid indexes without having to use a
dance involving DROP INDEX followed by an extra CREATE INDEX
CONCURRENTLY (which would not work for indexes with constraint
dependency anyway).  This also does not create extra bloat on the
relation involved as this works on individual indexes, so let's enable
it.

Note that REINDEX TABLE CONCURRENTLY still bypasses invalid indexes as
we don't want to bloat the number of indexes defined on a relation in
the event of multiple and successive failures of REINDEX CONCURRENTLY.

More regression tests are added to cover those behaviors, using an
invalid index created with CREATE INDEX CONCURRENTLY.

Reported-by: Dagfinn Ilmari Mannsåker, Álvaro Herrera
Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/20190411134947.GA22043@alvherre.pgsql
2019-04-17 09:33:51 +09:00
Michael Paquier d527fda621 Fix more strcmp() calls using boolean-like comparisons for result checks
Such calls can confuse the reader as strcmp() uses an integer as result.
The places patched here have been spotted by Thomas Munro, David Rowley
and myself.

Author: Michael Paquier
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/20190411021946.GG2728@paquier.xyz
2019-04-12 10:16:49 +09:00
Andres Freund 4c9e1bd0a3 Reset memory context once per tuple in validateForeignKeyConstraint.
When using tableam ExecFetchSlotHeapTuple() might return a separately
allocated tuple. We could use the shouldFree argument to explicitly
free it, but it seems more robust to to protect

Also add a CHECK_FOR_INTERRUPTS() after each tuple. It's likely that
each AM has (heap does) a CFI somewhere in the relevant path, but it
seems more robust to have one in validateForeignKeyConstraint()
itself.

Note that this only affects the cases that couldn't be optimized to be
verified with a query.

Author: Andres Freund
Reviewed-By: Tom Lane (in an earlier version)
Discussion:
    https://postgr.es/m/19030.1554574075@sss.pgh.pa.us
    https://postgr.es/m/CAKJS1f_SHKcPYMsi39An5aUjhAcEMZb6Cx1Sj1QWEWSiKJkBVQ@mail.gmail.com
    https://postgr.es/m/20180711185628.mrvl46bjgk2uxoki@alap3.anarazel.de
2019-04-07 22:42:42 -07:00
Peter Eisentraut 03f9e5cba0 Report progress of REINDEX operations
This uses the same infrastructure that the CREATE INDEX progress
reporting uses.  Add a column to pg_stat_progress_create_index to
report the OID of the index being worked on.  This was not necessary
for CREATE INDEX, but it's useful for REINDEX.

Also edit the phase descriptions a bit to be more consistent with the
source code comments.

Discussion: https://www.postgresql.org/message-id/ef6a6757-c36a-9e81-123f-13b19e36b7d7%402ndquadrant.com
2019-04-07 12:35:29 +02:00
Tom Lane 46e3442c9e Fix failures in validateForeignKeyConstraint's slow path.
The foreign-key-checking loop in ATRewriteTables failed to ignore
relations without storage (e.g., partitioned tables), unlike the
initial loop.  This accidentally worked as long as RI_Initial_Check
succeeded, which it does in most practical cases (including all the
ones exercised in the existing regression tests :-().  However, if
that failed, as for instance when there are permissions issues,
then we entered the slow fire-the-trigger-on-each-tuple path.
And that would try to read from the referencing relation, and fail
if it lacks storage.

A second problem, recently introduced in HEAD, was that this loop
had been broken by sloppy refactoring for the tableam API changes.

Repair both issues, and add a regression test case so we have some
coverage on this code path.  Back-patch as needed to v11.

(It looks like this code could do with additional bulletproofing,
but let's get a working test case in place first.)

Hadi Moshayedi, Tom Lane, Andres Freund

Discussion: https://postgr.es/m/CAK=1=WrnNmBbe5D9sm3t0a6dnAq3cdbF1vXY816j1wsMqzC8bw@mail.gmail.com
Discussion: https://postgr.es/m/19030.1554574075@sss.pgh.pa.us
Discussion: https://postgr.es/m/20190325180405.jytoehuzkeozggxx%40alap3.anarazel.de
2019-04-06 15:09:09 -04:00
Andres Freund 57a7a3adfe Remove unused struct member, enforce multi_insert callback presence.
Author: David Rowley, Andres Freund
Discussion: https://postgr.es/m/CAKJS1f9=9phmm66diAji4gvHnWSrK7BGFoNct+mEUT_c8pPOjw@mail.gmail.com
2019-04-04 17:39:39 -07:00
Andres Freund ea97e440b8 Harden tableam against nonexistant / wrong kind of AMs.
Previously it was allowed to set default_table_access_method to an
empty string. That makes sense for default_tablespace, where that was
copied from, as it signals falling back to the database's default
tablespace. As there is no equivalent for table AMs, forbid that.

Also make sure to throw a usable error when creating a table using an
index AM, by using get_am_type_oid() to implement get_table_am_oid()
instead of a separate copy. Previously we'd error out only later, in
GetTableAmRoutine().

Thirdly remove GetTableAmRoutineByAmId() - it was only used in an
earlier version of 8586bf7ed8.

Add tests for the above (some for index AMs as well).
2019-04-04 17:39:39 -07:00
Andres Freund 86b85044e8 tableam: Add table_multi_insert() and revamp/speed-up COPY FROM buffering.
This adds table_multi_insert(), and converts COPY FROM, the only user
of heap_multi_insert, to it.

A simple conversion of COPY FROM use slots would have yielded a
slowdown when inserting into a partitioned table for some
workloads. Different partitions might need different slots (both slot
types and their descriptors), and dropping / creating slots when
there's constant partition changes is measurable.

Thus instead revamp the COPY FROM buffering for partitioned tables to
allow to buffer inserts into multiple tables, flushing only when
limits are reached across all partition buffers. By only dropping
slots when there've been inserts into too many different partitions,
the aforementioned overhead is gone. By allowing larger batches, even
when there are frequent partition changes, we actuall speed such cases
up significantly.

By using slots COPY of very narrow rows into unlogged / temporary
might slow down very slightly (due to the indirect function calls).

Author: David Rowley, Andres Freund, Haribabu Kommi
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20190327054923.t3epfuewxfqdt22e@alap3.anarazel.de
2019-04-04 16:28:18 -07:00
Robert Haas a96c41feec Allow VACUUM to be run with index cleanup disabled.
This commit adds a new reloption, vacuum_index_cleanup, which
controls whether index cleanup is performed for a particular
relation by default.  It also adds a new option to the VACUUM
command, INDEX_CLEANUP, which can be used to override the
reloption.  If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Masahiko Sawada, reviewed and tested by Nathan Bossart, Alvaro
Herrera, Kyotaro Horiguchi, Darafei Praliaskouski, and me.
The wording of the documentation is mostly due to me.

Discussion: http://postgr.es/m/CAD21AoAt5R3DNUZSjOoXDUY=naYPUOuffVsRzuTYMz29yLzQCA@mail.gmail.com
2019-04-04 15:04:43 -04:00
Thomas Munro 3eb77eba5a Refactor the fsync queue for wider use.
Previously, md.c and checkpointer.c were tightly integrated so that
fsync calls could be handed off and processed in the background.
Introduce a system of callbacks and file tags, so that other modules
can hand off fsync work in the same way.

For now only md.c uses the new interface, but other users are being
proposed.  Since there may be use cases that are not strictly SMGR
implementations, use a new function table for sync handlers rather
than extending the traditional SMGR one.

Instead of using a bitmapset of segment numbers for each RelFileNode
in the checkpointer's hash table, make the segment number part of the
key.  This requires sending explicit "forget" requests for every
segment individually when relations are dropped, but suits the file
layout schemes of proposed future users better (ie sparse or high
segment numbers).

Author: Shawn Debnath and Thomas Munro
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/CAEepm=2gTANm=e3ARnJT=n0h8hf88wqmaZxk0JYkxw+b21fNrw@mail.gmail.com
2019-04-04 23:38:38 +13:00
Tomas Vondra ea569d64ac Add SETTINGS option to EXPLAIN, to print modified settings.
Query planning is affected by a number of configuration options, and it
may be crucial to know which of those options were set to non-default
values.  With this patch you can say EXPLAIN (SETTINGS ON) to include
that information in the query plan.  Only options affecting planning,
with values different from the built-in default are printed.

This patch also adds auto_explain.log_settings option, providing the
same capability in auto_explain module.

Author: Tomas Vondra
Reviewed-by: Rafia Sabih, John Naylor
Discussion: https://postgr.es/m/e1791b4c-df9c-be02-edc5-7c8874944be0@2ndquadrant.com
2019-04-04 00:04:31 +02:00
Alvaro Herrera 5f6fc34af5 Copy name when cloning FKs recurses to partitions
We were passing a string owned by a syscache entry, which was released
before recursing.  Fix by pstrdup'ing the string.

Per buildfarm member prion.
2019-04-03 15:35:54 -03:00
Alvaro Herrera f56f8f8da6 Support foreign keys that reference partitioned tables
Previously, while primary keys could be made on partitioned tables, it
was not possible to define foreign keys that reference those primary
keys.  Now it is possible to do that.

Author: Álvaro Herrera
Reviewed-by: Amit Langote, Jesper Pedersen
Discussion: https://postgr.es/m/20181102234158.735b3fevta63msbj@alvherre.pgsql
2019-04-03 14:40:21 -03:00
Alvaro Herrera ab0dfc961b Report progress of CREATE INDEX operations
This uses the progress reporting infrastructure added by c16dc1aca5,
adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY.

There are two pieces to this: one is index-AM-agnostic, and the other is
AM-specific.  The latter is fairly elaborate for btrees, including
reportage for parallel index builds and the separate phases that btree
index creation uses; other index AMs, which are much simpler in their
building procedures, have simplistic reporting only, but that seems
sufficient, at least for non-concurrent builds.

The index-AM-agnostic part is fairly complete, providing insight into
the CONCURRENTLY wait phases as well as block-based progress during the
index validation table scan.  (The index validation index scan requires
patching each AM, which has not been included here.)

Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada
Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql
2019-04-02 15:18:08 -03:00
Andres Freund d45e401586 tableam: Add table_finish_bulk_insert().
This replaces the previous calls of heap_sync() in places using
bulk-insert. By passing in the flags used for bulk-insert the AM can
decide (first at insert time and then during the finish call) which of
the optimizations apply to it, and what operations are necessary to
finish a bulk insert operation.

Also change HEAP_INSERT_* flags to TABLE_INSERT, and rename hi_options
to ti_options.

These changes are made even in copy.c, which hasn't yet been converted
to tableam. There's no harm in doing so.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-04-01 14:41:42 -07:00
Andres Freund 737a292b5d tableam: VACUUM and ANALYZE support.
This is a relatively straightforward move of the current
implementation to sit below tableam. As the current analyze sampling
implementation is pretty inherently block based, the tableam analyze
interface is as well. It might make sense to generalize that at some
point, but that seems like a larger project that shouldn't be
undertaken at the same time as the introduction of tableam.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-30 19:25:58 -07:00
Peter Eisentraut fc22b6623b Generated columns
This is an SQL-standard feature that allows creating columns that are
computed from expressions rather than assigned, similar to a view or
materialized view but on a column basis.

This implements one kind of generated column: stored (computed on
write).  Another kind, virtual (computed on read), is planned for the
future, and some room is left for it.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
2019-03-30 08:15:57 +01:00
Peter Eisentraut 6b8b5364dd Small code simplification for REINDEX CONCURRENTLY
This was left over from an earlier code structure.
2019-03-30 07:16:24 +01:00
Robert Haas 41b54ba78e Allow existing VACUUM options to take a Boolean argument.
This makes VACUUM work more like EXPLAIN already does without changing
the meaning of any commands that already work.  It is intended to
facilitate the addition of future VACUUM options that may take
non-Boolean parameters or that default to false.

Masahiko Sawada, reviewed by me.

Discussion: http://postgr.es/m/CA+TgmobpYrXr5sUaEe_T0boabV0DSm=utSOZzwCUNqfLEEm8Mw@mail.gmail.com
Discussion: http://postgr.es/m/CAD21AoBaFcKBAeL5_++j+Vzir2vBBcF4juW7qH8b3HsQY=Q6+w@mail.gmail.com
2019-03-29 08:22:49 -04:00
Peter Eisentraut bb76134b08 Fix incorrect code in new REINDEX CONCURRENTLY code
The previous code was adding pointers to transient variables to a
list, but by the time the list was read, the variable might be gone,
depending on the compiler.  Fix it by making copies in the proper
memory context.
2019-03-29 10:53:40 +01:00
Peter Eisentraut 5dc92b844e REINDEX CONCURRENTLY
This adds the CONCURRENTLY option to the REINDEX command.  A REINDEX
CONCURRENTLY on a specific index creates a new index (like CREATE
INDEX CONCURRENTLY), then renames the old index away and the new index
in place and adjusts the dependencies, and then drops the old
index (like DROP INDEX CONCURRENTLY).  The REINDEX command also has
the capability to run its other variants (TABLE, DATABASE) with the
CONCURRENTLY option (but not SYSTEM).

The reindexdb command gets the --concurrently option.

Author: Michael Paquier, Andreas Karlsson, Peter Eisentraut
Reviewed-by: Andres Freund, Fujii Masao, Jim Nasby, Sergei Kornilov
Discussion: https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee
2019-03-29 08:26:33 +01:00
Andres Freund d25f519107 tableam: relation creation, VACUUM FULL/CLUSTER, SET TABLESPACE.
This moves the responsibility for:
- creating the storage necessary for a relation, including creating a
  new relfilenode for a relation with existing storage
- non-transactional truncation of a relation
- VACUUM FULL / CLUSTER's rewrite of a table
below tableam.

This is fairly straight forward, with a bit of complexity smattered in
to move the computation of xid / multixid horizons below the AM, as
they don't make sense for every table AM.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-28 20:01:43 -07:00
Tomas Vondra 7300a69950 Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.

Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.

Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
Alvaro Herrera 126d631222 Fix partitioned index creation bug with dropped columns
ALTER INDEX .. ATTACH PARTITION fails if the partitioned table where the
index is defined contains more dropped columns than its partition, with
this message:
  ERROR:  incorrect attribute map
The cause was that one caller of CompareIndexInfo was passing the number
of attributes of the partition rather than the parent, which confused
the length check.  Repair.

This can cause pg_upgrade to fail when used on such a database.  Leave
some more objects around after regression tests, so that the case is
detected by pg_upgrade test suite.

Remove some spurious empty lines noticed while looking for other cases
of the same problem.

Discussion: https://postgr.es/m/20190326213924.GA2322@alvherre.pgsql
2019-03-26 20:19:28 -03:00
Robert Haas 6f97457e0d Add progress reporting for CLUSTER and VACUUM FULL.
This uses the same progress reporting infrastructure added in commit
c16dc1aca5 and extends it to these
additional cases.  We lack the ability to track the internal progress
of sorts and index builds so the information reported is
coarse-grained for some parts of the operation, but it still seems
like a significant improvement over having nothing at all.

Tatsuro Yamada, reviewed by Thomas Munro, Masahiko Sawada, Michael
Paquier, Jeff Janes, Alvaro Herrera, Rafia Sabih, and by me.  A fair
amount of polishing also by me.

Discussion: http://postgr.es/m/59A77072.3090401@lab.ntt.co.jp
2019-03-25 10:59:04 -04:00
Andres Freund 9a8ee1dc65 tableam: Add and use table_fetch_row_version().
This is essentially the tableam version of heapam_fetch(),
i.e. fetching a tuple identified by a tid, performing visibility
checks.

Note that this different from table_index_fetch_tuple(), which is for
index lookups. It therefore has to handle a tid pointing to an earlier
version of a tuple if the AM uses an optimization like heap's HOT. Add
comments to that end.

This commit removes the stats_relation argument from heap_fetch, as
it's been unused for a long time.

Author: Andres Freund
Reviewed-By: Haribabu Kommi
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-25 00:17:59 -07:00
Andres Freund 919e48b943 tableam: Use in CREATE TABLE AS and CREATE MATERIALIZED VIEW.
Previously those directly performed a heap_insert(). Use
table_insert() instead.  The input slot of those routines is not of
the target relation - we could fix that by copying if necessary, but
that'd not be beneficial for performance. As those codepaths don't
access any AM specific tuple fields (say xmin/xmax), there's no need
to use an AM specific slot.

Author: Andres Freund
Reviewed-By: Haribabu Kommi
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-24 18:58:37 -07:00
Andres Freund 5db6df0c01 tableam: Add tuple_{insert, delete, update, lock} and use.
This adds new, required, table AM callbacks for insert/delete/update
and lock_tuple. To be able to reasonably use those, the EvalPlanQual
mechanism had to be adapted, moving more logic into the AM.

Previously both delete/update/lock call-sites and the EPQ mechanism had
to have awareness of the specific tuple format to be able to fetch the
latest version of a tuple. Obviously that needs to be abstracted
away. To do so, move the logic that find the latest row version into
the AM. lock_tuple has a new flag argument,
TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last
version, rather than the current one.  It'd have been possible to do
so via a separate callback as well, but finding the last version
usually also necessitates locking the newest version, making it
sensible to combine the two. This replaces the previous use of
EvalPlanQualFetch().  Additionally HeapTupleUpdated, which previously
signaled either a concurrent update or delete, is now split into two,
to avoid callers needing AM specific knowledge to differentiate.

The move of finding the latest row version into tuple_lock means that
encountering a row concurrently moved into another partition will now
raise an error about "tuple to be locked" rather than "tuple to be
updated/deleted" - which is accurate, as that always happens when
locking rows. While possible slightly less helpful for users, it seems
like an acceptable trade-off.

As part of this commit HTSU_Result has been renamed to TM_Result, and
its members been expanded to differentiated between updating and
deleting. HeapUpdateFailureData has been renamed to TM_FailureData.

The interface to speculative insertion is changed so nodeModifyTable.c
does not have to set the speculative token itself anymore. Instead
there's a version of tuple_insert, tuple_insert_speculative, that
performs the speculative insertion (without requiring a flag to signal
that fact), and the speculative insertion is either made permanent
with table_complete_speculative(succeeded = true) or aborted with
succeeded = false).

Note that multi_insert is not yet routed through tableam, nor is
COPY. Changing multi_insert requires changes to copy.c that are large
enough to better be done separately.

Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED
VIEW are also only going to be adjusted in a later commit.

Author: Andres Freund and Haribabu Kommi
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-23 19:55:57 -07:00
Peter Eisentraut 5e1963fb76 Collations with nondeterministic comparison
This adds a flag "deterministic" to collations.  If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal.  That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.

This functionality is only supported with the ICU provider.  At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.

The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).

This patch makes changes in three areas:

- CREATE COLLATION DDL changes and system catalog changes to support
  this new flag.

- Many executor nodes and auxiliary code are extended to track
  collations.  Previously, this code would just throw away collation
  information, because the eventually-called user-defined functions
  didn't use it since they only cared about equality, which didn't
  need collation information.

- String data type functions that do equality comparisons and hashing
  are changed to take the (non-)deterministic flag into account.  For
  comparison, this just means skipping various shortcuts and tie
  breakers that use byte-wise comparison.  For hashing, we first need
  to convert the input string to a canonical "sort key" using the ICU
  analogue of strxfrm().

Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
2019-03-22 12:12:43 +01:00
Alvaro Herrera a6da004715 Add index_get_partition convenience function
This new function simplifies some existing coding, as well as supports
future patches.

Discussion: https://postgr.es/m/201901222145.t6wws6t6vrcu@alvherre.pgsql
Reviewed-by: Amit Langote, Jesper Pedersen
2019-03-20 18:18:50 -03:00
Andrew Gierth 01bde4fa4c Implement OR REPLACE option for CREATE AGGREGATE.
Aggregates have acquired a dozen or so optional attributes in recent
years for things like parallel query and moving-aggregate mode; the
lack of an OR REPLACE option to add or change these for an existing
agg makes extension upgrades gratuitously hard. Rectify.
2019-03-19 01:16:50 +00:00
Robert Haas 6776142a07 Revise parse tree representation for VACUUM and ANALYZE.
Like commit f41551f61f, this aims
to make it easier to add non-Boolean options to VACUUM (or, in
this case, to ANALYZE).  Instead of building up a bitmap of
options directly in the parser, build up a list of DefElem
objects and let ExecVacuum() sort it out; right now, we make
no use of the fact that a DefElem can carry an associated value,
but it will be easy to make that change in the future.

Masahiko Sawada

Discussion: http://postgr.es/m/CAD21AoATE4sn0jFFH3NcfUZXkU2BMbjBWB_kDj-XWYA-LXDcQA@mail.gmail.com
2019-03-18 15:14:52 -04:00
Robert Haas f41551f61f Fold vacuum's 'int options' parameter into VacuumParams.
Many places need both, so this allows a few functions to take one
fewer parameter.  More importantly, as soon as we add a VACUUM
option that takes a non-Boolean parameter, we need to replace
'int options' with a struct, and it seems better to think
of adding more fields to VacuumParams rather than passing around
both VacuumParams and a separate struct as well.

Patch by me, reviewed by Masahiko Sawada

Discussion: http://postgr.es/m/CA+Tgmob6g6-s50fyv8E8he7APfwCYYJ4z0wbZC2yZeSz=26CYQ@mail.gmail.com
2019-03-18 13:57:33 -04:00
Peter Eisentraut 61dc407893 Improve code comment 2019-03-14 22:44:21 +01:00
Peter Eisentraut c6ff0b892c Refactor ParamListInfo initialization
There were six copies of identical nontrivial code.  Put it into a
function.
2019-03-14 13:30:09 +01:00
Peter Eisentraut f177660ab0 Include all columns in default names for foreign key constraints
When creating a name for a foreign key constraint when none is
specified, use all column names instead of only the first one, similar
to how it is already done for index names.

Author: Paul Martinez <hellopfm@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CAF+2_SFjky6XRfLNRXpkG97W6PRbOO_mjAxqXzAAimU=c7w7_A@mail.gmail.com
2019-03-13 14:25:42 +01:00
Robert Haas bbb96c3704 Allow ALTER TABLE .. SET NOT NULL to skip provably unnecessary scans.
If existing CHECK or NOT NULL constraints preclude the presence
of nulls, we need not look to see whether any are present.

Sergei Kornilov, reviewed by Stephen Frost, Ildar Musin, David Rowley,
and by me.

Discussion: http://postgr.es/m/81911511895540@web58j.yandex.ru
2019-03-13 08:55:00 -04:00
Andres Freund c2fe139c20 tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:

1) Heap scans need to be generalized into table scans. Do this by
   introducing TableScanDesc, which will be the "base class" for
   individual AMs. This contains the AM independent fields from
   HeapScanDesc.

   The previous heap_{beginscan,rescan,endscan} et al. have been
   replaced with a table_ version.

   There's no direct replacement for heap_getnext(), as that returned
   a HeapTuple, which is undesirable for a other AMs. Instead there's
   table_scan_getnextslot().  But note that heap_getnext() lives on,
   it's still used widely to access catalog tables.

   This is achieved by new scan_begin, scan_end, scan_rescan,
   scan_getnextslot callbacks.

2) The portion of parallel scans that's shared between backends need
   to be able to do so without the user doing per-AM work. To achieve
   that new parallelscan_{estimate, initialize, reinitialize}
   callbacks are introduced, which operate on a new
   ParallelTableScanDesc, which again can be subclassed by AMs.

   As it is likely that several AMs are going to be block oriented,
   block oriented callbacks that can be shared between such AMs are
   provided and used by heap. table_block_parallelscan_{estimate,
   intiialize, reinitialize} as callbacks, and
   table_block_parallelscan_{nextpage, init} for use in AMs. These
   operate on a ParallelBlockTableScanDesc.

3) Index scans need to be able to access tables to return a tuple, and
   there needs to be state across individual accesses to the heap to
   store state like buffers. That's now handled by introducing a
   sort-of-scan IndexFetchTable, which again is intended to be
   subclassed by individual AMs (for heap IndexFetchHeap).

   The relevant callbacks for an AM are index_fetch_{end, begin,
   reset} to create the necessary state, and index_fetch_tuple to
   retrieve an indexed tuple.  Note that index_fetch_tuple
   implementations need to be smarter than just blindly fetching the
   tuples for AMs that have optimizations similar to heap's HOT - the
   currently alive tuple in the update chain needs to be fetched if
   appropriate.

   Similar to table_scan_getnextslot(), it's undesirable to continue
   to return HeapTuples. Thus index_fetch_heap (might want to rename
   that later) now accepts a slot as an argument. Core code doesn't
   have a lot of call sites performing index scans without going
   through the systable_* API (in contrast to loads of heap_getnext
   calls and working directly with HeapTuples).

   Index scans now store the result of a search in
   IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
   target is not generally a HeapTuple anymore that seems cleaner.

To be able to sensible adapt code to use the above, two further
callbacks have been introduced:

a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
   slots capable of holding a tuple of the AMs
   type. table_slot_callbacks() and table_slot_create() are based
   upon that, but have additional logic to deal with views, foreign
   tables, etc.

   While this change could have been done separately, nearly all the
   call sites that needed to be adapted for the rest of this commit
   also would have been needed to be adapted for
   table_slot_callbacks(), making separation not worthwhile.

b) tuple_satisfies_snapshot checks whether the tuple in a slot is
   currently visible according to a snapshot. That's required as a few
   places now don't have a buffer + HeapTuple around, but a
   slot (which in heap's case internally has that information).

Additionally a few infrastructure changes were needed:

I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
   internally uses a slot to keep track of tuples. While
   systable_getnext() still returns HeapTuples, and will so for the
   foreseeable future, the index API (see 1) above) now only deals with
   slots.

The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.

Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 12:46:41 -07:00
Alvaro Herrera af38498d4c Move hash_any prototype from access/hash.h to utils/hashutils.h
... as well as its implementation from backend/access/hash/hashfunc.c to
backend/utils/hash/hashfn.c.

access/hash is the place for the hash index AM, not really appropriate
for generic facilities, which is what hash_any is; having things the old
way meant that anything using hash_any had to include the AM's include
file, pointlessly polluting its namespace with unrelated, unnecessary
cruft.

Also move the HTEqual strategy number to access/stratnum.h from
access/hash.h.

To avoid breaking third-party extension code, add an #include
"utils/hashutils.h" to access/hash.h.  (An easily removed line by
committers who enjoy their asbestos suits to protect them from angry
extension authors.)

Discussion: https://postgr.es/m/201901251935.ser5e4h6djt2@alvherre.pgsql
2019-03-11 13:17:50 -03:00
Tom Lane caf626b2cd Convert [autovacuum_]vacuum_cost_delay into floating-point GUCs.
This change makes it possible to specify sub-millisecond delays,
which work well on most modern platforms, though that was not true
when the cost-delay feature was designed.

To support this without breaking existing configuration entries,
improve guc.c to allow floating-point GUCs to have units.  Also,
allow "us" (microseconds) as an input/output unit for time-unit GUCs.
(It's not allowed as a base unit, at least not yet.)

Likewise change the autovacuum_vacuum_cost_delay reloption to be
floating-point; this forces a catversion bump because the layout of
StdRdOptions changes.

This patch doesn't in itself change the default values or allowed
ranges for these parameters, and it should not affect the behavior
for any already-allowed setting for them.

Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
2019-03-10 15:01:39 -04:00
Noah Misch 3c5926301a Avoid some table rewrites for ALTER TABLE .. SET DATA TYPE timestamp.
When the timezone is UTC, timestamptz and timestamp are binary coercible
in both directions.  See b8a18ad485 and
c22ecc6562 for the previous attempt in
this problem space.  Skip the table rewrite; for now, continue to
needlessly rewrite any index on an affected column.

Reviewed by Simon Riggs and Tom Lane.

Discussion: https://postgr.es/m/20190226061450.GA1665944@rfd.leadboat.com
2019-03-08 20:16:27 -08:00
Robert Haas 898e5e3290 Allow ATTACH PARTITION with only ShareUpdateExclusiveLock.
We still require AccessExclusiveLock on the partition itself, because
otherwise an insert that violates the newly-imposed partition
constraint could be in progress at the same time that we're changing
that constraint; only the lock level on the parent relation is
weakened.

To make this safe, we have to cope with (at least) three separate
problems. First, relevant DDL might commit while we're in the process
of building a PartitionDesc.  If so, find_inheritance_children() might
see a new partition while the RELOID system cache still has the old
partition bound cached, and even before invalidation messages have
been queued.  To fix that, if we see that the pg_class tuple seems to
be missing or to have a null relpartbound, refetch the value directly
from the table. We can't get the wrong value, because DETACH PARTITION
still requires AccessExclusiveLock throughout; if we ever want to
change that, this will need more thought. In testing, I found it quite
difficult to hit even the null-relpartbound case; the race condition
is extremely tight, but the theoretical risk is there.

Second, successive calls to RelationGetPartitionDesc might not return
the same answer.  The query planner will get confused if lookup up the
PartitionDesc for a particular relation does not return a consistent
answer for the entire duration of query planning.  Likewise, query
execution will get confused if the same relation seems to have a
different PartitionDesc at different times.  Invent a new
PartitionDirectory concept and use it to ensure consistency.  This
ensures that a single invocation of either the planner or the executor
sees the same view of the PartitionDesc from beginning to end, but it
does not guarantee that the planner and the executor see the same
view.  Since this allows pointers to old PartitionDesc entries to
survive even after a relcache rebuild, also postpone removing the old
PartitionDesc entry until we're certain no one is using it.

For the most part, it seems to be OK for the planner and executor to
have different views of the PartitionDesc, because the executor will
just ignore any concurrently added partitions which were unknown at
plan time; those partitions won't be part of the inheritance
expansion, but invalidation messages will trigger replanning at some
point.  Normally, this happens by the time the very next command is
executed, but if the next command acquires no locks and executes a
prepared query, it can manage not to notice until a new transaction is
started.  We might want to tighten that up, but it's material for a
separate patch.  There would still be a small window where a query
that started just after an ATTACH PARTITION command committed might
fail to notice its results -- but only if the command starts before
the commit has been acknowledged to the user. All in all, the warts
here around serializability seem small enough to be worth accepting
for the considerable advantage of being able to add partitions without
a full table lock.

Although in general the consequences of new partitions showing up
between planning and execution are limited to the query not noticing
the new partitions, run-time partition pruning will get confused in
that case, so that's the third problem that this patch fixes.
Run-time partition pruning assumes that indexes into the PartitionDesc
are stable between planning and execution.  So, add code so that if
new partitions are added between plan time and execution time, the
indexes stored in the subplan_map[] and subpart_map[] arrays within
the plan's PartitionedRelPruneInfo get adjusted accordingly.  There
does not seem to be a simple way to generalize this scheme to cope
with partitions that are removed, mostly because they could then get
added back again with different bounds, but it works OK for added
partitions.

This code does not try to ensure that every backend participating in
a parallel query sees the same view of the PartitionDesc.  That
currently doesn't matter, because we never pass PartitionDesc
indexes between backends.  Each backend will ignore the concurrently
added partitions which it notices, and it doesn't matter if different
backends are ignoring different sets of concurrently added partitions.
If in the future that matters, for example because we allow writes in
parallel query and want all participants to do tuple routing to the same
set of partitions, the PartitionDirectory concept could be improved to
share PartitionDescs across backends.  There is a draft patch to
serialize and restore PartitionDescs on the thread where this patch
was discussed, which may be a useful place to start.

Patch by me.  Thanks to Alvaro Herrera, David Rowley, Simon Riggs,
Amit Langote, and Michael Paquier for discussion, and to Alvaro
Herrera for some review.

Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com
2019-03-07 11:13:12 -05:00
Andres Freund 8586bf7ed8 tableam: introduce table AM infrastructure.
This introduces the concept of table access methods, i.e. CREATE
  ACCESS METHOD ... TYPE TABLE and
  CREATE TABLE ... USING (storage-engine).
No table access functionality is delegated to table AMs as of this
commit, that'll be done in following commits.

Subsequent commits will incrementally abstract table access
functionality to be routed through table access methods. That change
is too large to be reviewed & committed at once, so it'll be done
incrementally.

Docs will be updated at the end, as adding them incrementally would
likely make them less coherent, and definitely is a lot more work,
without a lot of benefit.

Table access methods are specified similar to index access methods,
i.e. pg_am.amhandler returns, as INTERNAL, a pointer to a struct with
callbacks. In contrast to index AMs that struct needs to live as long
as a backend, typically that's achieved by just returning a pointer to
a constant struct.

Psql's \d+ now displays a table's access method. That can be disabled
with HIDE_TABLEAM=true, which is mainly useful so regression tests can
be run against different AMs.  It's quite possible that this behaviour
still needs to be fine tuned.

For now it's not allowed to set a table AM for a partitioned table, as
we've not resolved how partitions would inherit that. Disallowing
allows us to introduce, if we decide that's the way forward, such a
behaviour without a compatibility break.

Catversion bumped, to add the heap table AM and references to it.

Author: Haribabu Kommi, Andres Freund, Alvaro Herrera, Dimitri Golgov and others
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
    https://postgr.es/m/20190107235616.6lur25ph22u5u5av@alap3.anarazel.de
    https://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de
2019-03-06 09:54:38 -08:00
Andres Freund 5963b29e03 Initialize variable to silence compiler warning.
After ff11e7f4b9 Tom's compiler warns about accessing a potentially
uninitialized rInfo. That's not actually possible, but it's
understandable the compiler would get this wrong. NULL initialize too.

Reported-By: Tom Lane
Discussion: https://postgr.es/m/11199.1551285318@sss.pgh.pa.us
2019-02-27 09:14:34 -08:00
Michael Paquier 414a9d3cf3 Fix memory leak when inserting tuple at relation creation for CTAS
The leak has been introduced by 763f2ed which has addressed the problem
for transient tables, and forgot CREATE TABLE AS which shares a similar
logic when receiving a new tuple to store into the newly-created
relation.

Author: Jeff Janes
Discussion: https://postgr.es/m/CAMkU=1xZXtz3mziPEPD2Fubbas4G2RWkZm5HHABtfKVcbu1=Sg@mail.gmail.com
2019-02-27 14:14:06 +09:00
Andres Freund ff11e7f4b9 Use slots in trigger infrastructure, except for the actual invocation.
In preparation for abstracting table storage, convert trigger.c to
track tuples in slots. Which also happens to make code calling
triggers simpler.

As the calling interface for triggers themselves is not changed in
this patch, HeapTuples still are extracted from the slot at that
time. But that's handled solely inside trigger.c, not visible to
callers. It's quite likely that we'll want to revise the external
trigger interface, but that's a separate large project.

As part of this work the slots used for old/new/return tuples are
moved from EState into ResultRelInfo, as different updated tables
might need different slots. The slots are now also now created
on-demand, which is good both from an efficiency POV, but also makes
the modifying code simpler.

Author: Andres Freund, Amit Khandekar and Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26 20:31:38 -08:00
Andres Freund b8d71745ea Store table oid and tuple's tid in tuple slots directly.
After the introduction of tuple table slots all table AMs need to
support returning the table oid of the tuple stored in a slot created
by said AM. It does not make sense to re-implement that in every AM,
therefore move handling of table OIDs into the TupleTableSlot
structure itself.  It's possible that we, at a later date, might want
to get rid of HeapTupleData.t_tableOid entirely, but doing so before
the abstractions for table AMs are integrated turns out to be too
hard, so delay that for now.

Similarly, every AM needs to support the concept of a tuple
identifier (tid / item pointer) for its tuples. It's quite possible
that we'll generalize the exact form of a tid at a future point (to
allow for things like index organized tables), but for now many parts
of the code know about tids, so there's not much point in abstracting
tids away. Therefore also move into slot (rather than providing API to
set/get the tid associated with the tuple in a slot).

Once table AM includes insert/updating/deleting tuples, the
responsibility to set the correct tid after such an action will move
into that. After that change, code doing such modifications, should
not have to deal with HeapTuples directly anymore.

Author: Andres Freund, Haribabu Kommi and Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26 20:31:16 -08:00
Peter Eisentraut bc09d5e4cc Remove unnecessary use of PROCEDURAL
Remove some unnecessary, legacy-looking use of the PROCEDURAL keyword
before LANGUAGE.  We mostly don't use this anymore, so some of these
look a bit old.

There is still some use in pg_dump, which is harder to remove because
it's baked into the archive format, so I'm not touching that.

Discussion: https://www.postgresql.org/message-id/2330919b-62d9-29ac-8de3-58c024fdcb96@2ndquadrant.com
2019-02-25 08:38:59 +01:00
Robert Haas 1bb5e78218 Move code for managing PartitionDescs into a new file, partdesc.c
This is similar in spirit to the existing partbounds.c file in the
same directory, except that there's a lot less code in the new file
created by this commit.  Pending work in this area proposes to add a
bunch more code related to PartitionDescs, though, and this will give
us a good place to put it.

Discussion: http://postgr.es/m/CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com
2019-02-21 11:45:02 -05:00
Tom Lane a32ca78836 Fix CREATE VIEW to allow zero-column views.
We should logically have allowed this case when we allowed zero-column
tables, but it was overlooked.

Although this might be thought a feature addition, it's really a bug
fix, because it was possible to create a zero-column view via
the convert-table-to-view code path, and then you'd have a situation
where dump/reload would fail.  Hence, back-patch to all supported
branches.

Arrange the added test cases to provide coverage of the related
pg_dump code paths (since these views will be dumped and reloaded
during the pg_upgrade regression test).  I also made them test
the case where pg_dump has to postpone the view rule into post-data,
which disturbingly had no regression coverage before.

Report and patch by Ashutosh Sharma (test case by me)

Discussion: https://postgr.es/m/CAE9k0PkmHdeSaeZt2ujnb_cKucmK3sDDceDzw7+d5UZoNJPYOg@mail.gmail.com
2019-02-17 12:37:31 -05:00
Tom Lane 1d92a0c9f7 Redesign the partition dependency mechanism.
The original setup for dependencies of partitioned objects had
serious problems:

1. It did not verify that a drop cascading to a partition-child object
also cascaded to at least one of the object's partition parents.  Now,
normally a child object would share all its dependencies with one or
another parent (e.g. a child index's opclass dependencies would be shared
with the parent index), so that this oversight is usually harmless.
But if some dependency failed to fit this pattern, the child could be
dropped while all its parents remain, creating a logically broken
situation.  (It's easy to construct artificial cases that break it,
such as attaching an unrelated extension dependency to the child object
and then dropping the extension.  I'm not sure if any less-artificial
cases exist.)

2. Management of partition dependencies during ATTACH/DETACH PARTITION
was complicated and buggy; for example, after detaching a partition
table it was possible to create cases where a formerly-child index
should be dropped and was not, because the correct set of dependencies
had not been reconstructed.

Less seriously, because multiple partition relationships were
represented identically in pg_depend, there was an order-of-traversal
dependency on which partition parent was cited in error messages.
We also had some pre-existing order-of-traversal hazards for error
messages related to internal and extension dependencies.  This is
cosmetic to users but causes testing problems.

To fix #1, add a check at the end of the partition tree traversal
to ensure that at least one partition parent got deleted.  To fix #2,
establish a new policy that partition dependencies are in addition to,
not instead of, a child object's usual dependencies; in this way
ATTACH/DETACH PARTITION need not cope with adding or removing the
usual dependencies.

To fix the cosmetic problem, distinguish between primary and secondary
partition dependency entries in pg_depend, by giving them different
deptypes.  (They behave identically except for having different
priorities for being cited in error messages.)  This means that the
former 'I' dependency type is replaced with new 'P' and 'S' types.

This also fixes a longstanding bug that after handling an internal
dependency by recursing to the owning object, findDependentObjects
did not verify that the current target was now scheduled for deletion,
and did not apply the current recursion level's objflags to it.
Perhaps that should be back-patched; but in the back branches it
would only matter if some concurrent transaction had removed the
internal-linkage pg_depend entry before the recursive call found it,
or the recursive call somehow failed to find it, both of which seem
unlikely.

Catversion bump because the contents of pg_depend change for
partitioning relationships.

Patch HEAD only.  It's annoying that we're not fixing #2 in v11,
but there seems no practical way to do so given that the problem
is exactly a poor choice of what entries to put in pg_depend.
We can't really fix that while staying compatible with what's
in pg_depend in existing v11 installations.

Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
2019-02-11 14:41:17 -05:00
Peter Eisentraut 256fc004af Adjust gratuitously different error message wording 2019-02-11 14:01:05 +01:00
Alvaro Herrera cb90de1aac Fix trigger drop procedure
After commit 123cc697a8, we remove redundant FK action triggers during
partition ATTACH by merely deleting the catalog tuple, but that's wrong:
it should use performDeletion() instead.  Repair, and make the comments
more explicit.

Per code review from Tom Lane.

Discussion: https://postgr.es/m/18885.1549642539@sss.pgh.pa.us
2019-02-10 10:00:11 -03:00
Tom Lane 1fb57af920 Create the infrastructure for planner support functions.
Rename/repurpose pg_proc.protransform as "prosupport".  The idea is
still that it names an internal function that provides knowledge to
the planner about the behavior of the function it's attached to;
but redesign the API specification so that it's not limited to doing
just one thing, but can support an extensible set of requests.

The original purpose of simplifying a function call is handled by
the first request type to be invented, SupportRequestSimplify.
Adjust all the existing transform functions to handle this API,
and rename them fron "xxx_transform" to "xxx_support" to reflect
the potential generalization of what they do.  (Since we never
previously provided any way for extensions to add transform functions,
this change doesn't create an API break for them.)

Also add DDL and pg_dump support for attaching a support function to a
user-defined function.  Unfortunately, DDL access has to be restricted
to superusers, at least for now; but seeing that support functions
will pretty much have to be written in C, that limitation is just
theoretical.  (This support is untested in this patch, but a follow-on
patch will add cases that exercise it.)

Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
2019-02-09 18:08:48 -05:00
Andres Freund 171e0418b0 Fix heap_getattr() handling of fast defaults.
Previously heap_getattr() returned NULL for attributes with a fast
default value (c.f. 16828d5c02), as it had no handling whatsoever
for that case.

A previous fix, 7636e5c60f, attempted to fix issues caused by this
oversight, but just expanding OLD tuples for triggers doesn't actually
solve the underlying issue.

One known consequence of this bug is that the check for HOT updates
can return the wrong result, when a previously fast-default'ed column
is set to NULL. Which in turn means that an index over a column with
fast default'ed columns might be corrupt if the underlying column(s)
allow NULLs.

Fix by handling fast default columns in heap_getattr(), remove now
superfluous expansion in GetTupleForTrigger().

Author: Andres Freund
Discussion: https://postgr.es/m/20190201162404.onngi77f26baem4g@alap3.anarazel.de
Backpatch: 11, where fast defaults were introduced
2019-02-06 01:09:32 -08:00
Tom Lane 5f5c014590 Allow RECORD and RECORD[] to be specified in function coldeflists.
We can't allow these pseudo-types to be used as table column types,
because storing an anonymous record value in a table would result
in data that couldn't be understood by other sessions.  However,
it seems like there's no harm in allowing the case in a column
definition list that's specifying what a function-returning-record
returns.  The data involved is all local to the current session,
so we should be just as able to resolve its actual tuple type as
we are for the function-returning-record's top-level tuple output.

Elvis Pranskevichus, with cosmetic changes by me

Discussion: https://postgr.es/m/11038447.kQ5A9Uj5xi@hammer.magicstack.net
2019-01-30 19:25:33 -05:00
Tom Lane f09346a9c6 Refactor planner's header files.
Create a new header optimizer/optimizer.h, which exposes just the
planner functions that can be used "at arm's length", without need
to access Paths or the other planner-internal data structures defined
in nodes/relation.h.  This is intended to provide the whole planner
API seen by most of the rest of the system; although FDWs still need
to use additional stuff, and more thought is also needed about just
what selfuncs.c should rely on.

The main point of doing this now is to limit the amount of new
#include baggage that will be needed by "planner support functions",
which I expect to introduce later, and which will be in relevant
datatype modules rather than anywhere near the planner.

This commit just moves relevant declarations into optimizer.h from
other header files (a couple of which go away because everything
got moved), and adjusts #include lists to match.  There's further
cleanup that could be done if we want to decide that some stuff
being exposed by optimizer.h doesn't belong in the planner at all,
but I'll leave that for another day.

Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
2019-01-29 15:48:51 -05:00
Tom Lane a1b8c41e99 Make some small planner API cleanups.
Move a few very simple node-creation and node-type-testing functions
from the planner's clauses.c to nodes/makefuncs and nodes/nodeFuncs.
There's nothing planner-specific about them, as evidenced by the
number of other places that were using them.

While at it, rename and_clause() etc to is_andclause() etc, to clarify
that they are node-type-testing functions not node-creation functions.
And use "static inline" implementations for the shortest ones.

Also, modify flatten_join_alias_vars() and some subsidiary functions
to take a Query not a PlannerInfo to define the join structure that
Vars should be translated according to.  They were only using the
"parse" field of the PlannerInfo anyway, so this just requires removing
one level of indirection.  The advantage is that now parse_agg.c can
use flatten_join_alias_vars() without the horrid kluge of creating an
incomplete PlannerInfo, which will allow that file to be decoupled from
relation.h in a subsequent patch.

Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
2019-01-29 15:26:44 -05:00
Tomas Vondra 36a1281f86 Separate per-batch and per-tuple memory contexts in COPY
In batching mode, COPY was using the same (per-tuple) memory context for
allocations with longer lifetime. This was confusing but harmless, until
commit 31f3817402 added COPY FROM ... WHERE feature, introducing a risk
of memory leak.

The "per-tuple" memory context was reset only when starting new batch,
but as the rows may be filtered out by the WHERE clauses, that may not
happen at all.  The WHERE clause however has to be evaluated for all
rows, before filtering them out.

This commit separates the per-tuple and per-batch contexts, removing the
ambiguity.  Expressions (both defaults and WHERE clause) are evaluated
in the per-tuple context, while tuples are formed in the batch context.
This allows resetting the contexts at appropriate times.

The main complexity is related to partitioning, in which case we need to
reset the batch context after forming the tuple (which happens before
routing to leaf partition).  Instead of switching between two contexts
as before, we simply copy the last tuple aside, reset the context and
then copy the tuple back.  The performance impact is negligible, and
juggling with two contexts is not free either.

Discussion: https://www.postgresql.org/message-id/flat/CALAY4q_DdpWDuB5-Zyi-oTtO2uSk8pmy+dupiRe3AvAc++1imA@mail.gmail.com
2019-01-29 00:00:47 +01:00
Andres Freund a9c35cf85c Change function call information to be variable length.
Before this change FunctionCallInfoData, the struct arguments etc for
V1 function calls are stored in, always had space for
FUNC_MAX_ARGS/100 arguments, storing datums and their nullness in two
arrays.  For nearly every function call 100 arguments is far more than
needed, therefore wasting memory. Arg and argnull being two separate
arrays also guarantees that to access a single argument, two
cachelines have to be touched.

Change the layout so there's a single variable-length array with pairs
of value / isnull. That drastically reduces memory consumption for
most function calls (on x86-64 a two argument function now uses
64bytes, previously 936 bytes), and makes it very likely that argument
value and its nullness are on the same cacheline.

Arguments are stored in a new NullableDatum struct, which, due to
padding, needs more memory per argument than before. But as usually
far fewer arguments are stored, and individual arguments are cheaper
to access, that's still a clear win.  It's likely that there's other
places where conversion to NullableDatum arrays would make sense,
e.g. TupleTableSlots, but that's for another commit.

Because the function call information is now variable-length
allocations have to take the number of arguments into account. For
heap allocations that can be done with SizeForFunctionCallInfoData(),
for on-stack allocations there's a new LOCAL_FCINFO(name, nargs) macro
that helps to allocate an appropriately sized and aligned variable.

Some places with stack allocation function call information don't know
the number of arguments at compile time, and currently variably sized
stack allocations aren't allowed in postgres. Therefore allow for
FUNC_MAX_ARGS space in these cases. They're not that common, so for
now that seems acceptable.

Because of the need to allocate FunctionCallInfo of the appropriate
size, older extensions may need to update their code. To avoid subtle
breakages, the FunctionCallInfoData struct has been renamed to
FunctionCallInfoBaseData. Most code only references FunctionCallInfo,
so that shouldn't cause much collateral damage.

This change is also a prerequisite for more efficient expression JIT
compilation (by allocating the function call information on the stack,
allowing LLVM to optimize it away); previously the size of the call
information caused problems inside LLVM's optimizer.

Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20180605172952.x34m5uz6ju6enaem@alap3.anarazel.de
2019-01-26 14:17:52 -08:00
Michael Paquier c9b75c5838 Simplify restriction handling of two-phase commit for temporary objects
There were two flags used to track the access to temporary tables and
to the temporary namespace of a session which are used to restrict
PREPARE TRANSACTION, however the first control flag is a concept
included in the second.  This removes the flag for temporary table
tracking, keeping around only the one at namespace level.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20190118053126.GH1883@paquier.xyz
2019-01-26 10:45:23 +09:00
Heikki Linnakangas a5be6e9a1d Fix comments that claimed that mblen() only looks at first byte.
GB18030's mblen() function looks at the first and the second byte of the
multibyte character, to determine its length. copy.c had made the
assumption that mblen() only looks at the first byte, but it turns out to
work out fine, because of the way the GB18030 encoding works. COPY will
see a 4-byte encoded character as two 2-byte encoded characters, which is
enough for COPY's purposes. It cannot mix those up with delimiter or
escaping characters, because only single-byte ASCII characters are
supported as delimiters or escape characters.

Discussion: https://www.postgresql.org/message-id/7704d099-9643-2a55-fb0e-becd64400dcb%40iki.fi
2019-01-25 14:54:38 +02:00
Peter Eisentraut 7c079d7417 Allow generalized expression syntax for partition bounds
Previously, only literals were allowed.  This change allows general
expressions, including functions calls, which are evaluated at the
time the DDL command is executed.

Besides offering some more functionality, it simplifies the parser
structures and removes some inconsistencies in how the literals were
handled.

Author: Kyotaro Horiguchi, Tom Lane, Amit Langote
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/9f88b5e0-6da2-5227-20d0-0d7012beaa1c@lab.ntt.co.jp/
2019-01-25 11:28:49 +01:00
Alvaro Herrera efd9366dce Fix droppability of constraints upon partition detach
We were failing to set conislocal correctly for constraints in
partitions after partition detach, leading to those constraints becoming
undroppable.  Fix by setting the flag correctly.  Existing databases
might contain constraints with the conislocal wrongly set to false, for
partitions that were detached; this situation should be fixable by
applying an UPDATE on pg_constraint to set conislocal true.  This
problem should otherwise be innocuous and should disappear across a
dump/restore or pg_upgrade.

Secondarily, when constraint drop was attempted in a partitioned table,
ATExecDropConstraint would try to recurse to partitions after doing
performDeletion() of the constraint in the partitioned table itself; but
since the constraint in the partitions are dropped by the initial call
of performDeletion() (because of following dependencies), the recursion
step would fail since it would not find the constraint, causing the
whole operation to fail.  Fix by preventing recursion.

Reported-by: Amit Langote
Diagnosed-by: Amit Langote
Author: Amit Langote, Álvaro Herrera
Discussion: https://postgr.es/m/f2b8ead5-4131-d5a8-8016-2ea0a31250af@lab.ntt.co.jp
2019-01-24 14:09:56 -03:00
Alvaro Herrera 19184fcc09 Simplify coding to detach constraints when detaching partition
The original coding was too baroque and led to an use-after-release
mistake, noticed by buildfarm member prion.

Discussion: https://postgr.es/m/21693.1548305934@sss.pgh.pa.us
2019-01-24 11:25:29 -03:00
Alvaro Herrera ae366aa577 Detach constraints when partitions are detached
I (Álvaro) forgot to do this in eb7ed3f306, leading to undroppable
constraints after partitions are detached.  Repair.

Reported-by: Amit Langote
Author: Amit Langote
Discussion: https://postgr.es/m/c1c9b688-b886-84f7-4048-1e4ebe9b1d06@lab.ntt.co.jp
2019-01-24 00:01:32 -03:00
Michael Paquier 289198c0d9 Remove argument isprimary from index_build()
The flag was introduced in 3fdeb18, but f66e8bf actually forgot to
finish the cleanup as index_update_stats() has simplified its
interface.

Author: Michael Paquier
Discussion: https://postgr.es/m/20190122080852.GB3873@paquier.xyz
2019-01-24 07:57:09 +09:00
Tomas Vondra 4a8283d0ec Fix handling of volatile expressions in COPY FROM ... WHERE
The checking for calls to volatile functions in the COPY FROM ... WHERE
expression was treating all WHERE clauses as if containing such calls.
While that does not produce incorrect results, this disables batching
which may result in significant performance regression.

Discussion: https://www.postgresql.org/message-id/flat/CALAY4q_DdpWDuB5-Zyi-oTtO2uSk8pmy+dupiRe3AvAc++1imA@mail.gmail.com
2019-01-22 23:11:17 +01:00
Andres Freund 346ed70b0a Rename RelationData.rd_amroutine to rd_indam.
The upcoming table AM support makes rd_amroutine to generic, as its
only about index AMs. The new name makes that clear, and is shorter to
boot.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-01-21 17:36:55 -08:00
Andres Freund c91560defc Move remaining code from tqual.[ch] to heapam.h / heapam_visibility.c.
Given these routines are heap specific, and that there will be more
generic visibility support in via table AM, it makes sense to move the
prototypes to heapam.h (routines like HeapTupleSatisfiesVacuum will
not be exposed in a generic fashion, because they are too storage
specific).

Similarly, the code in tqual.c is specific to heap, so moving it into
access/heap/ makes sense.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-01-21 17:07:10 -08:00
Andres Freund b7eda3e0e3 Move generic snapshot related code from tqual.h to snapmgr.h.
The code in tqual.c is largely heap specific. Due to the upcoming
pluggable storage work, it therefore makes sense to move it into
access/heap/ (as the file's header notes, the tqual name isn't very
good).

But the various statically allocated snapshot and snapshot
initialization functions are now (see previous commit) generic and do
not depend on functions declared in tqual.h anymore. Therefore move.
Also move XidInMVCCSnapshot as that's useful for future AMs, and
already used outside of tqual.c.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-01-21 17:06:41 -08:00
Alvaro Herrera 0464fdf07f Create action triggers when partitions are detached
Detaching a partition from a partitioned table that's constrained by
foreign keys requires additional action triggers on the referenced side;
otherwise, DELETE/UPDATE actions there fail to notice rows in the table
that was partition, and so are incorrectly allowed through.  With this
commit, those triggers are now created.  Conversely, when a table that
has a foreign key is attached as a partition to a table that also has
the same foreign key, those action triggers are no longer needed, so we
remove them.

Add a minimal test case verifying (part of) this.

Authors: Amit Langote, Álvaro Herrera
Discussion: https://postgr.es/m/f2b8ead5-4131-d5a8-8016-2ea0a31250af@lab.ntt.co.jp
2019-01-21 20:08:52 -03:00
Andres Freund e7cc78ad43 Remove superfluous tqual.h includes.
Most of these had been obsoleted by 568d4138c / the SnapshotNow
removal.

This is is preparation for moving most of tqual.[ch] into either
snapmgr.h or heapam.h, which in turn is in preparation for pluggable
table AMs.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-01-21 12:15:02 -08:00
Andres Freund e0c4ec0728 Replace uses of heap_open et al with the corresponding table_* function.
Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
2019-01-21 10:51:37 -08:00
Andres Freund 111944c5ee Replace heapam.h includes with {table, relation}.h where applicable.
A lot of files only included heapam.h for relation_open, heap_open etc
- replace the heapam.h include in those files with the narrower
header.

Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
2019-01-21 10:51:37 -08:00
Andres Freund 8cc157b234 Fix ALTER TRIGGER ... RENAME, broken in WITH OIDS removal.
I (Andres) broke this in 578b229718.

Author: Rushabh Lathia
Discussion: https://postgr.es/m/CAGPqQf04PywZX3sVQaF6H=oLiW9GJncRW+=e78vTy4MokEWcZw@mail.gmail.com
2019-01-21 09:13:43 -08:00
Tomas Vondra 31f3817402 Allow COPY FROM to filter data using WHERE conditions
Extends the COPY FROM command with a WHERE condition, which allows doing
various types of filtering while importing the data (random sampling,
condition on a data column, etc.).  Until now such filtering required
either preprocessing of the input data, or importing all data and then
filtering in the database. COPY FROM ... WHERE is an easy-to-use and
low-overhead alternative for most simple cases.

Author: Surafel Temesgen
Reviewed-by: Tomas Vondra, Masahiko Sawada, Lim Myungkyu
Discussion: https://www.postgresql.org/message-id/flat/CALAY4q_DdpWDuB5-Zyi-oTtO2uSk8pmy+dupiRe3AvAc++1imA@mail.gmail.com
2019-01-20 00:22:14 +01:00
Alvaro Herrera 0325d7a595 Fix creation of duplicate foreign keys on partitions
When creating a foreign key in a partitioned table, if some partitions
already have equivalent constraints, we wastefully create duplicates of
the constraints instead of attaching to the existing ones.  That's
inconsistent with the de-duplication that is applied when a table is
attached as a partition.  To fix, reuse the FK-cloning code instead of
having a separate code path.

Backpatch to Postgres 11.  This is a subtle behavior change, but surely
a welcome one since there's no use in having duplicate foreign keys.

Discovered by Álvaro Herrera while thinking about a different problem
reported by Jesper Pedersen (bug #15587).

Author: Álvaro Herrera
Discussion: https://postgr.es/m/201901151935.zfadrzvyof4k@alvherre.pgsql
2019-01-18 15:00:45 -03:00
Alvaro Herrera 03afae201f Move CloneForeignKeyConstraints to tablecmds.c
My commit 3de241dba8 introduced some code to create a clone of a
foreign key to a partition, but I put it in pg_constraint.c because it
was too close to the contents of the pg_constraint row.  With the
previous commit that split out the constraint tuple deconstruction into
its own routine, it makes more sense to have the FK-cloning function in
tablecmds.c, mostly because its static subroutine can then be used by a
future bugfix.

My initial posting of this patch had this routine as static in
tablecmds.c, but sadly this function is already part of the Postgres 11
ABI as exported from pg_constraint.c, so keep it as exported also just
to avoid breaking any possible users of it.
2019-01-18 15:00:06 -03:00
Michael Paquier c5660e0aa5 Restrict the use of temporary namespace in two-phase transactions
Attempting to use a temporary table within a two-phase transaction is
forbidden for ages.  However, there have been uncovered grounds for
a couple of other object types and commands which work on temporary
objects with two-phase commit.  In short, trying to create, lock or drop
an object on a temporary schema should not be authorized within a
two-phase transaction, as it would cause its state to create
dependencies with other sessions, causing all sorts of side effects with
the existing session or other sessions spawned later on trying to use
the same temporary schema name.

Regression tests are added to cover all the grounds found, the original
report mentioned function creation, but monitoring closer there are many
other patterns with LOCK, DROP or CREATE EXTENSION which are involved.
One of the symptoms resulting in combining both is that the session
which used the temporary schema is not able to shut down completely,
waiting for being able to drop the temporary schema, something that it
cannot complete because of the two-phase transaction involved with
temporary objects.  In this case the client is able to disconnect but
the session remains alive on the backend-side, potentially blocking
connection backend slots from being used.  Other problems reported could
also involve server crashes.

This is back-patched down to v10, which is where 9b013dc has introduced
MyXactFlags, something that this patch relies on.

Reported-by: Alexey Bashtanov
Author: Michael Paquier
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/5d910e2e-0db8-ec06-dd5f-baec420513c3@imap.cc
Backpatch-through: 10
2019-01-18 09:21:44 +09:00
Andres Freund 285d8e1205 Move vacuumlazy.c into access/heap.
It's heap table storage specific code that can't realistically be
generalized into table AM agnostic code.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-01-15 12:06:19 -08:00
Andres Freund 0944ec54de Don't include genam.h from execnodes.h and relscan.h anymore.
This is the genam.h equivalent of 4c850ecec6 (which removed
heapam.h from a lot of other headers).  There's still a few header
includes of genam.h, but not from central headers anymore.

As a few headers are not indirectly included anymore, execnodes.h and
relscan.h need a few additional includes. Some of the depended on
types were replacable by using the underlying structs, but e.g. for
Snapshot in execnodes.h that'd have gotten more invasive than
reasonable in this commit.

Like the aforementioned commit 4c850ecec6, this requires adding new
genam.h includes to a number of backend files, which likely is also
required in a few external projects.

Author: Andres Freund
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
2019-01-14 17:02:12 -08:00
Andres Freund 4c850ecec6 Don't include heapam.h from others headers.
heapam.h previously was included in a number of widely used
headers (e.g. execnodes.h, indirectly in executor.h, ...). That's
problematic on its own, as heapam.h contains a lot of low-level
details that don't need to be exposed that widely, but becomes more
problematic with the upcoming introduction of pluggable table storage
- it seems inappropriate for heapam.h to be included that widely
afterwards.

heapam.h was largely only included in other headers to get the
HeapScanDesc typedef (which was defined in heapam.h, even though
HeapScanDescData is defined in relscan.h). The better solution here
seems to be to just use the underlying struct (forward declared where
necessary). Similar for BulkInsertState.

Another problem was that LockTupleMode was used in executor.h - parts
of the file tried to cope without heapam.h, but due to the fact that
it indirectly included it, several subsequent violations of that goal
were not not noticed. We could just reuse the approach of declaring
parameters as int, but it seems nicer to move LockTupleMode to
lockoptions.h - that's not a perfect location, but also doesn't seem
bad.

As a number of files relied on implicitly included heapam.h, a
significant number of files grew an explicit include. It's quite
probably that a few external projects will need to do the same.

Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
2019-01-14 16:24:41 -08:00
Alvaro Herrera 0ad41cf537 Fix unique INCLUDE indexes on partitioned tables
We were considering the INCLUDE columns as part of the key, allowing
unicity-violating rows to be inserted in different partitions.

Concurrent development conflict in eb7ed3f306 and 8224de4f42.

Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20190109065109.GA4285@telsasoft.com
2019-01-14 19:28:10 -03:00
Andrew Dunstan e33884d412 Free pre-modification HeapTuple in ALTER TABLE ... TYPE ...
This was an oversight in commit 3b174b1a3.

Per offline gripe from Alvaro Herrera

Backpatch to release 11.
2019-01-11 17:12:54 -05:00
Andrew Dunstan 3b174b1a35 Fix missing values when doing ALTER TABLE ALTER COLUMN TYPE
This was an oversight in commit 16828d5c. If the table is going to be
rewritten, we simply clear all the missing values from all the table's
attributes, since there will no longer be any rows with the attributes
missing. Otherwise, we repackage the missing value in an array
constructed with the new type specifications.

Backpatch to release 11.

This fixes bug #15446, reported by Dmitry Molotkov

Reviewed by Dean Rasheed
2019-01-10 15:53:45 -05:00
Alvaro Herrera df5be63763 Rename macro to RELKIND_HAS_STORAGE
The original name was an unfortunate choice.

Discussion: https://postgr.es/m/20181218.145600.172055615.horiguchi.kyotaro@lab.ntt.co.jp
2019-01-04 14:34:18 -03:00
Tom Lane 7170268efd Improve ANALYZE's handling of concurrent-update scenarios.
This patch changes the rule for whether or not a tuple seen by ANALYZE
should be included in its sample.

When we last touched this logic, in commit 51e1445f1, we weren't
thinking very hard about tuples being UPDATEd by a long-running
concurrent transaction.  In such a case, we might see the pre-image as
either LIVE or DELETE_IN_PROGRESS depending on timing; and we might see
the post-image not at all, or as INSERT_IN_PROGRESS.  Since the existing
code will not sample either DELETE_IN_PROGRESS or INSERT_IN_PROGRESS
tuples, this leads to concurrently-updated rows being omitted from the
sample entirely.  That's not very helpful, and it's especially the wrong
thing if the concurrent transaction ends up rolling back.

The right thing seems to be to sample DELETE_IN_PROGRESS rows just as if
they were live.  This makes the "sample it" and "count it" decisions the
same, which seems good for consistency.  It's clearly the right thing
if the concurrent transaction ends up rolling back; in effect, we are
sampling as though IN_PROGRESS transactions haven't happened yet.
Also, this combination of choices ensures maximum robustness against
the different combinations of whether and in which state we might see the
pre- and post-images of an update.

It's slightly annoying that we end up recording immediately-out-of-date
stats in the case where the transaction does commit, but on the other
hand the stats are fine for columns that didn't change in the update.
And the alternative of sampling INSERT_IN_PROGRESS rows instead seems
like a bad idea, because then the sampling would be inconsistent with
the way rows are counted for the stats report.

Per report from Mark Chambers; thanks to Jeff Janes for diagnosing
what was happening.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/CAFh58O_Myr6G3tcH3gcGrF-=OExB08PJdWZcSBcEcovaiPsrHA@mail.gmail.com
2019-01-03 17:00:08 -05:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
Alvaro Herrera e439c6f0c3 Remove some useless code
In commit 8b08f7d482 I added member relationId to IndexStmt struct.
I'm now not sure why; DefineIndex doesn't need it, since the relation
OID is passed as a separate argument anyway.  Remove it.

Also remove a redundant assignment to the relationId argument (it wasn't
redundant when added by commit e093dcdd28, but should have been removed
in commit 5f173040e3), and use relationId instead of stmt->relation when
locking the relation in the second phase of CREATE INDEX CONCURRENTLY,
which is not only confusing but it means we resolve the name twice for
no reason.
2018-12-31 14:50:48 -03:00
Peter Eisentraut ae4472c619 Remove obsolete IndexIs* macros
Remove IndexIsValid(), IndexIsReady(), IndexIsLive() in favor of
accessing the index structure directly.  These macros haven't been
used consistently, and the original reason of maintaining source
compatibility with PostgreSQL 9.2 is gone.

Discussion: https://www.postgresql.org/message-id/flat/d419147c-09d4-6196-5d9d-0234b230880a%402ndquadrant.com
2018-12-27 10:07:46 +01:00
Michael Paquier 1e504f01da Ignore inherited temp relations from other sessions when truncating
Inheritance trees can include temporary tables if the parent is
permanent, which makes possible the presence of multiple temporary
children from different sessions.  Trying to issue a TRUNCATE on the
parent in this scenario causes a failure, so similarly to any other
queries just ignore such cases, which makes TRUNCATE work
transparently.

This makes truncation behave similarly to any other DML query working on
the parent table with queries which need to be work on the children.  A
set of isolation tests is added to cover basic cases.

Reported-by: Zhou Digoal
Author: Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/15565-ce67a48d0244436a@postgresql.org
Backpatch-through: 9.4
2018-12-27 10:16:19 +09:00
Michael Paquier bf491a9073 Disable WAL-skipping optimization for COPY on views and foreign tables
COPY can skip writing WAL when loading data on a table which has been
created in the same transaction as the one loading the data, however
this cannot work on views or foreign table as this would result in
trying to flush relation files which do not exist.  So disable the
optimization so as commands are able to work the same way with any
configuration of wal_level.

Tests are added to cover the different cases, which need to have
wal_level set to minimal to allow the problem to show up, and that is
not the default configuration.

Reported-by: Luis M. Carril, Etsuro Fujita
Author: Amit Langote, Michael Paquier
Reviewed-by: Etsuro Fujita
Discussion: https://postgr.es/m/15552-c64aa14c5c22f63c@postgresql.org
Backpatch-through: 10, where support for COPY on views has been added,
while v11 has added support for COPY on foreign tables.
2018-12-23 16:42:22 +09:00
Alvaro Herrera 7b14bcc06c Fix lock level used for partition when detaching it
For probably bogus reasons, we acquire only AccessShareLock on the
partition when we try to detach it from its parent partitioned table.
This can cause ugly things to happen if another transaction is doing
any sort of DDL to the partition concurrently.

Upgrade that lock to ShareUpdateExclusiveLock, which per discussion
seems to be the minimum needed.

Reported by Robert Haas.

Discussion: https://postgr.es/m/CA+TgmoYruJQ+2qnFLtF1xQtr71pdwgfxy3Ziy-TxV28M6pEmyA@mail.gmail.com
2018-12-20 16:42:13 -03:00
Alvaro Herrera 0c2377152f DETACH PARTITION: hold locks on indexes until end of transaction
When a partition is detached from its parent, we acquire locks on all
attached indexes to also detach them ... but we release those locks
immediately.  This is a violation of the policy of keeping locks on user
objects to the end of the transaction.  Bug introduced in 8b08f7d482.

It's unclear that there are any ill effects possible, but it's clearly
wrong nonetheless.  It's likely that bad behavior *is* possible, but
mostly because the relation that the index is for is only locked with
AccessShareLock, which is an older bug that shall be fixed separately.

While touching that line of code, close the index opened with
index_open() using index_close() instead of relation_close().
No difference in practice, but let's be consistent.

Unearthed by Robert Haas.

Discussion: https://postgr.es/m/CA+TgmoYruJQ+2qnFLtF1xQtr71pdwgfxy3Ziy-TxV28M6pEmyA@mail.gmail.com
2018-12-20 10:58:22 -03:00
Greg Stark 1075dfdaf3 Fix ADD IF NOT EXISTS used in conjunction with ALTER TABLE ONLY
The flag for IF NOT EXISTS was only being passed down in the normal
recursing case. It's been this way since originally added in 9.6 in
commit 2cd40adb85 so backpatch back to 9.6.
2018-12-19 19:38:31 -05:00
Tom Lane 586b98fdf1 Make type "name" collation-aware.
The "name" comparison operators now all support collations, making them
functionally equivalent to "text" comparisons, except for the different
physical representation of the datatype.  They do, in fact, mostly share
the varstr_cmp and varstr_sortsupport infrastructure, which has been
slightly enlarged to handle the case.

To avoid changes in the default behavior of the datatype, set name's
typcollation to C_COLLATION_OID not DEFAULT_COLLATION_OID, so that
by default comparisons to a name value will continue to use strcmp
semantics.  (This would have been the case for system catalog columns
anyway, because of commit 6b0faf723, but doing this makes it true for
user-created name columns as well.  In particular, this avoids
locale-dependent changes in our regression test results.)

In consequence, tweak a couple of places that made assumptions about
collatable base types always having typcollation DEFAULT_COLLATION_OID.
I have not, however, attempted to relax the restriction that user-
defined collatable types must have that.  Hence, "name" doesn't
behave quite like a user-defined type; it acts more like a domain
with COLLATE "C".  (Conceivably, if we ever get rid of the need for
catalog name columns to be fixed-length, "name" could actually become
such a domain over text.  But that'd be a pretty massive undertaking,
and I'm not volunteering.)

Discussion: https://postgr.es/m/15938.1544377821@sss.pgh.pa.us
2018-12-19 17:46:25 -05:00
Alvaro Herrera 68f6f2b739 Remove function names from error messages
They are not necessary, and having them there gives useless work for
translators.
2018-12-19 14:53:27 -03:00
Alvaro Herrera ca4103025d Fix tablespace handling for partitioned tables
When partitioned tables were introduced, we failed to realize that by
copying the tablespace handling for other relation kinds with no
physical storage we were causing the secondary effect that their
partitions would not automatically inherit the tablespace setting.  This
is surprising and unhelpful, so change it to adopt the behavior
introduced in pg11 (commit 33e6c34c32) for partitioned indexes: the
parent relation remembers the tablespace specification, which is then
used for any new partitions that don't declare one.

Because this commit changes behavior of the TABLESPACE clause for
partitioned tables (it's no longer a no-op), it is not backpatched.

Author: David Rowley, Álvaro Herrera
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAKJS1f9SxVzqDrGD1teosFd6jBMM0UEaa14_8mRvcWE19Tu0hA@mail.gmail.com
2018-12-17 15:37:40 -03:00
Amit Kapila 3abb11e55b Remove extra semicolons.
Reported-by: David Rowley
Author: David Rowley
Reviewed-by: Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/CAKJS1f8EneeYyzzvdjahVZ6gbAHFkHbSFB5m_C0Y6TUJs9Dgdg@mail.gmail.com
2018-12-17 14:32:25 +05:30
Michael Paquier 67915fb8e5 Fix use-after-free bug when renaming constraints
This is an oversight from recent commit b13fd344.  While on it, tweak
the previous test with a better name for the renamed primary key.

Detected by buildfarm member prion which forces relation cache release
with -DRELCACHE_FORCE_RELEASE.  Back-patch down to 9.4 as the previous
commit.
2018-12-17 12:43:00 +09:00
Michael Paquier b13fd344c5 Make constraint rename issue relcache invalidation on target relation
When a constraint gets renamed, it may have associated with it a target
relation (for example domain constraints don't have one).  Not
invalidating the target relation cache when issuing the renaming can
result in issues with subsequent commands that refer to the old
constraint name using the relation cache, causing various failures.  One
pattern spotted was using CREATE TABLE LIKE after a constraint
renaming.

Reported-by: Stuart <sfbarbee@gmail.com>
Author: Amit Langote
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/2047094.V130LYfLq4@station53.ousa.org
2018-12-17 10:34:44 +09:00
Tom Lane ade2d61ed0 Improve detection of child-process SIGPIPE failures.
Commit ffa4cbd62 added logic to detect SIGPIPE failure of a COPY child
process, but it only worked correctly if the SIGPIPE occurred in the
immediate child process.  Depending on the shell in use and the
complexity of the shell command string, we might instead get back
an exit code of 128 + SIGPIPE, representing a shell error exit
reporting SIGPIPE in the child process.

We could just hack up ClosePipeToProgram() to add the extra case,
but it seems like this is a fairly general issue deserving a more
general and better-documented solution.  I chose to add a couple
of functions in src/common/wait_error.c, which is a natural place
to know about wait-result encodings, that will test for either a
specific child-process signal type or any child-process signal failure.
Then, adjust other places that were doing ad-hoc tests of this type
to use the common functions.

In RestoreArchivedFile, this fixes a race condition affecting whether
the process will report an error or just silently proc_exit(1): before,
that depended on whether the intermediate shell got SIGTERM'd itself
or reported a child process failing on SIGTERM.

Like the previous patch, back-patch to v10; we could go further
but there seems no real need to.

Per report from Erik Rijkers.

Discussion: https://postgr.es/m/f3683f87ab1701bea5d86a7742b22432@xs4all.nl
2018-12-16 14:32:14 -05:00
Tom Lane 5e09280057 Make pg_statistic and related code account more honestly for collations.
When we first put in collations support, we basically punted on teaching
pg_statistic, ANALYZE, and the planner selectivity functions about that.
They've just used DEFAULT_COLLATION_OID independently of the actual
collation of the data.  It's time to improve that, so:

* Add columns to pg_statistic that record the specific collation associated
with each statistics slot.

* Teach ANALYZE to use the column's actual collation when comparing values
for statistical purposes, and record this in the appropriate slot.  (Note
that type-specific typanalyze functions are now expected to fill
stats->stacoll with the appropriate collation, too.)

* Teach assorted selectivity functions to use the actual collation of
the stats they are looking at, instead of just assuming it's
DEFAULT_COLLATION_OID.

This should give noticeably better results in selectivity estimates for
columns with nondefault collations, at least for query clauses that use
that same collation (which would be the default behavior in most cases).
It's still true that comparisons with explicit COLLATE clauses different
from the stored data's collation won't be well-estimated, but that's no
worse than before.  Also, this patch does make the first step towards
doing better with that, which is that it's now theoretically possible to
collect stats for a collation other than the column's own collation.

Patch by me; thanks to Peter Eisentraut for review.

Discussion: https://postgr.es/m/14706.1544630227@sss.pgh.pa.us
2018-12-14 12:52:49 -05:00
Tom Lane 04fe805a17 Drop no-op CoerceToDomain nodes from expressions at planning time.
If a domain has no constraints, then CoerceToDomain doesn't really do
anything and can be simplified to a RelabelType.  This not only
eliminates cycles at execution, but allows the planner to optimize better
(for instance, match the coerced expression to an index on the underlying
column).  However, we do have to support invalidating the plan later if
a constraint gets added to the domain.  That's comparable to the case of
a change to a SQL function that had been inlined into a plan, so all the
necessary logic already exists for plans depending on functions.  We
need only duplicate or share that logic for domains.

ALTER DOMAIN ADD/DROP CONSTRAINT need to be taught to send out sinval
messages for the domain's pg_type entry, since those operations don't
update that row.  (ALTER DOMAIN SET/DROP NOT NULL do update that row,
so no code change is needed for them.)

Testing this revealed what's really a pre-existing bug in plpgsql:
it caches the SQL-expression-tree expansion of type coercions and
had no provision for invalidating entries in that cache.  Up to now
that was only a problem if such an expression had inlined a SQL
function that got changed, which is unlikely though not impossible.
But failing to track changes of domain constraints breaks an existing
regression test case and would likely cause practical problems too.

We could fix that locally in plpgsql, but what seems like a better
idea is to build some generic infrastructure in plancache.c to store
standalone expressions and track invalidation events for them.
(It's tempting to wonder whether plpgsql's "simple expression" stuff
could use this code with lower overhead than its current use of the
heavyweight plancache APIs.  But I've left that idea for later.)

Other stuff fixed in passing:

* Allow estimate_expression_value() to drop CoerceToDomain
unconditionally, effectively assuming that the coercion will succeed.
This will improve planner selectivity estimates for cases involving
estimatable expressions that are coerced to domains.  We could have
done this independently of everything else here, but there wasn't
previously any need for eval_const_expressions_mutator to know about
CoerceToDomain at all.

* Use a dlist for plancache.c's list of cached plans, rather than a
manually threaded singly-linked list.  That eliminates a potential
performance problem in DropCachedPlan.

* Fix a couple of inconsistencies in typecmds.c about whether
operations on domains drop RowExclusiveLock on pg_type.  Our common
practice is that DDL operations do drop catalog locks, so standardize
on that choice.

Discussion: https://postgr.es/m/19958.1544122124@sss.pgh.pa.us
2018-12-13 13:24:43 -05:00
Tom Lane 5deadfef28 Fix misapplication of pgstat_count_truncate to wrong relation.
The stanza of ExecuteTruncate[Guts] that truncates a target table's toast
relation re-used the loop local variable "rel" to reference the toast rel.
This was safe enough when written, but commit d42358efb added code below
that that supposed "rel" still pointed to the parent table.  Therefore,
the stats counter update was applied to the wrong relcache entry (the
toast rel not the user rel); and if we were unlucky and that relcache
entry had been flushed during reindex_relation, very bad things could
ensue.

(I'm surprised that CLOBBER_CACHE_ALWAYS testing hasn't found this.
I'm even more surprised that the problem wasn't detected during the
development of d42358efb; it must not have been tested in any case
with a toast table, as the incorrect stats counts are very obvious.)

To fix, replace use of "rel" in that code branch with a more local
variable.  Adjust test cases added by d42358efb so that some of them
use tables with toast tables.

Per bug #15540 from Pan Bian.  Back-patch to 9.5 where d42358efb came in.

Discussion: https://postgr.es/m/15540-01078812338195c0@postgresql.org
2018-12-07 12:11:59 -05:00
Tom Lane 9286ef8e91 Clean up sloppy coding in publicationcmds.c's OpenTableList().
Remove dead code (which would be incorrect if it weren't dead),
per report from Pan Bian.  Add a CHECK_FOR_INTERRUPTS in the
inner loop over child relations, because there's little point
in having one in the outer loop if there's not one here too.
Minor stylistic adjustments and comment improvements.

Seems to be aboriginal to this code (cf commit 665d1fad9).
Back-patch to v10 where that came in, not because any of this
is significant, but just to keep the branches looking similar.

Discussion: https://postgr.es/m/15539-06d00ef6b1e2e1bb@postgresql.org
2018-12-07 11:02:39 -05:00
Michael Paquier 730422afcd Fix some errhint and errdetail strings missing a period
As per the error message style guide of the documentation, those should
be full sentences.

Author: Daniel Gustafsson
Reviewed-by: Michael Paquier, Álvaro Herrera
Discussion: https://1E8D49B4-16BC-4420-B4ED-58501D9E076B@yesql.se
2018-12-07 07:47:42 +09:00
Alvaro Herrera 71a05b2232 Don't mark partitioned indexes invalid unnecessarily
When an indexes is created on a partitioned table using ONLY (don't
recurse to partitions), it gets marked invalid until index partitions
are attached for each table partition.  But there's no reason to do this
if there are no partitions ... and moreover, there's no way to get the
index to become valid afterwards, because all partitions that get
created/attached get their own index partition already attached to the
parent index, so there's no chance to do ALTER INDEX ... ATTACH PARTITION
that would make the parent index valid.

Fix by not marking the index as invalid to begin with.

This is very similar to 9139aa1942, but the pg_dump aspect does not
appear to be relevant until we add FKs that can point to PKs on
partitioned tables.  (I tried to cause the pg_upgrade test to break by
leaving some of these bogus tables around, but wasn't able to.)

Making this change means that an index that was supposed to be invalid
in the insert_conflict regression test is no longer invalid; reorder the
DDL so that the test continues to verify the behavior we want it to.

Author: Álvaro Herrera
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/20181203225019.2vvdef2ybnkxt364@alvherre.pgsql
2018-12-05 13:31:51 -03:00
Peter Eisentraut 2dedf4d9a8 Integrate recovery.conf into postgresql.conf
recovery.conf settings are now set in postgresql.conf (or other GUC
sources).  Currently, all the affected settings are PGC_POSTMASTER;
this could be refined in the future case by case.

Recovery is now initiated by a file recovery.signal.  Standby mode is
initiated by a file standby.signal.  The standby_mode setting is
gone.  If a recovery.conf file is found, an error is issued.

The trigger_file setting has been renamed to promote_trigger_file as
part of the move.

The documentation chapter "Recovery Configuration" has been integrated
into "Server Configuration".

pg_basebackup -R now appends settings to postgresql.auto.conf and
creates a standby.signal file.

Author: Fujii Masao <masao.fujii@gmail.com>
Author: Simon Riggs <simon@2ndquadrant.com>
Author: Abhijit Menon-Sen <ams@2ndquadrant.com>
Author: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/607741529606767@web3g.yandex.ru/
2018-11-25 16:33:40 +01:00
Alvaro Herrera de38ce1b83 Don't allow partitioned indexes in pg_global tablespace
Missing in dfa6081419.

Author: David Rowley
Discussion: https://postgr.es/m/CAKJS1f-M3NMTCpv=vDfkoqHbMPFf=3-Z1ud=+1DHH00tC+zLaQ@mail.gmail.com
2018-11-23 08:48:20 -03:00
Andres Freund 578b229718 Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.

This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row.  Neither pg_dump nor COPY included the contents of the
oid column by default.

The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.

WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.

Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
  WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
  issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
  restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
  OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
  plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.

The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.

The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such.  This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.

The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.

Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).

The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.

While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.

Catversion bump, for obvious reasons.

Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-20 16:00:17 -08:00
Tom Lane cb09903fe6 Add needed #include.
Per POSIX, WIFSIGNALED and related macros are provided by <sys/wait.h>.
Apparently on Linux they're also pulled in by some other inclusions,
but BSD-ish systems are pickier.  Fixes portability issue in ffa4cbd62.

Per buildfarm.
2018-11-19 17:28:04 -05:00
Tom Lane ffa4cbd623 Handle EPIPE more sanely when we close a pipe reading from a program.
Previously, any program launched by COPY TO/FROM PROGRAM inherited the
server's setting of SIGPIPE handling, i.e. SIG_IGN.  Hence, if we were
doing COPY FROM PROGRAM and closed the pipe early, the child process
would see EPIPE on its output file and typically would treat that as
a fatal error, in turn causing the COPY to report error.  Similarly,
one could get a failure report from a query that didn't read all of
the output from a contrib/file_fdw foreign table that uses file_fdw's
PROGRAM option.

To fix, ensure that child programs inherit SIG_DFL not SIG_IGN processing
of SIGPIPE.  This seems like an all-around better situation since if
the called program wants some non-default treatment of SIGPIPE, it would
expect to have to set that up for itself.  Then in COPY, if it's COPY
FROM PROGRAM and we stop reading short of detecting EOF, treat a SIGPIPE
exit from the called program as a non-error condition.  This still allows
us to report an error for any case where the called program gets SIGPIPE
on some other file descriptor.

As coded, we won't report a SIGPIPE if we stop reading as a result of
seeing an in-band EOF marker (e.g. COPY BINARY EOF marker).  It's
somewhat debatable whether we should complain if the called program
continues to transmit data after an EOF marker.  However, it seems like
we should avoid throwing error in any questionable cases, especially in a
back-patched fix, and anyway it would take additional code to make such
an error get reported consistently.

Back-patch to v10.  We could go further back, since COPY FROM PROGRAM
has been around awhile, but AFAICS the only way to reach this situation
using core or contrib is via file_fdw, which has only supported PROGRAM
sources since v10.  The COPY statement per se has no feature whereby
it'd stop reading without having hit EOF or an error already.  Therefore,
I don't see any upside to back-patching further that'd outweigh the
risk of complaints about behavioral change.

Per bug #15449 from Eric Cyr.

Patch by me, review by Etsuro Fujita and Kyotaro Horiguchi

Discussion: https://postgr.es/m/15449-1cf737dd5929450e@postgresql.org
2018-11-19 17:02:39 -05:00
Alvaro Herrera 5c9a5513a3 Disallow COPY FREEZE on partitioned tables
This didn't actually work: COPY would fail to flush the right files, and
instead would try to flush a non-existing file, causing the whole
transaction to fail.

Cope by raising an error as soon as the command is sent instead, to
avoid a nasty later surprise.  Of course, it would be much better to
make it work, but we don't have a patch for that yet, and we don't know
if we'll want to backpatch one when we do.

Reported-by: Tomas Vondra
Author: David Rowley
Reviewed-by: Amit Langote, Steve Singer, Tomas Vondra
2018-11-19 11:16:28 -03:00
Andres Freund 4da597edf1 Make TupleTableSlots extensible, finish split of existing slot type.
This commit completes the work prepared in 1a0586de36, splitting the
old TupleTableSlot implementation (which could store buffer, heap,
minimal and virtual slots) into four different slot types.  As
described in the aforementioned commit, this is done with the goal of
making tuple table slots extensible, to allow for pluggable table
access methods.

To achieve runtime extensibility for TupleTableSlots, operations on
slots that can differ between types of slots are performed using the
TupleTableSlotOps struct provided at slot creation time.  That
includes information from the size of TupleTableSlot struct to be
allocated, initialization, deforming etc.  See the struct's definition
for more detailed information about callbacks TupleTableSlotOps.

I decided to rename TTSOpsBufferTuple to TTSOpsBufferHeapTuple and
ExecCopySlotTuple to ExecCopySlotHeapTuple, as that seems more
consistent with other naming introduced in recent patches.

There's plenty optimization potential in the slot implementation, but
according to benchmarking the state after this commit has similar
performance characteristics to before this set of changes, which seems
sufficient.

There's a few changes in execReplication.c that currently need to poke
through the slot abstraction, that'll be repaired once the pluggable
storage patchset provides the necessary infrastructure.

Author: Andres Freund and  Ashutosh Bapat, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-16 16:35:15 -08:00
Alvaro Herrera 3f2393edef Redesign initialization of partition routing structures
This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well
as the future MERGE) on partitioned tables.

This changes the setup for tuple routing so that it does far less work
during the initial setup and pushes more work out to when partitions
receive tuples.  PartitionDispatchData structs for sub-partitioned
tables are only created when a tuple gets routed through it.  The
possibly large arrays in the PartitionTupleRouting struct have largely
been removed.  The partitions[] array remains but now never contains any
NULL gaps.  Previously the NULLs had to be skipped during
ExecCleanupTupleRouting(), which could add a large overhead to the
cleanup when the number of partitions was large.  The partitions[] array
is allocated small to start with and only enlarged when we route tuples
to enough partitions that it runs out of space. This allows us to keep
simple single-row partition INSERTs running quickly.  Redesign

The arrays in PartitionTupleRouting which stored the tuple translation maps
have now been removed.  These have been moved out into a
PartitionRoutingInfo struct which is an additional field in ResultRelInfo.

The find_all_inheritors() call still remains by far the slowest part of
ExecSetupPartitionTupleRouting(). This commit just removes the other slow
parts.

In passing also rename the tuple translation maps from being ParentToChild
and ChildToParent to being RootToPartition and PartitionToRoot. The old
names mislead you into thinking that a partition of some sub-partitioned
table would translate to the rowtype of the sub-partitioned table rather
than the root partitioned table.

Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera
Testing help from Jesper Pedersen and Kato Sho.
Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com
2018-11-16 15:01:05 -03:00
Andres Freund 1a0586de36 Introduce notion of different types of slots (without implementing them).
Upcoming work intends to allow pluggable ways to introduce new ways of
storing table data. Accessing those table access methods from the
executor requires TupleTableSlots to be carry tuples in the native
format of such storage methods; otherwise there'll be a significant
conversion overhead.

Different access methods will require different data to store tuples
efficiently (just like virtual, minimal, heap already require fields
in TupleTableSlot). To allow that without requiring additional pointer
indirections, we want to have different structs (embedding
TupleTableSlot) for different types of slots.  Thus different types of
slots are needed, which requires adapting creators of slots.

The slot that most efficiently can represent a type of tuple in an
executor node will often depend on the type of slot a child node
uses. Therefore we need to track the type of slot is returned by
nodes, so parent slots can create slots based on that.

Relatedly, JIT compilation of tuple deforming needs to know which type
of slot a certain expression refers to, so it can create an
appropriate deforming function for the type of tuple in the slot.

But not all nodes will only return one type of slot, e.g. an append
node will potentially return different types of slots for each of its
subplans.

Therefore add function that allows to query the type of a node's
result slot, and whether it'll always be the same type (whether it's
fixed). This can be queried using ExecGetResultSlotOps().

The scan, result, inner, outer type of slots are automatically
inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(),
left/right subtrees respectively. If that's not correct for a node,
that can be overwritten using new fields in PlanState.

This commit does not introduce the actually abstracted implementation
of different kind of TupleTableSlots, that will be left for a followup
commit.  The different types of slots introduced will, for now, still
use the same backing implementation.

While this already partially invalidates the big comment in
tuptable.h, it seems to make more sense to update it later, when the
different TupleTableSlot implementations actually exist.

Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-15 22:00:30 -08:00
Andres Freund 763f2edd92 Rejigger materializing and fetching a HeapTuple from a slot.
Previously materializing a slot always returned a HeapTuple. As
current work aims to reduce the reliance on HeapTuples (so other
storage systems can work efficiently), that needs to change. Thus
split the tasks of materializing a slot (i.e. making it independent
from the underlying storage / other memory contexts) from fetching a
HeapTuple from the slot.  For brevity, allow to fetch a HeapTuple from
a slot and materializing the slot at the same time, controlled by a
parameter.

For now some callers of ExecFetchSlotHeapTuple, with materialize =
true, expect that changes to the heap tuple will be reflected in the
underlying slot.  Those places will be adapted in due course, so while
not pretty, that's OK for now.

Also rename ExecFetchSlotTuple to ExecFetchSlotHeapTupleDatum and
ExecFetchSlotTupleDatum to ExecFetchSlotHeapTupleDatum, as it's likely
that future storage methods will need similar methods. There already
is ExecFetchSlotMinimalTuple, so the new names make the naming scheme
more coherent.

Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-15 14:31:12 -08:00
Peter Eisentraut 1b5d797cd4 Lower lock level for renaming indexes
Change lock level for renaming index (either ALTER INDEX or implicitly
via some other commands) from AccessExclusiveLock to
ShareUpdateExclusiveLock.

One reason we need a strong lock for relation renaming is that the
name change causes a rebuild of the relcache entry.  Concurrent
sessions that have the relation open might not be able to handle the
relcache entry changing underneath them.  Therefore, we need to lock
the relation in a way that no one can have the relation open
concurrently.  But for indexes, the relcache handles reloads specially
in RelationReloadIndexInfo() in a way that keeps changes in the
relcache entry to a minimum.  As long as no one keeps pointers to
rd_amcache and rd_options around across possible relcache flushes,
which is the case, this ought to be safe.

We also want to use a self-exclusive lock for correctness, so that
concurrent DDL doesn't overwrite the rename if they start updating
while still seeing the old version.  Therefore, we use
ShareUpdateExclusiveLock, which is already used by other DDL commands
that want to operate in a concurrent manner.

The reason this is interesting at all is that renaming an index is a
typical part of a concurrent reindexing workflow (CREATE INDEX
CONCURRENTLY new + DROP INDEX CONCURRENTLY old + rename back).  And
indeed a future built-in REINDEX CONCURRENTLY might rely on the ability
to do concurrent renames as well.

Reviewed-by: Andrey Klychkov <aaklychkov@mail.ru>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/1531767486.432607658@f357.i.mail.ru
2018-11-14 17:09:54 +01:00
Michael Paquier 52b70b1c7d Remove CommandCounterIncrement() after processing ON COMMIT DELETE
This comes from f9b5b41, which is part of one the original commits that
implemented ON COMMIT actions.  By looking at the truncation code, any
CCI needed happens locally when rebuilding indexes, so it looks safe to
just remove this final incrementation.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20181109024731.GF2652@paquier.xyz
2018-11-13 08:59:41 +09:00
Tom Lane 3de491482b Simplify null-element handling in extension_config_remove().
There's no point in asking deconstruct_array() for a null-flags
array when we already checked the array has no nulls, and aren't
going to examine the output anyhow.  Not asking for this output
should make the code marginally faster, and it's also more
robust since if there somehow were nulls, deconstruct_array()
would throw an error.

Daniel Gustafsson

Discussion: https://postgr.es/m/289FFB8B-7AAB-48B5-A497-6E0D41D7BA47@yesql.se
2018-11-12 11:50:28 -05:00
Peter Eisentraut 69ee2ff930 Apply RI trigger skipping tests also for DELETE
The tests added in cfa0f4255b to skip
firing an RI trigger if any old key value is NULL can also be applied
for DELETE.  This should give a performance gain in those cases, and it
also saves a lot of duplicate code in the actual RI triggers.  (That
code was already dead code for the UPDATE cases.)

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
2018-11-10 16:14:51 +01:00
Tom Lane fa2952d8eb Fix missing role dependencies for some schema and type ACLs.
This patch fixes several related cases in which pg_shdepend entries were
never made, or were lost, for references to roles appearing in the ACLs of
schemas and/or types.  While that did no immediate harm, if a referenced
role were later dropped, the drop would be allowed and would leave a
dangling reference in the object's ACL.  That still wasn't a big problem
for normal database usage, but it would cause obscure failures in
subsequent dump/reload or pg_upgrade attempts, taking the form of
attempts to grant privileges to all-numeric role names.  (I think I've
seen field reports matching that symptom, but can't find any right now.)

Several cases are fixed here:

1. ALTER DOMAIN SET/DROP DEFAULT would lose the dependencies for any
existing ACL entries for the domain.  This case is ancient, dating
back as far as we've had pg_shdepend tracking at all.

2. If a default type privilege applies, CREATE TYPE recorded the
ACL properly but forgot to install dependency entries for it.
This dates to the addition of default privileges for types in 9.2.

3. If a default schema privilege applies, CREATE SCHEMA recorded the
ACL properly but forgot to install dependency entries for it.
This dates to the addition of default privileges for schemas in v10
(commit ab89e465c).

Another somewhat-related problem is that when creating a relation
rowtype or implicit array type, TypeCreate would apply any available
default type privileges to that type, which we don't really want
since such an object isn't supposed to have privileges of its own.
(You can't, for example, drop such privileges once they've been added
to an array type.)

ab89e465c is also to blame for a race condition in the regression tests:
privileges.sql transiently installed globally-applicable default
privileges on schemas, which sometimes got absorbed into the ACLs of
schemas created by concurrent test scripts.  This should have resulted
in failures when privileges.sql tried to drop the role holding such
privileges; but thanks to the bug fixed here, it instead led to dangling
ACLs in the final state of the regression database.  We'd managed not to
notice that, but it became obvious in the wake of commit da906766c, which
allowed the race condition to occur in pg_upgrade tests.

To fix, add a function recordDependencyOnNewAcl to encapsulate what
callers of get_user_default_acl need to do; while the original call
sites got that right via ad-hoc code, none of the later-added ones
have.  Also change GenerateTypeDependencies to generate these
dependencies, which requires adding the typacl to its parameter list.
(That might be annoying if there are any extensions calling that
function directly; but if there are, they're most likely buggy in the
same way as the core callers were, so they need work anyway.)  While
I was at it, I changed GenerateTypeDependencies to accept most of its
parameters in the form of a Form_pg_type pointer, making its parameter
list a bit less unwieldy and mistake-prone.

The test race condition is fixed just by wrapping the addition and
removal of default privileges into a single transaction, so that that
state is never visible externally.  We might eventually prefer to
separate out tests of default privileges into a script that runs by
itself, but that would be a bigger change and would make the tests
run slower overall.

Back-patch relevant parts to all supported branches.

Discussion: https://postgr.es/m/15719.1541725287@sss.pgh.pa.us
2018-11-09 20:42:14 -05:00
Michael Paquier 319a810180 Fix dependency handling of partitions and inheritance for ON COMMIT
This commit fixes a set of issues with ON COMMIT actions when used on
partitioned tables and tables with inheritance children:
- Applying ON COMMIT DROP on a partitioned table with partitions or on a
table with inheritance children caused a failure at commit time, with
complains about the children being already dropped as all relations are
dropped one at the same time.
- Applying ON COMMIT DELETE on a partition relying on a partitioned
table which uses ON COMMIT DROP would cause the partition truncation to
fail as the parent is removed first.

The solution to the first problem is to handle the removal of all the
dependencies in one go instead of dropping relations one-by-one, based
on a suggestion from Álvaro Herrera.  So instead all the relation OIDs
to remove are gathered and then processed in one round of multiple
deletions.

The solution to the second problem is to reorder the actions, with
truncation happening first and relation drop done after.  Even if it
means that a partition could be first truncated, then immediately
dropped if its partitioned table is dropped, this has the merit to keep
the code simple as there is no need to do existence checks on the
relations to drop.

Contrary to a manual TRUNCATE on a partitioned table, ON COMMIT DELETE
does not cascade to its partitions.  The ON COMMIT action defined on
each partition gets the priority.

Author: Michael Paquier
Reviewed-by: Amit Langote, Álvaro Herrera, Robert Haas
Discussion: https://postgr.es/m/68f17907-ec98-1192-f99f-8011400517f5@lab.ntt.co.jp
Backpatch-through: 10
2018-11-09 10:03:22 +09:00
Alvaro Herrera 705d433fd5 Revise attribute handling code on partition creation
The original code to propagate NOT NULL and default expressions
specified when creating a partition was mostly copy-pasted from
typed-tables creation, but not being a great match it contained some
duplicity, inefficiency and bugs.

This commit fixes the bug that NOT NULL constraints declared in the
parent table would not be honored in the partition.  One reported issue
that is not fixed is that a DEFAULT declared in the child is not used
when inserting through the parent.  That would amount to a behavioral
change that's better not back-patched.

This rewrite makes the code simpler:

1. instead of checking for duplicate column names in its own block,
reuse the original one that already did that;

2. instead of concatenating the list of columns from parent and the one
declared in the partition and scanning the result to (incorrectly)
propagate defaults and not-null constraints, just scan the latter
searching the former for a match, and merging sensibly.  This works
because we know the list in the parent is already correct and there can
only be one parent.

This rewrite makes ColumnDef->is_from_parent unused, so it's removed
on branch master; on released branches, it's kept as an unused field in
order not to cause ABI incompatibilities.

This commit also adds a test case for creating partitions with
collations mismatching that on the parent table, something that is
closely related to the code being patched.  No code change is introduced
though, since that'd be a behavior change that could break some (broken)
working applications.

Amit Langote wrote a less invasive fix for the original
NOT NULL/defaults bug, but while I kept the tests he added, I ended up
not using his original code.  Ashutosh Bapat reviewed Amit's fix.  Amit
reviewed mine.

Author: Álvaro Herrera, Amit Langote
Reviewed-by: Ashutosh Bapat, Amit Langote
Reported-by: Jürgen Strobel (bug #15212)
Discussion: https://postgr.es/m/152746742177.1291.9847032632907407358@wrigleys.postgresql.org
2018-11-08 16:22:09 -03:00
Michael Paquier dc3e436b19 Block creation of partitions with open references to its parent
When a partition is created as part of a trigger processing, it is
possible that the partition which just gets created changes the
properties of the table the executor of the ongoing command relies on,
causing a subsequent crash.  This has been found possible when for
example using a BEFORE INSERT which creates a new partition for a
partitioned table being inserted to.

Any attempt to do so is blocked when working on a partition, with
regression tests added for both CREATE TABLE PARTITION OF and ALTER
TABLE ATTACH PARTITION.

Reported-by: Dmitry Shalashov
Author: Amit Langote
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/15437-3fe01ee66bd1bae1@postgresql.org
Backpatch-through: 10
2018-11-05 11:04:02 +09:00
Tom Lane 9b6fb9fbb4 Fix ExecuteCallStmt to not scribble on the passed-in parse tree.
Modifying the parse tree at execution time is, or at least ought to be,
verboten.  It seems quite difficult to actually cause a crash this way
in v11 (although you can exhibit it pretty easily in HEAD by messing
with plan_cache_mode).  Nonetheless, it's risky, so fix and back-patch.

Discussion: https://postgr.es/m/13789.1541359611@sss.pgh.pa.us
2018-11-04 14:50:55 -05:00
Alvaro Herrera dfa6081419 Fix tablespace handling for partitioned indexes
When creating partitioned indexes, the tablespace was not being saved
for the parent index. This meant that subsequently created partitions
would not use the right tablespace for their indexes.

ALTER INDEX SET TABLESPACE and ALTER INDEX ALL IN TABLESPACE raised
errors when tried; fix them too.  This requires bespoke code for
ATExecCmd() that applies to the special case when the tablespace move is
just a catalog change.

Discussion: https://postgr.es/m/20181102003138.uxpaca6qfxzskepi@alvherre.pgsql
2018-11-03 13:25:19 -03:00
Magnus Hagander fbec7459aa Fix spelling errors and typos in comments
Author: Daniel Gustafsson <daniel@yesql.se>
2018-11-02 13:56:52 +01:00
Peter Eisentraut 96b00c433c Remove obsolete pg_constraint.consrc column
This has been deprecated and effectively unused for a long time.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
2018-11-01 20:36:05 +01:00
Peter Eisentraut 5d7c703a44 Remove get_attidentity()
All existing uses can get this information more easily from the
relation descriptor, so the detour through the syscache is not
necessary.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-10-23 14:47:14 +02:00
Michael Paquier 17f206fbc8 Set pg_class.relhassubclass for partitioned indexes
Like for relations, switching this parameter is optimistic by turning it
on each time a partitioned index gains a partition.  So seeing this
parameter set to true means that the partitioned index has or has had
partitions.  The flag cannot be reset yet for partitioned indexes, which
is something not obvious anyway as partitioned relations exist to have
partitions.

This allows to track more conveniently partition trees for indexes,
which will come in use with an upcoming patch helping in listing
partition trees with an SQL-callable function.

Author: Amit Langote
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/80306490-b5fc-ea34-4427-f29c52156052@lab.ntt.co.jp
2018-10-22 11:04:48 +09:00
Peter Eisentraut a7a1b44516 Fix crash in multi-insert COPY
A bug introduced in 0d5f05cde0
considered the *previous* partition's triggers when deciding whether
multi-insert can be used.  Rearrange the code so that the current
partition is considered.

Author: Ashutosh Sharma <ashu.coek88@gmail.com>
2018-10-17 20:31:20 +02:00
Andres Freund 62649bad83 Correct constness of a few variables.
This allows the compiler / linker to mark affected pages as read-only.

There's other cases, but they're a bit more invasive, and should go
through some review. These are easy.

They were found with
objdump -j .data -t src/backend/postgres|awk '{print $4, $5, $6}'|sort -r|less

Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya@alap3.anarazel.de
2018-10-15 21:01:14 -07:00
Alvaro Herrera c7d43c4d8a Correct attach/detach logic for FKs in partitions
There was no code to handle foreign key constraints on partitioned
tables in the case of ALTER TABLE DETACH; and if you happened to ATTACH
a partition that already had an equivalent constraint, that one was
ignored and a new constraint was created.  Adding this to the fact that
foreign key cloning reuses the constraint name on the partition instead
of generating a new name (as it probably should, to cater to SQL
standard rules about constraint naming within schemas), the result was a
pretty poor user experience -- the most visible failure was that just
detaching a partition and re-attaching it failed with an error such as

  ERROR:  duplicate key value violates unique constraint "pg_constraint_conrelid_contypid_conname_index"
  DETAIL:  Key (conrelid, contypid, conname)=(26702, 0, test_result_asset_id_fkey) already exists.

because it would try to create an identically-named constraint in the
partition.  To make matters worse, if you tried to drop the constraint
in the now-independent partition, that would fail because the constraint
was still seen as dependent on the constraint in its former parent
partitioned table:
  ERROR:  cannot drop inherited constraint "test_result_asset_id_fkey" of relation "test_result_cbsystem_0001_0050_monthly_2018_09"

This fix attacks the problem from two angles: first, when the partition
is detached, the constraint is also marked as independent, so the drop
now works.  Second, when the partition is re-attached, we scan existing
constraints searching for one matching the FK in the parent, and if one
exists, we link that one to the parent constraint.  So we don't end up
with a duplicate -- and better yet, we don't need to scan the referenced
table to verify that the constraint holds.

To implement this I made a small change to previously planner-only
struct ForeignKeyCacheInfo to contain the constraint OID; also relcache
now maintains the list of FKs for partitioned tables too.

Backpatch to 11.

Reported-by: Michael Vitale (bug #15425)
Discussion: https://postgr.es/m/15425-2dbc9d2aa999f816@postgresql.org
2018-10-12 12:37:37 -03:00
Peter Eisentraut f8c10f616f Turn transaction_isolation into GUC enum
It was previously a string setting that was converted into an enum by
custom code, but using the GUC enum facility seems much simpler and
doesn't change any functionality, except that

    set transaction_isolation='default';

no longer works, but that was never documented and doesn't work with
any other transaction characteristics.  (Note that this is not the
same as RESET or SET TO DEFAULT, which still work.)

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/457db615-e84c-4838-310e-43841eb806e5@iki.fi
2018-10-09 21:26:00 +02:00
Thomas Munro 212fab9926 Relax transactional restrictions on ALTER TYPE ... ADD VALUE (redux).
Originally committed as 15bc038f (plus some follow-ups), this was
reverted in 28e07270 due to a problem discovered in parallel
workers.  This new version corrects that problem by sending the
list of uncommitted enum values to parallel workers.

Here follows the original commit message describing the change:

To prevent possibly breaking indexes on enum columns, we must keep
uncommitted enum values from getting stored in tables, unless we
can be sure that any such column is new in the current transaction.

Formerly, we enforced this by disallowing ALTER TYPE ... ADD VALUE
from being executed at all in a transaction block, unless the target
enum type had been created in the current transaction.  This patch
removes that restriction, and instead insists that an uncommitted enum
value can't be referenced unless it belongs to an enum type created
in the same transaction as the value.  Per discussion, this should be
a bit less onerous.  It does require each function that could possibly
return a new enum value to SQL operations to check this restriction,
but there aren't so many of those that this seems unmaintainable.

Author: Andrew Dunstan and Tom Lane, with parallel query fix by Thomas Munro
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAEepm%3D0Ei7g6PaNTbcmAh9tCRahQrk%3Dr5ZWLD-jr7hXweYX3yg%40mail.gmail.com
Discussion: https://postgr.es/m/4075.1459088427%40sss.pgh.pa.us
2018-10-09 12:51:01 +13:00
Alvaro Herrera eee01d606e Silence compiler warning in Assert()
gcc 6.3 does not whine about this mistake I made in 39808e8868 but
evidently lots of other compilers do, according to Michael Paquier,
Peter Eisentraut, Arthur Zakirov, Tomas Vondra.

Discussion: too many to list
2018-10-08 10:37:21 -03:00
Peter Eisentraut 634b4b79cb Track procedure calls in pg_stat_user_functions
This was forgotten when procedures were implemented.

Reported-by: Lukas Fittl <lukas@fittl.com>
2018-10-08 11:22:53 +02:00
Michael Paquier 9c2a970d1f Improve two error messages related to foreign keys on partitioned tables
Error messages for creating a foreign key on a partitioned table using
ONLY or NOT VALID were wrong in mentioning the objects they worked on.
This commit adds on the way some regression tests missing for those
cases.

Author: Laurenz Albe
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/c11c05810a9ed65e9b2c817a9ef442275a32fe80.camel@cybertec.at
2018-10-08 17:56:13 +09:00
Alvaro Herrera 39808e8868 Fix catalog insertion order for ATTACH PARTITION
Commit 2fbdf1b38b changed the order in which we inserted catalog rows
when creating partitions, so that we could remove an unsightly hack
required for untimely relcache invalidations.  However, that commit only
changed the ordering for CREATE TABLE PARTITION OF, and left ALTER TABLE
ATTACH PARTITION unchanged, so the latter can be affected when catalog
invalidations occur, for instance when the partition key involves an SQL
function.

Reported-by: Rajkumar Raghuwanshi
Author: Amit Langote
Reviewed-by: Michaël Paquier
Discussion: https://postgr.es/m/CAKcux6=nTz9KSfTr_6Z2mpzLJ_09JN-rK6=dWic6gGyTSWueyQ@mail.gmail.com
2018-10-06 22:13:19 -03:00
Alvaro Herrera ad08006ba0 Fix event triggers for partitioned tables
Index DDL cascading on partitioned tables introduced a way for ALTER
TABLE to be called reentrantly.  This caused an an important deficiency
in event trigger support to be exposed: on exiting the reentrant call,
the alter table state object was clobbered, causing a crash when the
outer alter table tries to finalize its processing.  Fix the crash by
creating a stack of event trigger state objects.  There are still ways
to cause things to misbehave (and probably other crashers) with more
elaborate tricks, but at least it now doesn't crash in the obvious
scenario.

Backpatch to 9.5, where DDL deparsing of event triggers was introduced.

Reported-by: Marco Slot
Authors: Michaël Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/CANNhMLCpi+HQ7M36uPfGbJZEQLyTy7XvX=5EFkpR-b1bo0uJew@mail.gmail.com
2018-10-06 19:17:46 -03:00
Michael Paquier 38921d1416 Assign constraint name when cloning FK definition for partitions
This is for example used when attaching a partition to a partitioned
table which includes foreign keys, and in this case the constraint name
has been missing in the data cloned.  This could lead to hard crashes,
as when validating the foreign key constraint, the constraint name is
always expected.  Particularly, when using log_min_messages >= DEBUG1, a
log message would be generated with this unassigned constraint name,
leading to an assertion failure on HEAD.

While on it, rename a variable in ATExecAttachPartition which was
declared twice with the same name.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20181005042236.GG1629@paquier.xyz
Backpatch-through: 11
2018-10-06 14:59:36 +09:00
Tom Lane d73f4c74dd In the executor, use an array of pointers to access the rangetable.
Instead of doing a lot of list_nth() accesses to es_range_table,
create a flattened pointer array during executor startup and index
into that to get at individual RangeTblEntrys.

This eliminates one source of O(N^2) behavior with lots of partitions.
(I'm not exactly convinced that it's the most important source, but
it's an easy one to fix.)

Amit Langote and David Rowley

Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
2018-10-04 15:48:17 -04:00
Tom Lane 9ddef36278 Centralize executor's opening/closing of Relations for rangetable entries.
Create an array estate->es_relations[] paralleling the es_range_table,
and store references to Relations (relcache entries) there, so that any
given RT entry is opened and closed just once per executor run.  Scan
nodes typically still call ExecOpenScanRelation, but ExecCloseScanRelation
is no more; relation closing is now done centrally in ExecEndPlan.

This is slightly more complex than one would expect because of the
interactions with relcache references held in ResultRelInfo nodes.
The general convention is now that ResultRelInfo->ri_RelationDesc does
not represent a separate relcache reference and so does not need to be
explicitly closed; but there is an exception for ResultRelInfos in the
es_trig_target_relations list, which are manufactured by
ExecGetTriggerResultRel and have to be cleaned up by
ExecCleanUpTriggerState.  (That much was true all along, but these
ResultRelInfos are now more different from others than they used to be.)

To allow the partition pruning logic to make use of es_relations[] rather
than having its own relcache references, adjust PartitionedRelPruneInfo
to store an RT index rather than a relation OID.

Amit Langote, reviewed by David Rowley and Jesper Pedersen,
some mods by me

Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
2018-10-04 14:03:42 -04:00
Michael Paquier 803b1301e8 Add option SKIP_LOCKED to VACUUM and ANALYZE
When specified, this option allows VACUUM to skip the work on a relation
if there is a conflicting lock on it when trying to open it at the
beginning of its processing.

Similarly to autovacuum, this comes with a couple of limitations while
the relation is processed which can cause the process to still block:
- when opening the relation indexes.
- when acquiring row samples for table inheritance trees, partition trees
or certain types of foreign tables, and that a lock is taken on some
leaves of such trees.

Author: Nathan Bossart
Reviewed-by: Michael Paquier, Andres Freund, Masahiko Sawada
Discussion: https://postgr.es/m/9EF7EBE4-720D-4CF1-9D0E-4403D7E92990@amazon.com
Discussion: https://postgr.es/m/20171201160907.27110.74730@wrigleys.postgresql.org
2018-10-04 09:00:33 +09:00
Andres Freund c03c1449c0 Fix issues around EXPLAIN with JIT.
I (Andres) was more than a bit hasty in committing 33001fd7a7
after last minute changes, leading to a number of problems (jit output
was only shown for JIT in parallel workers, and just EXPLAIN without
ANALYZE didn't work).  Lukas luckily found these issues quickly.

Instead of combining instrumentation in in standard_ExecutorEnd(), do
so on demand in the new ExplainPrintJITSummary().

Also update a documentation example of the JIT output, changed in
52050ad8eb.

Author: Lukas Fittl, with minor changes by me
Discussion: https://postgr.es/m/CAP53PkxmgJht69pabxBXJBM+0oc6kf3KHMborLP7H2ouJ0CCtQ@mail.gmail.com
Backpatch: 11, where JIT compilation was introduced
2018-10-03 12:48:37 -07:00
Andres Freund cc2905e963 Use slots more widely in tuple mapping code and make naming more consistent.
It's inefficient to use a single slot for mapping between tuple
descriptors for multiple tuples, as previously done when using
ConvertPartitionTupleSlot(), as that means the slot's tuple descriptors
change for every tuple.

Previously we also, via ConvertPartitionTupleSlot(), built new tuples
after the mapping even in cases where we, immediately afterwards,
access individual columns again.

Refactor the code so one slot, on demand, is used for each
partition. That avoids having to change the descriptor (and allows to
use the more efficient "fixed" tuple slots). Then use slot->slot
mapping, to avoid unnecessarily forming a tuple.

As the naming between the tuple and slot mapping functions wasn't
consistent, rename them to execute_attr_map_{tuple,slot}.  It's likely
that we'll also rename convert_tuples_by_* to denote that these
functions "only" build a map, but that's left for later.

Author: Amit Khandekar and Amit Langote, editorialized by me
Reviewed-By: Amit Langote, Amit Khandekar, Andres Freund
Discussion:
    https://postgr.es/m/CAJ3gD9fR0wRNeAE8VqffNTyONS_UfFPRpqxhnD9Q42vZB+Jvpg@mail.gmail.com
    https://postgr.es/m/e4f9d743-cd4b-efb0-7574-da21d86a7f36%40lab.ntt.co.jp
Backpatch: -
2018-10-02 11:14:26 -07:00
Michael Paquier e3a25ab9ea Refactor relation opening for VACUUM and ANALYZE
VACUUM and ANALYZE share similar logic when it comes to opening a
relation to work on in terms of how the relation is opened, in which
order locks are tried and how logs should be generated when something
does not work as expected.

This commit refactors things so as both use the same code path to handle
the way a relation is opened, so as the integration of new options
becomes easier.

Author: Michael Paquier
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/20180927075152.GT1659@paquier.xyz
2018-10-02 08:53:38 +09:00
Tom Lane e27453bd83 Fix ALTER COLUMN TYPE to not open a relation without any lock.
If the column being modified is referenced by a foreign key constraint
of another table, ALTER TABLE would open the other table (to re-parse
the constraint's definition) without having first obtained a lock on it.
This was evidently intentional, but that doesn't mean it's really safe.
It's especially not safe in 9.3, which pre-dates use of MVCC scans for
catalog reads, but even in current releases it doesn't seem like a good
idea.

We know we'll need AccessExclusiveLock shortly to drop the obsoleted
constraint, so just get that a little sooner to close the hole.

Per testing with a patch that complains if we open a relation without
holding any lock on it.  I don't plan to back-patch that patch, but we
should close the holes it identifies in all supported branches.

Discussion: https://postgr.es/m/2038.1538335244@sss.pgh.pa.us
2018-10-01 11:39:13 -04:00
Tom Lane fdba460a26 Create an RTE field to record the query's lock mode for each relation.
Add RangeTblEntry.rellockmode, which records the appropriate lock mode for
each RTE_RELATION rangetable entry (either AccessShareLock, RowShareLock,
or RowExclusiveLock depending on the RTE's role in the query).

This patch creates the field and makes all creators of RTE nodes fill it
in reasonably, but for the moment nothing much is done with it.  The plan
is to replace assorted post-parser logic that re-determines the right
lockmode to use with simple uses of rte->rellockmode.  For now, just add
Asserts in each of those places that the rellockmode matches what they are
computing today.  (In some cases the match isn't perfect, so the Asserts
are weaker than you might expect; but this seems OK, as per discussion.)

This passes check-world for me, but it seems worth pushing in this state
to see if the buildfarm finds any problems in cases I failed to test.

catversion bump due to change of stored rules.

Amit Langote, reviewed by David Rowley and Jesper Pedersen,
and whacked around a bit more by me

Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
2018-09-30 13:55:51 -04:00
Peter Eisentraut 0320ddaf3c Recurse to sequences on ownership change for all relkinds
When a table ownership is changed, we must apply that also to any owned
sequences.  (Otherwise, it would result in a situation that cannot be
restored, because linked sequences must have the same owner as the
table.)  But this was previously only applied to regular tables and
materialized views.  But it should also apply to at least foreign
tables.  This patch removes the relkind check altogether, because it
doesn't save very much and just introduces the possibility of similar
omissions.

Bug: #15238
Reported-by: Christoph Berg <christoph.berg@credativ.de>
2018-09-26 20:19:15 +02:00
Andres Freund 29c94e03c7 Split ExecStoreTuple into ExecStoreHeapTuple and ExecStoreBufferHeapTuple.
Upcoming changes introduce further types of tuple table slots, in
preparation of making table storage pluggable. New storage methods
will have different representation of tuples, therefore the slot
accessor should refer explicitly to heap tuples.

Instead of just renaming the functions, split it into one function
that accepts heap tuples not residing in buffers, and one accepting
ones in buffers.  Previously one function was used for both, but that
was a bit awkward already, and splitting will allow us to represent
slot types for tuples in buffers and normal memory separately.

This is split out from the patch introducing abstract slots, as this
largely consists out of mechanical changes.

Author: Ashutosh Bapat
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
2018-09-25 16:27:48 -07:00
Andres Freund 33001fd7a7 Collect JIT instrumentation from workers.
Previously, when using parallel query, EXPLAIN (ANALYZE)'s JIT
compilation timings did not include the overhead from doing so on the
workers.  Fix that.

We do so by simply aggregating the cost of doing JIT compilation on
workers and the leader together. Arguably that's not quite accurate,
because the total time spend doing so is spent in parallel - but it's
hard to do much better.  For additional detail, when VERBOSE is
specified, the stats for workers are displayed separately.

Author: Amit Khandekar and Andres Freund
Discussion: https://postgr.es/m/CAJ3gD9eLrz51RK_gTkod+71iDcjpB_N8eC6vU2AW-VicsAERpQ@mail.gmail.com
Backpatch: 11-
2018-09-25 13:12:44 -07:00
Andres Freund 52050ad8eb Make EXPLAIN output for JIT compilation more dense.
A discussion about also reporting JIT compilation overhead on workers
brought unhappiness with the verbosity of the current explain format
to light.  Make the text format more dense, and restructure the
structured output to mirror that more closely.

As we're re-jiggering the output format anyway: The denser format
allows us to report all flags for JIT compilation (now also reporting
PGJIT_EXPR and PGJIT_DEFORM), and report the total time in addition to
the individual times.

Per complaint from Tom Lane.

Author: Andres Freund
Discussion: https://postgr.es/m/27812.1537221015@sss.pgh.pa.us
Backpatch: 11-, where JIT compilation was introduced
2018-09-24 13:35:45 -07:00
Andrew Dunstan 7636e5c60f Fast default trigger and expand_tuple fixes
Ensure that triggers get properly filled in tuples for the OLD value.
Also fix the logic of detecting missing null values. The previous logic
failed to detect a missing null column before the first missing column
with a default. Fixing this has simplified the logic a bit.

Regression tests are added to test changes. This should ensure better
coverage of expand_tuple().

Original bug reports, and some code and test scripts from Tomas Vondra

Backpatch to release 11.
2018-09-24 16:11:24 -04:00
Alvaro Herrera 20bef2c311 Fix ALTER/TYPE on columns referenced by FKs in partitioned tables
When ALTER TABLE ... SET DATA TYPE affects a column referenced by
constraints and indexes, it drop those constraints and indexes and
recreates them afterwards, so that the definitions match the new data
type.  The original code did this by dropping one object at a time
(commit 077db40fa1 of May 2004), which worked fine because the
dependencies between the objects were pretty straightforward, and
ordering the objects in a specific way was enough to make this work.
However, when there are foreign key constraints in partitioned tables,
the dependencies are no longer so straightforward, and we were getting
errors when attempted:
  ERROR:  cache lookup failed for constraint 16398

This can be fixed by doing all the drops in one pass instead, using
performMultipleDeletions (introduced by df18c51f29 of Aug 2006).  With
this change we can also remove the code to carefully order the list of
objects to be deleted.

Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAKcux6nWS_m+s=1Udk_U9B+QY7pA-Ac58qR5BdUfOyrwnWHDew@mail.gmail.com
2018-09-14 13:41:20 -03:00
Michael Paquier 28a8fa984c Improve autovacuum logging for aggressive and anti-wraparound runs
A log message was being generated when log_min_duration is reached for
autovacuum on a given relation to indicate if it was an aggressive run,
and missed the point of mentioning if it is doing an anti-wrapround
run.  The log message generated is improved so as one, both or no extra
details are added depending on the option set.

Author: Sergei Kornilov
Reviewed-by: Masahiko Sawada, Michael Paquier
Discussion: https://postgr.es/m/11587951532155118@sas1-19a94364928d.qloud-c.yandex.net
2018-09-14 07:35:39 +09:00
Michael Paquier 514a731ddc Simplify static function in extension.c
An extra argument for the filename defining the extension script
location was present, aimed at being used for error reporting, but has
never been used.  This was around since extensions have been added in
d9572c4.

Author: Yugo Nagata
Reviewed-by: Tatsuo Ishii
Discussion: https://postgr.es/m/20180907180504.1ff19e1675bb44a67e9c7ab1@sraoss.co.jp
2018-09-13 16:56:57 +09:00
Michael Paquier 9226a3b89b Remove duplicated words split across lines in comments
This has been detected using some interesting tricks with sed, and the
method used is mentioned in details in the discussion below.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20180908013109.GB15350@telsasoft.com
2018-09-08 12:24:19 -07:00
Tom Lane 54b01b9293 Remove no-longer-used variable.
Oversight in 2fbdf1b38.  Per buildfarm.
2018-09-05 14:29:58 -04:00
Alvaro Herrera 2fbdf1b38b Simplify partitioned table creation vs. relcache
In the original code, we were storing the pg_inherits row for a
partitioned table too early: enough that we had a hack for relcache to
avoid falling flat on its face while reading such a partial entry.  If
we finish the pg_class creation first and *then* store the pg_inherits
entry, we don't need that hack.

Also recognize that pg_class.relpartbound is not marked NOT NULL and
therefore it's entirely possible to read null values, so having only
Assert() protection isn't enough.  Change those to if/elog tests
instead.  This qualifies as a robustness fix, so backpatch to pg11.

In passing, remove one access that wasn't actually needed, and reword
one message to be like all the others that check for the same thing.

Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/20180903213916.hh6wasnrdg6xv2ud@alvherre.pgsql
2018-09-05 14:36:13 -03:00
Tom Lane 17b7c302b5 Fully enforce uniqueness of constraint names.
It's been true for a long time that we expect names of table and domain
constraints to be unique among the constraints of that table or domain.
However, the enforcement of that has been pretty haphazard, and it missed
some corner cases such as creating a CHECK constraint and then an index
constraint of the same name (as per recent report from André Hänsel).
Also, due to the lack of an actual unique index enforcing this, duplicates
could be created through race conditions.

Moreover, the code that searches pg_constraint has been quite inconsistent
about how to handle duplicate names if one did occur: some places checked
and threw errors if there was more than one match, while others just
processed the first match they came to.

To fix, create a unique index on (conrelid, contypid, conname).  Since
either conrelid or contypid is zero, this will separately enforce
uniqueness of constraint names among constraints of any one table and any
one domain.  (If we ever implement SQL assertions, and put them into this
catalog, more thought might be needed.  But it'd be at least as reasonable
to put them into a new catalog; having overloaded this one catalog with
two kinds of constraints was a mistake already IMO.)  This index can replace
the existing non-unique index on conrelid, though we need to keep the one
on contypid for query performance reasons.

Having done that, we can simplify the logic in various places that either
coped with duplicates or neglected to, as well as potentially improve
lookup performance when searching for a constraint by name.

Also, as per our usual practice, install a preliminary check so that you
get something more friendly than a unique-index violation report in the
case complained of by André.  And teach ChooseIndexName to avoid choosing
autogenerated names that would draw such a failure.

While it's not possible to make such a change in the back branches,
it doesn't seem quite too late to put this into v11, so do so.

Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
2018-09-04 13:45:35 -04:00
Tom Lane 44cac93464 Avoid using potentially-under-aligned page buffers.
There's a project policy against using plain "char buf[BLCKSZ]" local
or static variables as page buffers; preferred style is to palloc or
malloc each buffer to ensure it is MAXALIGN'd.  However, that policy's
been ignored in an increasing number of places.  We've apparently got
away with it so far, probably because (a) relatively few people use
platforms on which misalignment causes core dumps and/or (b) the
variables chance to be sufficiently aligned anyway.  But this is not
something to rely on.  Moreover, even if we don't get a core dump,
we might be paying a lot of cycles for misaligned accesses.

To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock
that the compiler must allocate with sufficient alignment, and use
those in place of plain char arrays.

I used these types even for variables where there's no risk of a
misaligned access, since ensuring proper alignment should make
kernel data transfers faster.  I also changed some places where
we had been palloc'ing short-lived buffers, for coding style
uniformity and to save palloc/pfree overhead.

Since this seems to be a live portability hazard (despite the lack
of field reports), back-patch to all supported versions.

Patch by me; thanks to Michael Paquier for review.

Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de
2018-09-01 15:27:17 -04:00
Peter Eisentraut 1e5e4efd02 Error position support for partition specifications
Add support for error position reporting for partition specifications.

Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
2018-08-30 08:20:23 +02:00
Peter Eisentraut a4a232b1e7 Error position support for defaults and check constraints
Add support for error position reporting for the expressions contained
in defaults and check constraint definitions.  This currently works only
for CREATE TABLE, not ALTER TABLE, because the latter is not set up to
pass around the original query string.

Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
2018-08-30 08:20:23 +02:00
Michael Paquier a556549d7e Improve VACUUM and ANALYZE by avoiding early lock queue
A caller of VACUUM can perform early lookup obtention which can cause
other sessions to block on the request done, causing potentially DOS
attacks as even a non-privileged user can attempt a vacuum fill of a
critical catalog table to block even all incoming connection attempts.

Contrary to TRUNCATE, a client could attempt a system-wide VACUUM after
building the list of relations to VACUUM, which can cause vacuum_rel()
or analyze_rel() to try to lock the relation but the operation would
just block.  When the client specifies a list of relations and the
relation needs to be skipped, ownership checks are done when building
the list of relations to work on, preventing a later lock attempt.

vacuum_rel() already had the sanity checks needed, except that those
were applied too late.  This commit refactors the code so as relation
skips are checked beforehand, making it safer to avoid too early locks,
for both manual VACUUM with and without a list of relations specified.

An isolation test is added emulating the fact that early locks do not
happen anymore, issuing a WARNING message earlier if the user calling
VACUUM is not a relation owner.

When a partitioned table is listed in a manual VACUUM or ANALYZE
command, its full list of partitions is fetched, all partitions get
added to the list to work on, and then each one of them is processed one
by one, with ownership checks happening at the later phase of
vacuum_rel() or analyze_rel().  Trying to do early ownership checks for
each partition is proving to be tedious as this would result in deadlock
risks with lock upgrades, and skipping all partitions if the listed
partitioned table is not owned would result in a behavior change
compared to how Postgres 10 has implemented vacuum for partitioned
tables.  The original problem reported related to early lock queue for
critical relations is fixed anyway, so priority is given to avoiding a
backward-incompatible behavior.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20180812222142.GA6097@paquier.xyz
2018-08-27 09:11:12 +09:00
Peter Eisentraut d12782898e Change PROCEDURE to FUNCTION in CREATE OPERATOR syntax
Since procedures are now a different thing from functions, change the
CREATE OPERATOR syntax to use FUNCTION in the clause that specifies the
function.  PROCEDURE is still accepted for compatibility.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>
2018-08-22 14:44:49 +02:00
Peter Eisentraut b19495772e doc: Update uses of the word "procedure"
Historically, the term procedure was used as a synonym for function in
Postgres/PostgreSQL.  Now we have procedures as separate objects from
functions, so we need to clean up the documentation to not mix those
terms.

In particular, mentions of "trigger procedures" are changed to "trigger
functions", and access method "support procedures" are changed to
"support functions".  (The latter already used FUNCTION in the SQL
syntax anyway.)  Also, the terminology in the SPI chapter has been
cleaned up.

A few tests, examples, and code comments are also adjusted to be
consistent with documentation changes, but not everything.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>
2018-08-22 14:44:49 +02:00
Michael Paquier 72be8c29a1 Fix set of NLS translation issues
While monitoring the code, a couple of issues related to string
translation has showed up:
- Some routines for auto-updatable views return an error string, which
sometimes missed the shot.  A comment regarding string translation is
added for each routine to help with future features.
- GSSAPI authentication missed two translations.
- vacuumdb handles non-translated strings.
- GetConfigOptionByNum should translate strings.  This part is not
back-patched as after a minor upgrade this could be surprising for
users.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20180810.152131.31921918.horiguchi.kyotaro@lab.ntt.co.jp
Backpatch-through: 9.3
2018-08-21 15:17:13 +09:00
Peter Eisentraut e4597ee65d InsertPgAttributeTuple() to set attcacheoff
InsertPgAttributeTuple() is the interface between in-memory tuple
descriptors and on-disk pg_attribute, so it makes sense to give it the
job of resetting attcacheoff.  This avoids having all the callers having
to do so.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
2018-08-17 22:08:21 +02:00
Peter Eisentraut 3ebdd21b79 Remove obsolete comment
The sequence name is no longer stored in the sequence relation, since
1753b1b027.
2018-08-13 21:07:31 +02:00
Michael Paquier f841ceb26d Improve TRUNCATE by avoiding early lock queue
A caller of TRUNCATE could previously queue for an access exclusive lock
on a relation it may not have permission to truncate, potentially
interfering with users authorized to work on it.  This can be very
intrusive depending on the lock attempted to be taken.  For example,
pg_authid could be blocked, preventing any authentication attempt to
happen on a PostgreSQL instance.

This commit fixes the case of TRUNCATE so as RangeVarGetRelidExtended is
used with a callback doing the necessary ACL checks at an earlier stage,
avoiding lock queuing issues, so as an immediate failure happens for
unprivileged users instead of waiting on a lock that would not be
taken.

This is rather similar to the type of work done in cbe24a6 for CLUSTER,
and the code of TRUNCATE is this time refactored so as there is no
user-facing changes.  As the commit for CLUSTER, no back-patch is done.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20180806165816.GA19883@paquier.xyz
2018-08-10 18:26:59 +02:00
Michael Paquier 661dd23950 Restrict access to reindex of shared catalogs for non-privileged users
A database owner running a database-level REINDEX has the possibility to
also do the operation on shared system catalogs without being an owner
of them, which allows him to block resources it should not have access
to.  The same goes for a schema owner.  For example, PostgreSQL would go
unresponsive and even block authentication if a lock is waited for
pg_authid.  This commit makes sure that a user running a REINDEX SYSTEM,
DATABASE or SCHEMA only works on the following relations:
- The user is a superuser
- The user is the table owner
- The user is the database/schema owner, only if the relation worked on
is not shared.

Robert has worded most the documentation changes, and I have coded the
core part.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier, Robert Haas
Reviewed by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20180805211059.GA2185@paquier.xyz
Backpatch-through: 11- as the current behavior has been around for a
very long time and could be disruptive for already released branches.
2018-08-09 09:40:15 +02:00
Tom Lane 9b7c56d6cb Don't record FDW user mappings as members of extensions.
CreateUserMapping has a recordDependencyOnCurrentExtension call that's
been there since extensions were introduced (very possibly my fault).
However, there's no support anywhere else for user mappings as members
of extensions, nor are they listed as a possible member object type in
the documentation.  Nor does it really seem like a good idea for user
mappings to belong to extensions when roles don't.  Hence, remove the
bogus call.

(As we saw in bug #15310, the lack of any pg_dump support for this case
ensures that any such membership record would silently disappear during
pg_upgrade.  So there's probably no need for us to do anything else
about cleaning up after this mistake.)

Discussion: https://postgr.es/m/27952.1533667213@sss.pgh.pa.us
2018-08-07 16:32:50 -04:00
Peter Eisentraut 0d5f05cde0 Allow multi-inserts during COPY into a partitioned table
CopyFrom allows multi-inserts to be used for non-partitioned tables, but
this was disabled for partitioned tables.  The reason for this appeared
to be that the tuple may not belong to the same partition as the
previous tuple did.  Not allowing multi-inserts here greatly slowed down
imports into partitioned tables.  These could take twice as long as a
copy to an equivalent non-partitioned table.  It seems wise to do
something about this, so this change allows the multi-inserts by
flushing the so-far inserted tuples to the partition when the next tuple
does not belong to the same partition, or when the buffer fills.  This
improves performance when the next tuple in the stream commonly belongs
to the same partition as the previous tuple.

In cases where the target partition changes on every tuple, using
multi-inserts slightly slows the performance.  To get around this we
track the average size of the batches that have been inserted and
adaptively enable or disable multi-inserts based on the size of the
batch.  Some testing was done and the regression only seems to exist
when the average size of the insert batch is close to 1, so let's just
enable multi-inserts when the average size is at least 1.3.  More
performance testing might reveal a better number for, this, but since
the slowdown was only 1-2% it does not seem critical enough to spend too
much time calculating it.  In any case it may depend on other factors
rather than just the size of the batch.

Allowing multi-inserts for partitions required a bit of work around the
per-tuple memory contexts as we must flush the tuples when the next
tuple does not belong the same partition.  In which case there is no
good time to reset the per-tuple context, as we've already built the new
tuple by this time.  In order to work around this we maintain two
per-tuple contexts and just switch between them every time the partition
changes and reset the old one.  This does mean that the first of each
batch of tuples is not allocated in the same memory context as the
others, but that does not matter since we only reset the context once
the previous batch has been inserted.

Author: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
2018-08-01 10:23:09 +02:00
Michael Paquier 9f7ba88aa4 Fix two oversights from 9ebe0572 which refactored cluster_rel
The recheck option became a no-op as ClusterOption failed to set proper
values for each element.  There was a second code path where local
options got overwritten.

Both issues have been spotted by Coverity.
2018-07-29 22:00:42 +09:00
Michael Paquier 9ebe0572ce Refactor cluster_rel() to handle more options
This extends cluster_rel() in such a way that more options can be added
in the future, which will reduce the amount of chunk code for an
upcoming SKIP_LOCKED aimed for VACUUM.  As VACUUM FULL is a different
flavor of CLUSTER, we want to make that extensible to ease integration.

This only reworks the API and its callers, without providing anything
user-facing.  Two options are present now: verbose mode and relation
recheck when doing the cluster command work across multiple
transactions.  This could be used as well as a base to extend the
grammar of CLUSTER later on.

Author: Michael Paquier
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/20180723031058.GE2854@paquier.xyz
2018-07-24 11:37:32 +09:00
Tom Lane 2409716755 Remove undocumented restriction against duplicate partition key columns.
transformPartitionSpec rejected duplicate simple partition columns
(e.g., "PARTITION BY RANGE (x,x)") but paid no attention to expression
columns, resulting in inconsistent behavior.  Worse, cases like
"PARTITION BY RANGE (x,(x))") were accepted but would then result in
dump/reload failures, since the expression (x) would get simplified
to a plain column later.

There seems no better reason for this restriction than there was for
the one against duplicate included index columns (cf commit 701fd0bbc),
so let's just remove it.

Back-patch to v10 where this code was added.

Report and patch by Yugo Nagata.

Discussion: https://postgr.es/m/20180712165939.36b12aff.nagata@sraoss.co.jp
2018-07-19 15:41:46 -04:00
Tom Lane 701fd0bbc9 Drop the rule against included index columns duplicating key columns.
The initial version of the included-index-column feature stated that
included columns couldn't be the same as any key column of the index.
While it'd be pretty silly to do that, since the included column would be
entirely redundant, we've never prohibited redundant index columns before
so it's not very consistent to do so here.  Moreover, the prohibition
was itself badly implemented, so that it failed to reject columns that
were effectively identical but not spelled quite alike, as reported by
Aditya Toshniwal.

(Moreover, it's not hard to imagine that for some non-btree index types,
such cases would be non-silly anyhow: the index might use a lossy
representation for key columns but be able to support retrieval of the
original form of included columns.)

Hence, let's just drop the prohibition.

In passing, do some copy-editing on the documentation for the
included-column feature.

Yugo Nagata; documentation and test corrections by me

Discussion: https://postgr.es/m/CAM9w-_mhBCys4fQNfaiQKTRrVWtoFrZ-wXmDuE9Nj5y-Y7aDKQ@mail.gmail.com
2018-07-18 14:43:03 -04:00
Heikki Linnakangas 6b387179ba Fix misc typos, mostly in comments.
A collection of typos I happened to spot while reading code, as well as
grepping for common mistakes.

Backpatch to all supported versions, as applicable, to avoid conflicts
when backporting other commits in the future.
2018-07-18 16:17:32 +03:00
Alvaro Herrera cb9db2ab06 Fix ALTER TABLE...SET STATS error message for included columns
The existing error message was complaining that the column is not an
expression, which is not correct.  Introduce a suitable wording
variation and a test.

Co-authored-by: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/20180628182803.e4632d5a.nagata@sraoss.co.jp
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
2018-07-16 20:00:39 -04:00
Peter Eisentraut 3884072329 Prohibit transaction commands in security definer procedures
Starting and aborting transactions in security definer procedures
doesn't work.  StartTransaction() insists that the security context
stack is empty, so this would currently cause a crash, and
AbortTransaction() resets it.  This could be made to work by
reorganizing the code, but right now we just prohibit it.

Reported-by: amul sul <sulamul@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b96Gupt_LFL7uNyy3c50-wbhA68NUjiK5%3DrF6_w%3Dpq_T%3DQ%40mail.gmail.com
2018-07-13 10:41:32 +02:00
Alvaro Herrera cd073d8f70 Fix FK checks of TRUNCATE involving partitioned tables
When truncating a table that is referenced by foreign keys in
partitioned tables, the check to ensure the referencing table are also
truncated spuriously failed.  This is because it was relying on
relhastriggers as a proxy for the table having FKs, and that's wrong for
partitioned tables.  Fix it to consider such tables separately.  There
may be a better way ... but this code is pretty inefficient already.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquiër <michael@paquier.xyz>
Discussion: https://postgr.es/m/20180711000624.zmeizicibxeehhsg@alvherre.pgsql
2018-07-12 12:09:08 -04:00
Amit Kapila 40ca70ebcc Allow using the updated tuple while moving it to a different partition.
An update that causes the tuple to be moved to a different partition was
missing out on re-constructing the to-be-updated tuple, based on the latest
tuple in the update chain.  Instead, it's simply deleting the latest tuple
and inserting a new tuple in the new partition based on the old tuple.
Commit 2f17844104 didn't consider this case, so some of the updates were
getting lost.

In passing, change the argument order for output parameter in ExecDelete
and add some commentary about it.

Reported-by: Pavan Deolasee
Author: Amit Khandekar, with minor changes by me
Reviewed-by: Dilip Kumar, Amit Kapila and Alvaro Herrera
Backpatch-through: 11
Discussion: https://postgr.es/m/CAJ3gD9fRbEzDqdeDq1jxqZUb47kJn+tQ7=Bcgjc8quqKsDViKQ@mail.gmail.com
2018-07-12 12:51:39 +05:30
Michael Paquier edc6b41bd4 Rename VACOPT_NOWAIT to VACOPT_SKIP_LOCKED
When it comes to SELECT ... FOR or LOCK, NOWAIT means to not wait for
something to happen, and issue an error.  SKIP LOCKED means to not wait
for something to happen but to move on without issuing an error.  The
internal option of autovacuum and autoanalyze mentioned above, used only
when wraparound is not involved was named NOWAIT, but behaves like SKIP
LOCKED which is confusing.

Author: Nathan Bossart
Discussion: https://postgr.es/m/20180307050345.GA3095@paquier.xyz
2018-07-12 14:28:28 +09:00
Michael Paquier 6551f3daa2 Add assertion in expand_vacuum_rel() for non-autovacuum path
The code path where the assertion is added helps to check that
autovacuum always includes a relation OID when doing a vacuum on it.
Extracted from a larger patch set to add support for SKIP LOCKED with
manual VACUUM commands.

Author: Nathan Bossart
Discussion: https://postgr.es/m/9EF7EBE4-720D-4CF1-9D0E-4403D7E92990@amazon.com
2018-07-12 13:50:17 +09:00
Alexander Korotkov edf59c40dd Fix more wrong paths in header comments
It appears that there are more files, whose header comment paths are
wrong.  So, fix those paths.  No backpatching per proposal of Tom Lane.

Discussion: https://postgr.es/m/CAPpHfdsJyYbOj59MOQL%2B4XxdcomLSLfLqBtAvwR%2BpsCqj3ELdQ%40mail.gmail.com
2018-07-11 17:57:04 +03:00
Peter Eisentraut ec67b89816 Add UtilityReturnsTuples() support for CALL
This ensures that prepared statements for CALL can return tuples.
2018-07-09 13:58:08 +02:00
Peter Eisentraut e34ec13620 Allow CALL with polymorphic type arguments
In order to be able to resolve polymorphic types, we need to set fn_expr
before invoking the procedure.
2018-07-06 22:40:16 +02:00
Andrew Dunstan 1e9c858090 pgindent run prior to branching 2018-06-30 12:25:49 -04:00
Alvaro Herrera 41372071df Fix crash when ALTER TABLE recreates indexes on partitions
The skip_build flag was not being passed correctly when recursing to
indexes on partitions, leading to attempts to rebuild indexes when they
were not yet ready to be rebuilt.

Reported-by: Rajkumar Raghuwanshi
Discussion: https://postgr.es/m/CAKcux6mxNCGsgATwf5CGMF8g4WSupCXicCVMeKUTuWbyxHOMsQ@mail.gmail.com
2018-06-29 11:49:30 -04:00
Alvaro Herrera 475be5e790 When index recurses to a partition, map columns numbers
Two out of three code paths were mapping column numbers correctly if a
partition had different column numbers than parent table, but the most
commonly used one (recursing in CREATE INDEX to a new index on a
partition) failed to map attribute numbers in expressions.  Oddly
enough, attnums in WHERE clauses are already handled correctly
everywhere.

Reported-by: Amit Langote
Author: Amit Langote
Discussion: https://postgr.es/m/dce1fda4-e0f0-94c9-6abb-f5956a98c057@lab.ntt.co.jp
Reviewed-by: Álvaro Herrera
2018-06-22 16:45:48 -04:00
Michael Paquier 1c7c317cd9 Clarify use of temporary tables within partition trees
Since their introduction, partition trees have been a bit lossy
regarding temporary relations.  Inheritance trees respect the following
patterns:
1) a child relation can be temporary if the parent is permanent.
2) a child relation can be temporary if the parent is temporary.
3) a child relation cannot be permanent if the parent is temporary.
4) The use of temporary relations also imply that when both parent and
child need to be from the same sessions.

Partitions share many similar patterns with inheritance, however the
handling of the partition bounds make the situation a bit tricky for
case 1) as the partition code bases a lot of its lookup code upon
PartitionDesc which does not really look after relpersistence.  This
causes for example a temporary partition created by session A to be
visible by another session B, preventing this session B to create an
extra partition which overlaps with the temporary one created by A with
a non-intuitive error message.  There could be use-cases where mixing
permanent partitioned tables with temporary partitions make sense, but
that would be a new feature.  Partitions respect 2), 3) and 4) already.

It is a bit depressing to see those error checks happening in
MergeAttributes() whose purpose is different, but that's left as future
refactoring work.

Back-patch down to 10, which is where partitioning has been introduced,
except that default partitions do not apply there.  Documentation also
includes limitations related to the use of temporary tables with
partition trees.

Reported-by: David Rowley
Author: Amit Langote, Michael Paquier
Reviewed-by: Ashutosh Bapat, Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/CAKJS1f94Ojk0og9GMkRHGt8wHTW=ijq5KzJKuoBoqWLwSVwGmw@mail.gmail.com
2018-06-20 10:42:25 +09:00
Tom Lane 19832753f1 Fix some ill-chosen names for globally-visible partition support functions.
"compute_hash_value" is particularly gratuitously generic, but IMO
all of these ought to have names clearly related to partitioning.
2018-06-13 13:18:02 -04:00
Tom Lane bdc643e5e4 Fix access to just-closed relcache entry.
It might be impossible for this to cause a problem in non-debug builds,
since there'd be no opportunity for the relcache entry to get recycled
before the fetch.  It blows up nicely with -DRELCACHE_FORCE_RELEASE plus
valgrind, though.

Evidently introduced by careless refactoring in commit f0e44751d.
Back-patch accordingly.

Discussion: https://postgr.es/m/27543.1528758304@sss.pgh.pa.us
2018-06-11 19:18:04 -04:00
Alvaro Herrera 5b0c7e2f75 Don't needlessly check the partition contraint twice
Starting with commit f0e44751d7, ExecConstraints was in charge of
running the partition constraint; commit 19c47e7c82 modified that so
that caller could request to skip that checking depending on some
conditions, but that commit and 15ce775faa together introduced a small
bug there which caused ExecInsert to request skipping the constraint
check but have this not be honored -- in effect doing the check twice.
This could have been fixed in a very small patch, but on further
analysis of the involved function and its callsites, it turns out to be
simpler to give the responsibility of checking the partition constraint
fully to the caller, and return ExecConstraints to its original
(pre-partitioning) shape where it only checked tuple descriptor-related
constraints.  Each caller must do partition constraint checking on its
own schedule, which is more convenient after commit 2f17844104 anyway.

Reported-by: David Rowley
Author: David Rowley, Álvaro Herrera
Reviewed-by: Amit Langote, Amit Khandekar, Simon Riggs
Discussion: https://postgr.es/m/CAKJS1f8w8+awsxgea8wt7_UX8qzOQ=Tm1LD+U1fHqBAkXxkW2w@mail.gmail.com
2018-06-11 17:12:16 -04:00
Tom Lane ecac23511e Widen COPY FROM's current-line-number counter from 32 to 64 bits.
Because the code for the HEADER option skips a line when this counter
is zero, a very long COPY FROM WITH HEADER operation would drop a line
every 2^32 lines.  A lesser but still unfortunate problem is that errors
would show a wrong input line number for errors occurring beyond the
2^31'st input line.  While such large input streams seemed impractical
when this code was first written, they're not any more.  Widening the
counter (and some associated variables) to uint64 should be enough to
prevent problems for the foreseeable future.

David Rowley

Discussion: https://postgr.es/m/CAKJS1f88yh-6wwEfO6QLEEvH3BEugOq2QX1TOja0vCauoynmOQ@mail.gmail.com
2018-05-22 13:32:52 -04:00
Tom Lane f755a152d4 Improve spelling of new FINALFUNC_MODIFY aggregate attribute.
I'd used SHARABLE as a value originally, but Peter Eisentraut points out
that dictionaries agree that SHAREABLE is the preferred spelling.
Run around and change that before it's too late.

Discussion: https://postgr.es/m/d2e1afd4-659c-50d6-1b20-7cfd3675e909@2ndquadrant.com
2018-05-21 11:41:42 -04:00
Tom Lane 41c912cad1 Clean up warnings from -Wimplicit-fallthrough.
Recent gcc can warn about switch-case fall throughs that are not
explicitly labeled as intentional.  This seems like a good thing,
so clean up the warnings exposed thereby by labeling all such
cases with comments that gcc will recognize.

In files that already had one or more suitable comments, I generally
matched the existing style of those.  Otherwise I went with
/* FALLTHROUGH */, which is one of the spellings approved at the
more-restrictive-than-default level -Wimplicit-fallthrough=4.
(At the default level you can also spell it /* FALL ?THRU */,
and it's not picky about case.  What you can't do is include
additional text in the same comment, so some existing comments
containing versions of this aren't good enough.)

Testing with gcc 8.0.1 (Fedora 28's current version), I found that
I also had to put explicit "break"s after elog(ERROR) or ereport(ERROR);
apparently, for this purpose gcc doesn't recognize that those don't
return.  That seems like possibly a gcc bug, but it's fine because
in most places we did that anyway; so this amounts to a visit from the
style police.

Discussion: https://postgr.es/m/15083.1525207729@sss.pgh.pa.us
2018-05-01 19:35:08 -04:00
Tom Lane 2e83e6bd74 Adjust hints and docs to suggest CREATE EXTENSION not CREATE LANGUAGE.
The core PLs have been extension-ified for seven years now, and we can
reasonably hope that all out-of-core PLs have been too.  So adjust a few
places that were still recommending CREATE LANGUAGE as the user-level
way to install a PL.

Discussion: https://postgr.es/m/CA+TgmoaJTUDMSuSCg4k08Dv8vhbrJq9nP3ZfPbmysVz_616qxw@mail.gmail.com
2018-04-27 13:42:03 -04:00
Tom Lane bdf46af748 Post-feature-freeze pgindent run.
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
2018-04-26 14:47:16 -04:00
Alvaro Herrera bc972072a3 Add missing pstrdup
Lifetime of the input string is not right, so create a separate copy.

Author: Amit Langote
Discussion: https://postgr.es/m/a2773420-50d1-0a42-3396-fe42b0921134@lab.ntt.co.jp
2018-04-23 12:11:41 -03:00
Stephen Frost a0fefbcb71 Fix a couple minor typos
In commit f0e4475, GetIndexOpClass was renamed to ResolveOpClass, but
the comment in typecmds.c didn't get the memo.

In objectaddress.c, missing 'of' in a comment.

Both noticed by Vik Fearing, patch is mine though.
2018-04-20 19:04:54 -04:00
Tom Lane b1b71f1658 Fix race conditions when an event trigger is added concurrently with DDL.
EventTriggerTableRewrite crashed if there were table_rewrite triggers
present, but there had not been when the calling command started.

EventTriggerDDLCommandEnd called ddl_command_end triggers if present,
even if there had been no such triggers when the calling command started,
which would lead to a failure in pg_event_trigger_ddl_commands.

In both cases, fix by doing nothing; it's better to wait till the next
command when things will be properly initialized.

In passing, remove an elog(DEBUG1) call that might have seemed interesting
four years ago but surely isn't today.

We found this because of intermittent failures in the buildfarm.  Thanks
to Alvaro Herrera and Andrew Gierth for analysis.

Back-patch to 9.5; some of this code exists before that, but the specific
hazards we need to guard against don't.

Discussion: https://postgr.es/m/5767.1523995174@sss.pgh.pa.us
2018-04-20 17:15:31 -04:00
Tom Lane 1dec82068b Better fix for deadlock hazard in CREATE INDEX CONCURRENTLY.
Commit 54eff5311 did not account for the possibility that we'd have
a transaction snapshot due to default_transaction_isolation being
set high enough to require one.  The transaction snapshot is enough
to hold back our advertised xmin and thus risk deadlock anyway.
The only way to get rid of that snap is to start a new transaction,
so let's do that instead.  Also throw in an assert checking that we
really have gotten to a state where no xmin is being advertised.

Back-patch to 9.4, like the previous commit.

Discussion: https://postgr.es/m/CAMkU=1ztk3TpQdcUNbxq93pc80FrXUjpDWLGMeVBDx71GHNwZQ@mail.gmail.com
2018-04-18 12:07:37 -04:00
Tatsuo Ishii 03030512d1 Add more infinite recursion detection while locking a view.
Also add regression test cases for detecting infinite recursion in
locking view tests.  Some document enhancements. Patch by Yugo Nagata.
2018-04-17 16:59:17 +09:00
Alvaro Herrera da6f3e45dd Reorganize partitioning code
There's been a massive addition of partitioning code in PostgreSQL 11,
with little oversight on its placement, resulting in a
catalog/partition.c with poorly defined boundaries and responsibilities.
This commit tries to set a couple of distinct modules to separate things
a little bit.  There are no code changes here, only code movement.

There are three new files:
  src/backend/utils/cache/partcache.c
  src/include/partitioning/partdefs.h
  src/include/utils/partcache.h

The previous arrangement of #including catalog/partition.h almost
everywhere is no more.

Authors: Amit Langote and Álvaro Herrera
Discussion: https://postgr.es/m/98e8d509-790a-128c-be7f-e48a5b2d8d97@lab.ntt.co.jp
	https://postgr.es/m/11aa0c50-316b-18bb-722d-c23814f39059@lab.ntt.co.jp
	https://postgr.es/m/143ed9a4-6038-76d4-9a55-502035815e68@lab.ntt.co.jp
	https://postgr.es/m/20180413193503.nynq7bnmgh6vs5vm@alvherre.pgsql
2018-04-14 21:12:14 -03:00
Peter Eisentraut a8677e3ff6 Support named and default arguments in CALL
We need to call expand_function_arguments() to expand named and default
arguments.

In PL/pgSQL, we also need to deal with named and default INOUT arguments
when receiving the output values into variables.

Author: Pavel Stehule <pavel.stehule@gmail.com>
2018-04-14 09:13:53 -04:00
Alvaro Herrera b8ca984b2c Revert lowering of lock level for ATTACH PARTITION
I lowered the lock level for partitions being scanned from
AccessExclusive to ShareLock in the course of 72cf7f310c, but that was
bogus, as pointed out by Robert Haas.  Revert that bit.  Doing this is
possible, but requires more work.

Discussion: https://postgr.es/m/CA+TgmobV7Nfmqv+TZXcdSsb9Bjc-OL-Anv6BNmCbfJVZLYPE4Q@mail.gmail.com
2018-04-12 16:53:27 -03:00
Teodor Sigaev 524054598f Fix interference between covering indexes and partitioned tables
The bug is caused due to the original IndexStmt that DefineIndex receives
being overwritten when processing the INCLUDE columns. Use separate list of
index params to propagate to child tables. Add tests covering this case.

Amit Langote and Alexander Korotkov.

Re-commit 5c6110c6a9 because it discovered a bug
fixed in c266ed31a8

Discussion: https://www.postgresql.org/message-id/CAJGNTeO%3DBguEyG8wxMpU_Vgvg3nGGzy71zUQ0RpzEn_mb0bSWA%40mail.gmail.com
2018-04-12 17:25:13 +03:00
Teodor Sigaev c266ed31a8 Cleanup covering infrastructure
- Explicitly forbids opclass, collation and indoptions (like DESC/ASC etc) for
  including columns. Throw an error if user points that.
- Truncated storage arrays for such attributes to store only key atrributes,
  added assertion checks.
- Do not check opfamily and collation for including columns in
  CompareIndexInfo()

Discussion: https://www.postgresql.org/message-id/5ee72852-3c4e-ee35-e2ed-c1d053d45c08@sigaev.ru
2018-04-12 16:37:22 +03:00
Simon Riggs 08ea7a2291 Revert MERGE patch
This reverts commits d204ef6377,
83454e3c2b and a few more commits thereafter
(complete list at the end) related to MERGE feature.

While the feature was fully functional, with sufficient test coverage and
necessary documentation, it was felt that some parts of the executor and
parse-analyzer can use a different design and it wasn't possible to do that in
the available time. So it was decided to revert the patch for PG11 and retry
again in the future.

Thanks again to all reviewers and bug reporters.

List of commits reverted, in reverse chronological order:

 f1464c5380 Improve parse representation for MERGE
 ddb4158579 MERGE syntax diagram correction
 530e69e59b Allow cpluspluscheck to pass by renaming variable
 01b88b4df5 MERGE minor errata
 3af7b2b0d4 MERGE fix variable warning in non-assert builds
 a5d86181ec MERGE INSERT allows only one VALUES clause
 4b2d44031f MERGE post-commit review
 4923550c20 Tab completion for MERGE
 aa3faa3c7a WITH support in MERGE
 83454e3c2b New files for MERGE
 d204ef6377 MERGE SQL Command following SQL:2016

Author: Pavan Deolasee
Reviewed-by: Michael Paquier
2018-04-12 11:22:56 +01:00
Teodor Sigaev c9c875a28f Rename IndexInfo.ii_KeyAttrNumbers array
Rename ii_KeyAttrNumbers to ii_IndexAttrNumbers to prevent confusion with
ii_NumIndexAttrs/ii_NumIndexKeyAttrs. ii_IndexAttrNumbers contains
all attributes including "including" columns, not only key attribute.

Discussion: https://www.postgresql.org/message-id/13123421-1d52-d0e4-c95c-6d69011e0595%40sigaev.ru
2018-04-12 13:02:45 +03:00
Alvaro Herrera 9e9befac4a Set relispartition correctly for index partitions
Oversight in commit 8b08f7d4820f: pg_class.relispartition was not
being set for index partitions, which is a bit odd, and was also causing
the code to unnecessarily call has_superclass() when simply checking the
flag was enough.

Author: Álvaro Herrera
Reported-by: Amit Langote
Discussion: https://postgr.es/m/12085bc4-0bc6-0f3a-4c43-57fe0681772b@lab.ntt.co.jp
2018-04-11 21:27:12 -03:00
Alvaro Herrera 72cf7f310c Fix ALTER TABLE .. ATTACH PARTITION ... DEFAULT
If the table being attached contained values that contradict the default
partition's partition constraint, it would fail to complain, because
CommandCounterIncrement changes in 4dba331cb3 coupled with some bogus
coding in the existing ValidatePartitionConstraints prevented the
partition constraint from being validated after all -- or rather, it
caused to constraint to become an empty one, always succeeding.

Fix by not re-reading the OID of the default partition in
ATExecAttachPartition.  To forestall similar problems, revise the
existing code:
* rename routine from ValidatePartitionConstraints() to
  QueuePartitionConstraintValidation, to better represent what it
  actually does.
* add an Assert() to make sure that when queueing a constraint for a
  partition we're not overwriting a constraint previously queued.
* add an Assert() that we don't try to invoke the special-purpose
  validation of the default partition when attaching the default
  partition itself.

While at it, change some loops to obtain partition OIDs from
partdesc->oids rather than find_all_inheritors; reduce the lock level
of partitions being scanned from AccessExclusiveLock to ShareLock;
rewrite QueuePartitionConstraintValidation in a recursive fashion rather
than repetitive.

Author: Álvaro Herrera.  Tests written by Amit Langote
Reported-by: Rushabh Lathia
Diagnosed-by: Kyotaro HORIGUCHI, who also provided the initial fix.
Reviewed-by: Kyotaro HORIGUCHI, Amit Langote, Jeevan Ladhe
Discussion: https://postgr.es/m/CAGPqQf0W+v-Ci_qNV_5R3A=Z9LsK4+jO7LzgddRncpp_rrnJqQ@mail.gmail.com
2018-04-11 15:32:46 -03:00
Teodor Sigaev 92899992e1 Temporary revert 5c6110c6a9
It discovers one more bug in CompareIndexInfo(), should be fixed first.
2018-04-11 19:32:19 +03:00
Teodor Sigaev 5c6110c6a9 Fix interference between cavering indexes and partitioned tables
The bug is caused due to the original IndexStmt that DefineIndex receives
being overwritten when processing the INCLUDE columns. Use separate list of
index params to propagate to child tables. Add tests covering this case.

Amit Langote and Alexander Korotkov.
2018-04-11 16:44:26 +03:00
Alvaro Herrera 15a8f8caad Fix IndexOnlyScan counter for heap fetches in parallel mode
The HeapFetches counter was using a simple value in IndexOnlyScanState,
which fails to propagate values from parallel workers; so the counts are
wrong when IndexOnlyScan runs in parallel.  Move it to Instrumentation,
like all the other counters.

While at it, change INSERT ON CONFLICT conflicting tuple counter to use
the new ntuples2 instead of nfiltered2, which is a blatant misuse.

Discussion: https://postgr.es/m/20180409215851.idwc75ct2bzi6tea@alvherre.pgsql
2018-04-10 15:56:15 -03:00
Tom Lane af1a949109 Further cleanup of client dependencies on src/include/catalog headers.
In commit 9c0a0de4c, I'd failed to notice that catalog/catalog.h
should also be considered a frontend-unsafe header, because it includes
(and needs) the full form of pg_class.h, not to mention relcache.h.
However, various frontend code was depending on it to get
TABLESPACE_VERSION_DIRECTORY, so refactoring of some sort is called for.

The cleanest answer seems to be to move TABLESPACE_VERSION_DIRECTORY,
as well as the OIDCHARS symbol, to common/relpath.h.  Do that, and mop up
inclusions as necessary.  (I found that quite a few current users of
catalog/catalog.h don't seem to need it at all anymore, apparently as a
result of the refactorings that created common/relpath.[hc].  And
initdb.c needed it only as a route to pg_class_d.h.)

Discussion: https://postgr.es/m/6629.1523294509@sss.pgh.pa.us
2018-04-09 14:39:58 -04:00
Tom Lane b3b7f7898f Fix additional breakage in covering-index patch.
CheckIndexCompatible() misused ComputeIndexAttrs() by not bothering
to fill ii_NumIndexAttrs and ii_NumIndexKeyAttrs in the passed
IndexInfo.  Omission of ii_NumIndexAttrs was previously unimportant,
but now this matters because ComputeIndexAttrs depends on
ii_NumIndexKeyAttrs to decide how many columns it needs to report on.

(BTW, the fact that this oversight wasn't detected earlier implies
that we have no regression test verifying whether CheckIndexCompatible
ever succeeds.  Bad dog.  Not the job of this patch to fix it, though.)

Also, change the API of ComputeIndexAttrs so that it fills the opclass
output array for all column positions, as it does for the options output
array; positions for non-key index columns are filled with zeroes.
This isn't directly fixing any bug, but it seems like a good idea.

Per valgrind failure reports from buildfarm.

Alexander Korotkov, tweaked a bit by me

Discussion: https://postgr.es/m/CAPpHfduWrysrT-qAhn+3Ea5+Mg6Vhc-oA6o2Z-hRCPRdvf3tiw@mail.gmail.com
2018-04-08 17:23:39 -04:00
Tom Lane cefa387153 Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers.
Traditionally, include/catalog/pg_foo.h contains extern declarations
for functions in backend/catalog/pg_foo.c, in addition to its function
as the authoritative definition of the pg_foo catalog's rowtype.
In some cases, we'd been forced to split out those extern declarations
into separate pg_foo_fn.h headers so that the catalog definitions
could be #include'd by frontend code.  That problem is gone as of
commit 9c0a0de4c, so let's undo the splits to make things less
confusing.

Discussion: https://postgr.es/m/23690.1523031777@sss.pgh.pa.us
2018-04-08 14:35:29 -04:00
Stephen Frost da9b580d89 Refactor dir/file permissions
Consolidate directory and file create permissions for tools which work
with the PG data directory by adding a new module (common/file_perm.c)
that contains variables (pg_file_create_mode, pg_dir_create_mode) and
constants to initialize them (0600 for files and 0700 for directories).

Convert mkdir() calls in the backend to MakePGDirectory() if the
original call used default permissions (always the case for regular PG
directories).

Add tests to make sure permissions in PGDATA are set correctly by the
tools which modify the PG data directory.

Authors: David Steele <david@pgmasters.net>,
         Adam Brightwell <adam.brightwell@crunchydata.com>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
2018-04-07 17:45:39 -04:00
Alvaro Herrera 499be013de Support partition pruning at execution time
Existing partition pruning is only able to work at plan time, for query
quals that appear in the parsed query.  This is good but limiting, as
there can be parameters that appear later that can be usefully used to
further prune partitions.

This commit adds support for pruning subnodes of Append which cannot
possibly contain any matching tuples, during execution, by evaluating
Params to determine the minimum set of subnodes that can possibly match.
We support more than just simple Params in WHERE clauses. Support
additionally includes:

1. Parameterized Nested Loop Joins: The parameter from the outer side of the
   join can be used to determine the minimum set of inner side partitions to
   scan.

2. Initplans: Once an initplan has been executed we can then determine which
   partitions match the value from the initplan.

Partition pruning is performed in two ways.  When Params external to the plan
are found to match the partition key we attempt to prune away unneeded Append
subplans during the initialization of the executor.  This allows us to bypass
the initialization of non-matching subplans meaning they won't appear in the
EXPLAIN or EXPLAIN ANALYZE output.

For parameters whose value is only known during the actual execution
then the pruning of these subplans must wait.  Subplans which are
eliminated during this stage of pruning are still visible in the EXPLAIN
output.  In order to determine if pruning has actually taken place, the
EXPLAIN ANALYZE must be viewed.  If a certain Append subplan was never
executed due to the elimination of the partition then the execution
timing area will state "(never executed)".  Whereas, if, for example in
the case of parameterized nested loops, the number of loops stated in
the EXPLAIN ANALYZE output for certain subplans may appear lower than
others due to the subplan having been scanned fewer times.  This is due
to the list of matching subnodes having to be evaluated whenever a
parameter which was found to match the partition key changes.

This commit required some additional infrastructure that permits the
building of a data structure which is able to perform the translation of
the matching partition IDs, as returned by get_matching_partitions, into
the list index of a subpaths list, as exist in node types such as
Append, MergeAppend and ModifyTable.  This allows us to translate a list
of clauses into a Bitmapset of all the subpath indexes which must be
included to satisfy the clause list.

Author: David Rowley, based on an earlier effort by Beena Emerson
Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi,
Jesper Pedersen
Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
2018-04-07 17:54:39 -03:00
Andres Freund f16241bef7 Raise error when affecting tuple moved into different partition.
When an update moves a row between partitions (supported since
2f17844104), our normal logic for following update chains in READ
COMMITTED mode doesn't work anymore. Cross partition updates are
modeled as an delete from the old and insert into the new
partition. No ctid chain exists across partitions, and there's no
convenient space to introduce that link.

Not throwing an error in a partitioned context when one would have
been thrown without partitioning is obviously problematic. This commit
introduces infrastructure to detect when a tuple has been moved, not
just plainly deleted. That allows to throw an error when encountering
a deletion that's actually a move, while attempting to following a
ctid chain.

The row deleted as part of a cross partition update is marked by
pointing it's t_ctid to an invalid block, instead of self as a normal
update would.  That was deemed to be the least invasive and most
future proof way to represent the knowledge, given how few infomask
bits are there to be recycled (there's also some locking issues with
using infomask bits).

External code following ctid chains should be updated to check for
moved tuples. The most likely consequence of not doing so is a missed
error.

Author: Amul Sul, editorialized by me
Reviewed-By: Amit Kapila, Pavan Deolasee, Andres Freund, Robert Haas
Discussion: http://postgr.es/m/CAAJ_b95PkwojoYfz0bzXU8OokcTVGzN6vYGCNVUukeUDrnF3dw@mail.gmail.com
2018-04-07 13:24:27 -07:00
Teodor Sigaev 8224de4f42 Indexes with INCLUDE columns and their support in B-tree
This patch introduces INCLUDE clause to index definition.  This clause
specifies a list of columns which will be included as a non-key part in
the index.  The INCLUDE columns exist solely to allow more queries to
benefit from index-only scans.  Also, such columns don't need to have
appropriate operator classes.  Expressions are not supported as INCLUDE
columns since they cannot be used in index-only scans.

Index access methods supporting INCLUDE are indicated by amcaninclude flag
in IndexAmRoutine.  For now, only B-tree indexes support INCLUDE clause.

In B-tree indexes INCLUDE columns are truncated from pivot index tuples
(tuples located in non-leaf pages and high keys).  Therefore, B-tree indexes
now might have variable number of attributes.  This patch also provides
generic facility to support that: pivot tuples contain number of their
attributes in t_tid.ip_posid.  Free 13th bit of t_info is used for indicating
that.  This facility will simplify further support of index suffix truncation.
The changes of above are backward-compatible, pg_upgrade doesn't need special
handling of B-tree indexes for that.

Bump catalog version

Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me
Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes,
			 David Rowley, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru
2018-04-07 23:00:39 +03:00
Peter Eisentraut 039eb6e92f Logical replication support for TRUNCATE
Update the built-in logical replication system to make use of the
previously added logical decoding for TRUNCATE support.  Add the
required truncate callback to pgoutput and a new logical replication
protocol message.

Publications get a new attribute to determine whether to replicate
truncate actions.  When updating a publication via pg_dump from an older
version, this is not set, thus preserving the previous behavior.

Author: Simon Riggs <simon@2ndquadrant.com>
Author: Marco Nenciarini <marco.nenciarini@2ndquadrant.it>
Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-04-07 11:34:11 -04:00
Peter Eisentraut 5dfd1e5a66 Logical decoding of TRUNCATE
Add a new WAL record type for TRUNCATE, which is only used when
wal_level >= logical.  (For physical replication, TRUNCATE is already
replicated via SMGR records.)  Add new callback for logical decoding
output plugins to receive TRUNCATE actions.

Author: Simon Riggs <simon@2ndquadrant.com>
Author: Marco Nenciarini <marco.nenciarini@2ndquadrant.it>
Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-04-07 11:34:10 -04:00
Robert Haas 3d956d9562 Allow insert and update tuple routing and COPY for foreign tables.
Also enable this for postgres_fdw.

Etsuro Fujita, based on an earlier patch by Amit Langote. The larger
patch series of which this is a part has been reviewed by Amit
Langote, David Fetter, Maksim Milyutin, Álvaro Herrera, Stephen Frost,
and me.  Minor documentation changes to the final version by me.

Discussion: http://postgr.es/m/29906a26-da12-8c86-4fb9-d8f88442f2b9@lab.ntt.co.jp
2018-04-06 19:22:03 -04:00
Stephen Frost 0fdc8495bf Add default roles for file/program access
This patch adds new default roles named 'pg_read_server_files',
'pg_write_server_files', 'pg_execute_server_program' which
allow an administrator to GRANT to a non-superuser role the ability to
access server-side files or run programs through PostgreSQL (as the user
the database is running as).  Having one of these roles allows a
non-superuser to use server-side COPY to read, write, or with a program,
and to use file_fdw (if installed by a superuser and GRANT'd USAGE on
it) to read from files or run a program.

The existing misc file functions are also changed to allow a user with
the 'pg_read_server_files' default role to read any files on the
filesystem, matching the privileges given to that role through COPY and
file_fdw from above.

Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20171231191939.GR2416%40tamriel.snowman.net
2018-04-06 14:47:10 -04:00
Peter Eisentraut bbca77623f Rename MemoryContextCopySetIdentifier() for clarity
MemoryContextCopySetIdentifier -> MemoryContextCopyAndSetIdentifier

Discussion: https://www.postgresql.org/message-id/6421.1522194949@sss.pgh.pa.us
2018-04-06 12:37:54 -04:00
Robert Haas cfbecf8100 Enforce child constraints during COPY TO a partitioned table.
The previous coding inadvertently checked the constraints for the
partitioned table rather than the target partition, which could
lead to data in a partition that fails to satisfy some constraint
on that partition.  This problem seems to date back to when
table partitioning was introduced; prior to that, there was only
one target table for a COPY, so the problem didn't occur, and the
code just didn't get updated.

Etsuro Fujita, reviewed by Amit Langote and Ashutosh Bapat

Discussion: https://postgr.es/message-id/5ABA4074.1090500%40lab.ntt.co.jp
2018-04-06 11:42:28 -04:00
Peter Eisentraut bcf79b5bb6 Split the SetSubscriptionRelState function into two
We don't actually need the insert-or-update logic, so it's clearer to
have separate functions for the inserting and updating.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
2018-04-06 10:00:26 -04:00
Simon Riggs 4b2d44031f MERGE post-commit review
Review comments from Andres Freund

* Consolidate code into AfterTriggerGetTransitionTable()
* Rename nodeMerge.c to execMerge.c
* Rename nodeMerge.h to execMerge.h
* Move MERGE handling in ExecInitModifyTable()
  into a execMerge.c ExecInitMerge()
* Move mt_merge_subcommands flags into execMerge.h
* Rename opt_and_condition to opt_merge_when_and_condition
* Wordsmith various comments

Author: Pavan Deolasee
Reviewer: Simon Riggs
2018-04-05 09:54:07 +01:00
Alvaro Herrera 3de241dba8 Foreign keys on partitioned tables
Author: Álvaro Herrera
Discussion: https://postgr.es/m/20171231194359.cvojcour423ulha4@alvherre.pgsql
Reviewed-by: Peter Eisentraut
2018-04-04 14:02:49 -03:00
Alvaro Herrera 851f4b4e14 Don't clone internal triggers to partitions
Trigger cloning to partitions was supposed to occur for user-visible
triggers only, but during development the protection that prevented it
from occurring to internal triggers was lost.  Reinstate it, as well as
add a test case to ensure internal triggers (in the tested case,
triggers implementing a deferred unique constraint) are not cloned.
Without the code fix, the partitions in the test end up with different
numbers of triggers, which is clearly wrong ...

Bug in 86f575948c.

Discussion: https://postgr.es/m/20180403214903.ozfagwjcpk337uw7@alvherre.pgsql
2018-04-03 19:08:25 -03:00
Simon Riggs d204ef6377 MERGE SQL Command following SQL:2016
MERGE performs actions that modify rows in the target table
using a source table or query. MERGE provides a single SQL
statement that can conditionally INSERT/UPDATE/DELETE rows
a task that would other require multiple PL statements.
e.g.

MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
  UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
  DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
  INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
  DO NOTHING;

MERGE works with regular and partitioned tables, including
column and row security enforcement, as well as support for
row, statement and transition triggers.

MERGE is optimized for OLTP and is parameterizable, though
also useful for large scale ETL/ELT. MERGE is not intended
to be used in preference to existing single SQL commands
for INSERT, UPDATE or DELETE since there is some overhead.
MERGE can be used statically from PL/pgSQL.

MERGE does not yet support inheritance, write rules,
RETURNING clauses, updatable views or foreign tables.
MERGE follows SQL Standard per the most recent SQL:2016.

Includes full tests and documentation, including full
isolation tests to demonstrate the concurrent behavior.

This version written from scratch in 2017 by Simon Riggs,
using docs and tests originally written in 2009. Later work
from Pavan Deolasee has been both complex and deep, leaving
the lead author credit now in his hands.
Extensive discussion of concurrency from Peter Geoghegan,
with thanks for the time and effort contributed.

Various issues reported via sqlsmith by Andreas Seltenreich

Authors: Pavan Deolasee, Simon Riggs
Reviewer: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs

Discussion:
https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
2018-04-03 09:28:16 +01:00
Simon Riggs 7cf8a5c302 Revert "Modified files for MERGE"
This reverts commit 354f13855e.
2018-04-02 21:34:15 +01:00
Simon Riggs 354f13855e Modified files for MERGE 2018-04-02 21:12:47 +01:00
Tatsuo Ishii 1b26bd4089 Fix bug with view locking code.
LockViewRecurese() obtains view relation using heap_open() and passes
it to get_view_query() to get view info. It immediately closes the
relation then uses the returned view info by calling
LockViewRecurse_walker().  Since get_view_query() returns a pointer
within the relcache, the relcache should be kept until
LockViewRecurse_walker() returns. Otherwise the relation could point
to a garbage memory area.

Fix is moving the heap_close() call after LockViewRecurse_walker().

Problem reported by Tom Lane (buildfarm is unhappy, especially prion
since it enables -DRELCACHE_FORCE_RELEASE cpp flag), fix by me.
2018-03-31 09:26:43 +09:00
Andres Freund d87510a524 Combine options for RangeVarGetRelidExtended() into a flags argument.
A followup patch will add a SKIP_LOCKED option. To avoid introducing
evermore arguments, breaking existing callers each time, introduce a
flags argument. This'll no doubt break a few external users...

Also change the MISSING_OK behaviour so a DEBUG1 debug message is
emitted when a relation is not found.

Author: Nathan Bossart
Reviewed-By: Michael Paquier and Andres Freund
Discussion: https://postgr.es/m/20180306005349.b65whmvj7z6hbe2y@alap3.anarazel.de
2018-03-30 17:05:16 -07:00
Tatsuo Ishii 34c20de4d0 Allow to lock views.
Now all tables used in view definitions can be recursively locked by a
LOCK command.

Author: Yugo Nagata
Reviewed by Robert Haas, Thomas Munro and me.

Discussion: https://postgr.es/m/20171011183629.eb2817b3.nagata%40sraoss.co.jp
2018-03-30 09:18:02 +09:00
Bruce Momjian 20b4323bd1 C comments: "a" <--> "an" corrections
Reported-by: Michael Paquier, Abhijit Menon-Sen

Discussion: https://postgr.es/m/20180305045854.GB2266@paquier.xyz

Author: Michael Paquier, Abhijit Menon-Sen, me
2018-03-29 15:18:53 -04:00
Tom Lane 851a26e266 While vacuuming a large table, update upper-level FSM data every so often.
VACUUM updates leaf-level FSM entries immediately after cleaning the
corresponding heap blocks.  fsmpage.c updates the intra-page search trees
on the leaf-level FSM pages when this happens, but it does not touch the
upper-level FSM pages, so that the released space might not actually be
findable by searchers.  Previously, updating the upper-level pages happened
only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum()
call.  This is bad because the VACUUM might get canceled before ever
reaching that point, so that from the point of view of searchers no space
has been freed at all, leading to table bloat.

We can improve matters by updating the upper pages immediately after each
cycle of index-cleaning and heap-cleaning, processing just the FSM pages
corresponding to the range of heap blocks we have now fully cleaned.
This adds a small amount of extra work, since the FSM pages leading down
to each range boundary will be touched twice, but it's pretty negligible
compared to everything else going on in a large VACUUM.

If there are no indexes, VACUUM doesn't work in cycles but just cleans
each heap page on first visit.  In that case we just arbitrarily update
upper FSM pages after each 8GB of heap.  That maintains the goal of not
letting all this work slide until the very end, and it doesn't seem worth
expending extra complexity on a case that so seldom occurs in practice.

In either case, the FSM is fully up to date before any attempt is made
to truncate the relation, so that the most likely scenario for VACUUM
cancellation no longer results in out-of-date upper FSM pages.  When
we do successfully truncate, adjusting the FSM to reflect that is now
fully handled within FreeSpaceMapTruncateRel.

Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional
tweaks by me

Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
2018-03-29 11:29:54 -04:00
Andres Freund 1f0c6a9e7d Add EXPLAIN support for JIT.
This just shows a few details about JITing, e.g. how many functions
have been JITed, and how long that took.  To avoid noise in regression
tests with functions sometimes being JITed in --with-llvm builds,
disable display when COSTS OFF is specified.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-28 13:26:51 -07:00
Andrew Dunstan 16828d5c02 Fast ALTER TABLE ADD COLUMN with a non-NULL default
Currently adding a column to a table with a non-NULL default results in
a rewrite of the table. For large tables this can be both expensive and
disruptive. This patch removes the need for the rewrite as long as the
default value is not volatile. The default expression is evaluated at
the time of the ALTER TABLE and the result stored in a new column
(attmissingval) in pg_attribute, and a new column (atthasmissing) is set
to true. Any existing row when fetched will be supplied with the
attmissingval. New rows will have the supplied value or the default and
so will never need the attmissingval.

Any time the table is rewritten all the atthasmissing and attmissingval
settings for the attributes are cleared, as they are no longer needed.

The most visible code change from this is in heap_attisnull, which
acquires a third TupleDesc argument, allowing it to detect a missing
value if there is one. In many cases where it is known that there will
not be any (e.g.  catalog relations) NULL can be passed for this
argument.

Andrew Dunstan, heavily modified from an original patch from Serge
Rielau.
Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley.

Discussion: https://postgr.es/m/31e2e921-7002-4c27-59f5-51f08404c858@2ndQuadrant.com
2018-03-28 10:43:52 +10:30
Tom Lane 442accc3fe Allow memory contexts to have both fixed and variable ident strings.
Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b1 rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident".  The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to.  The "ident" string, if
wanted, is supplied later.  This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.

The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there.  And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.

All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead.  This
allows more effective identification of those contexts in many cases;
for example, subsidary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other.  Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.

I also arranged for plancache contexts to use the query source string
as their identifier.  This is basically free for CachedPlanSources, as
they contained a copy of that string already.  We pay an extra pstrdup
to do it for CachedPlans.  That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first).  I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.

This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format.  This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.

The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b1.  Seems
better for extension authors to endure just one round of API changes
not two.

Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
2018-03-27 16:46:51 -04:00
Alvaro Herrera 86f575948c Allow FOR EACH ROW triggers on partitioned tables
Previously, FOR EACH ROW triggers were not allowed in partitioned
tables.  Now we allow AFTER triggers on them, and on trigger creation we
cascade to create an identical trigger in each partition.  We also clone
the triggers to each partition that is created or attached later.

This means that deferred unique keys are allowed on partitioned tables,
too.

Author: Álvaro Herrera
Reviewed-by: Peter Eisentraut, Simon Riggs, Amit Langote, Robert Haas,
	Thomas Munro
Discussion: https://postgr.es/m/20171229225319.ajltgss2ojkfd3kp@alvherre.pgsql
2018-03-23 10:48:22 -03:00
Tom Lane 7c91a0364f Sync up our various ways of estimating pg_class.reltuples.
VACUUM thought that reltuples represents the total number of tuples in
the relation, while ANALYZE counted only live tuples.  This can cause
"flapping" in the value when background vacuums and analyzes happen
separately.  The planner's use of reltuples essentially assumes that
it's the count of live (visible) tuples, so let's standardize on having
it mean live tuples.

Another issue is that the definition of "live tuple" isn't totally clear;
what should be done with INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples?
ANALYZE's choices in this regard are made on the assumption that if the
originating transaction commits at all, it will happen after ANALYZE
finishes, so we should ignore the effects of the in-progress transaction
--- unless it is our own transaction, and then we should count it.
Let's propagate this definition into VACUUM, too.

Likewise propagate this definition into CREATE INDEX, and into
contrib/pgstattuple's pgstattuple_approx() function.

Tomas Vondra, reviewed by Haribabu Kommi, some corrections by me

Discussion: https://postgr.es/m/16db4468-edfa-830a-f921-39a50498e77e@2ndquadrant.com
2018-03-22 15:47:41 -04:00
Dean Rasheed b5db1d93d2 Improve ANALYZE's strategy for finding MCVs.
Previously, a value was included in the MCV list if its frequency was
25% larger than the estimated average frequency of all nonnull values
in the table.  For uniform distributions, that can lead to values
being included in the MCV list and significantly overestimated on the
basis of relatively few (sometimes just 2) instances being seen in the
sample.  For non-uniform distributions, it can lead to too few values
being included in the MCV list, since the overall average frequency
may be dominated by a small number of very common values, while the
remaining values may still have a large spread of frequencies, causing
both substantial overestimation and underestimation of the remaining
values.  Furthermore, increasing the statistics target may have little
effect because the overall average frequency will remain relatively
unchanged.

Instead, populate the MCV list with the largest set of common values
that are statistically significantly more common than the average
frequency of the remaining values.  This takes into account the
variance of the sample counts, which depends on the counts themselves
and on the proportion of the table that was sampled.  As a result, it
constrains the relative standard error of estimates based on the
frequencies of values in the list, reducing the chances of too many
values being included.  At the same time, it allows more values to be
included, since the MCVs need only be more common than the remaining
non-MCVs, rather than the overall average.  Thus it tends to produce
fewer MCVs than the previous code for uniform distributions, and more
for non-uniform distributions, reducing estimation errors in both
cases.  In addition, the algorithm responds better to increasing the
statistics target, allowing more values to be included in the MCV list
when more of the table is sampled.

Jeff Janes, substantially modified by me. Reviewed by John Naylor and
Tomas Vondra.

Discussion: https://postgr.es/m/CAMkU=1yvdGvW9TmiLAhz2erFnvnPFYHbOZuO+a=4DVkzpuQ2tw@mail.gmail.com
2018-03-22 09:37:36 +00:00
Alvaro Herrera 56163004b8 Fix relcache handling of the 'default' partition
My commit 4dba331cb3 that moved around CommandCounterIncrement calls
in partitioning DDL code unearthed a problem with the relcache handling
for the 'default' partition: the construction of a correct relcache
entry for the partitioned table was at the mercy of lack of CCI calls in
non-trivial amounts of code.  This was prone to creating problems later
on, as the code develops.  This was visible as a test failure in a
compile with RELCACHE_FORCE_RELASE (buildfarm member prion).

The problem is that after the mentioned commit it was possible to create
a relcache entry that had incomplete information regarding the default
partition because I introduced a CCI between adding the catalog entries
for the default partition (StorePartitionBound) and the update of
pg_partitioned_table entry for its parent partitioned table
(update_default_partition_oid).  It seems the best fix is to move the
latter so that it occurs inside the former; the purposeful lack of
intervening CCI should be more obvious, and harder to break.

I also remove a check in RelationBuildPartitionDesc that returns NULL if
the key is not set.  I couldn't find any place that needs this hack
anymore; probably it was required because of bugs that have since been
fixed.

Fix a few typos I noticed while reviewing the code involved.

Discussion: https://postgr.es/m/20180320182659.nyzn3vqtjbbtfgwq@alvherre.pgsql
2018-03-21 12:03:35 -03:00
Peter Eisentraut 325f2ec555 Handle heap rewrites even better in logical decoding
Logical decoding should not publish anything about tables created as
part of a heap rewrite during DDL.  Those tables don't exist externally,
so consumers of logical decoding cannot do anything sensible with that
information.  In ab28feae2b, we worked
around this for built-in logical replication, but that was hack.

This is a more proper fix: We mark such transient heaps using the new
field pg_class.relwrite, linking to the original relation OID.  By
default, we ignore them in logical decoding before they get to the
output plugin.  Optionally, a plugin can register their interest in
getting such changes, if they handle DDL specially, in which case the
new field will help them get information about the actual table.

Reviewed-by: Craig Ringer <craig@2ndquadrant.com>
2018-03-21 09:15:04 -04:00
Alvaro Herrera 4dba331cb3 Fix CommandCounterIncrement in partition-related DDL
It makes sense to do the CCIs in the places that do catalog updates,
rather than before the places that error out because the former ones
fail to do it.  In particular, it looks like StorePartitionBound() and
IndexSetParentIndex() ought to make their own CCIs.

Per review comments from Peter Eisentraut for row-level triggers on
partitioned tables.

Discussion: https://postgr.es/m/20171229225319.ajltgss2ojkfd3kp@alvherre.pgsql
2018-03-20 11:19:41 -03:00
Tom Lane 6497a18e6c Fix some corner-case issues in REFRESH MATERIALIZED VIEW CONCURRENTLY.
refresh_by_match_merge() has some issues in the way it builds a SQL
query to construct the "diff" table:

1. It doesn't require the selected unique index(es) to be indimmediate.
2. It doesn't pay attention to the particular equality semantics enforced
by a given index, but just assumes that they must be those of the column
datatype's default btree opclass.
3. It doesn't check that the indexes are btrees.
4. It's insufficiently careful to ensure that the parser will pick the
intended operator when parsing the query.  (This would have been a
security bug before CVE-2018-1058.)
5. It's not careful about indexes on system columns.

The way to fix #4 is to make use of the existing code in ri_triggers.c
for generating an arbitrary binary operator clause.  I chose to move
that to ruleutils.c, since that seems a more reasonable place to be
exporting such functionality from than ri_triggers.c.

While #1, #3, and #5 are just latent given existing feature restrictions,
and #2 doesn't arise in the core system for lack of alternate opclasses
with different equality behaviors, #4 seems like an issue worth
back-patching.  That's the bulk of the change anyway, so just back-patch
the whole thing to 9.4 where this code was introduced.

Discussion: https://postgr.es/m/13836.1521413227@sss.pgh.pa.us
2018-03-19 18:50:05 -04:00
Tom Lane 6fbd5cce22 Fix performance hazard in REFRESH MATERIALIZED VIEW CONCURRENTLY.
Jeff Janes discovered that commit 7ca25b7de made one of the queries run by
REFRESH MATERIALIZED VIEW CONCURRENTLY perform badly.  The root cause is
bad cardinality estimation for correlated quals, but a principled solution
to that problem is some way off, especially since the planner lacks any
statistics about whole-row variables.  Moreover, in non-error cases this
query produces no rows, meaning it must be run to completion; but use of
LIMIT 1 encourages the planner to pick a fast-start, slow-completion plan,
exactly not what we want.  Remove the LIMIT clause, and instead rely on
the count parameter we pass to SPI_execute() to prevent excess work if the
query does return some rows.

While we've heard no field reports of planner misbehavior with this query,
it could be that people are having performance issues that haven't reached
the level of pain needed to cause a bug report.  In any case, that LIMIT
clause can't possibly do anything helpful with any existing version of the
planner, and it demonstrably can cause bad choices in some cases, so
back-patch to 9.4 where the code was introduced.

Thomas Munro

Discussion: https://postgr.es/m/CAMkU=1z-JoGymHneGHar1cru4F1XDfHqJDzxP_CtK5cL3DOfmg@mail.gmail.com
2018-03-19 17:23:21 -04:00
Alvaro Herrera 6666ee49f4 Fix state reversal after partition tuple routing
We make some changes to ModifyTableState and the EState it uses whenever
we route tuples to partitions; but we weren't restoring properly in all
cases, possibly causing crashes when partitions with different tuple
descriptors are targeted by tuples inserted in the same command.
Refactor some code, creating ExecPrepareTupleRouting, to encapsulate the
needed state changing logic, and have it invoked one level above its
current place (ie. put it in ExecModifyTable instead of ExecInsert);
this makes it all more readable.

Add a test case to exercise this.

We don't support having views as partitions; and since only views can
have INSTEAD OF triggers, there is no point in testing for INSTEAD OF
when processing insertions into a partitioned table.  Remove code that
appears to support this (but which is actually never relevant.)

In passing, fix location of some very confusing comments in
ModifyTableState.

Reported-by: Amit Langote
Author: Etsuro Fujita, Amit Langote
Discussion: https://postgr/es/m/0473bf5c-57b1-f1f7-3d58-455c2230bc5f@lab.ntt.co.jp
2018-03-19 17:45:53 -03:00
Andres Freund 7a50bb690b Add 'unit' parameter to ExplainProperty{Integer,Float}.
This allows to deduplicate some existing code, but mainly avoids some
duplication in upcoming commits.

In passing, fix variable names indicating wrong unit (seconds instead
of ms).

Author: Andres Freund
Discussion: https://postgr.es/m/20180314002740.cah3mdsonz5mxney@alap3.anarazel.de
2018-03-16 23:16:04 -07:00
Andres Freund f3e4b95edb Make ExplainPropertyInteger accept 64bit input, remove *Long variant.
'long' is not useful type across platforms, as it's 32bit on 32 bit
platforms, and even on some 64bit platforms (e.g. windows) it's still
only 32bits wide.

As ExplainPropertyInteger should never be performance critical, change
it to accept a 64bit argument and remove ExplainPropertyLong.

Author: Andres Freund
Discussion: https://postgr.es/m/20180314164832.n56wt7zcbpzi6zxe@alap3.anarazel.de
2018-03-16 23:13:12 -07:00
Peter Eisentraut 04700b685f Rename TransactionChain functions
We call this thing a "transaction block" everywhere except in a few
functions, where it is mysteriously called a "transaction chain".  In
the SQL standard, a transaction chain is something different.  So rename
these functions to match the common terminology.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-03-16 13:18:06 -04:00
Peter Eisentraut 3a4b891964 Fix more format truncation issues
Fix the warnings created by the compiler warning options
-Wformat-overflow=2 -Wformat-truncation=2, supported since GCC 7.  This
is a more aggressive variant of the fixes in
6275f5d28a, which GCC 7 warned about by
default.

The issues are all harmless, but some dubious coding patterns are
cleaned up.

One issue that is of external interest is that BGW_MAXLEN is increased
from 64 to 96.  Apparently, the old value would cause the bgw_name of
logical replication workers to be truncated in some circumstances.

But this doesn't actually add those warning options.  It appears that
the warnings depend a bit on compilation and optimization options, so it
would be annoying to have to keep up with that.  This is more of a
once-in-a-while cleanup.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-03-15 11:41:42 -04:00
Peter Eisentraut f66e8bf875 Remove pg_class.relhaspkey
It is not used for anything internally, and it cannot be relied on for
external uses, so it can just be removed.  To correct recommended way to
check for a primary key is in pg_index.

Discussion: https://www.postgresql.org/message-id/flat/b1a24c6c-6913-f89c-674e-0704f0ed69db@2ndquadrant.com
2018-03-14 15:31:34 -04:00
Peter Eisentraut 33803f67f1 Support INOUT arguments in procedures
In a top-level CALL, the values of INOUT arguments will be returned as a
result row.  In PL/pgSQL, the values are assigned back to the input
arguments.  In other languages, the same convention as for return a
record from a function is used.  That does not require any code changes
in the PL implementations.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
2018-03-14 12:07:28 -04:00
Tom Lane d04900de7d When updating reltuples after ANALYZE, just extrapolate from our sample.
The existing logic for updating pg_class.reltuples trusted the sampling
results only for the pages ANALYZE actually visited, preferring to
believe the previous tuple density estimate for all the unvisited pages.
While there's some rationale for doing that for VACUUM (first that
VACUUM is likely to visit a very nonrandom subset of pages, and second
that we know for sure that the unvisited pages did not change), there's
no such rationale for ANALYZE: by assumption, it's looked at an unbiased
random sample of the table's pages.  Furthermore, in a very large table
ANALYZE will have examined only a tiny fraction of the table's pages,
meaning it cannot slew the overall density estimate very far at all.
In a table that is physically growing, this causes reltuples to increase
nearly proportionally to the change in relpages, regardless of what is
actually happening in the table.  This has been observed to cause reltuples
to become so much larger than reality that it effectively shuts off
autovacuum, whose threshold for doing anything is a fraction of reltuples.
(Getting to the point where that would happen seems to require some
additional, not well understood, conditions.  But it's undeniable that if
reltuples is seriously off in a large table, ANALYZE alone will not fix it
in any reasonable number of iterations, especially not if the table is
continuing to grow.)

Hence, restrict the use of vac_estimate_reltuples() to VACUUM alone,
and in ANALYZE, just extrapolate from the sample pages on the assumption
that they provide an accurate model of the whole table.  If, by very bad
luck, they don't, at least another ANALYZE will fix it; in the old logic
a single bad estimate could cause problems indefinitely.

In HEAD, let's remove vac_estimate_reltuples' is_analyze argument
altogether; it was never used for anything and now it's totally pointless.
But keep it in the back branches, in case any third-party code is calling
this function.

Per bug #15005.  Back-patch to all supported branches.

David Gould, reviewed by Alexander Kuzmenkov, cosmetic changes by me

Discussion: https://postgr.es/m/20180117164916.3fdcf2e9@engels
2018-03-13 13:24:27 -04:00
Peter Eisentraut 377b5ac484 Fix CREATE TABLE / LIKE with bigint identity column
CREATE TABLE / LIKE with a bigint identity column would fail on
platforms where long is 32 bits.  Copying the sequence values used
makeInteger(), which would truncate the 64-bit sequence data to 32 bits.
To fix, use makeFloat() instead, like the parser.  (This does not
actually make use of floats, but stores the values as strings.)

Bug: #15096
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-03-13 09:41:30 -04:00
Alvaro Herrera 1f8a3327a9 Avoid having two PKs in a partition
If a table containing a primary key is attach as partition to a
partitioned table which has a primary key with a different definition,
we would happily create a second one in the new partition.  Oops.  It
turns out that this is because an error check in DefineIndex is executed
only if you tell it that it's being run by ALTER TABLE, and the original
code here wasn't.  Change it so that it does.

Added a couple of test cases for this, also.  A previously working test
started to fail in a different way than before patch because the new
check is called earlier; change the PK to plain UNIQUE so that the new
behavior isn't invoked, so that the test continues to verify what we
want it to verify.

Reported by: Noriyoshi Shinoda
Discussion: https://postgr.es/m/DF4PR8401MB102060EC2615EC9227CC73F7EEDF0@DF4PR8401MB1020.NAMPRD84.PROD.OUTLOOK.COM
2018-03-12 19:42:32 -03:00
Tom Lane 4a4e2442a7 Fix improper uses of canonicalize_qual().
One of the things canonicalize_qual() does is to remove constant-NULL
subexpressions of top-level AND/OR clauses.  It does that on the assumption
that what it's given is a top-level WHERE clause, so that NULL can be
treated like FALSE.  Although this is documented down inside a subroutine
of canonicalize_qual(), it wasn't mentioned in the documentation of that
function itself, and some callers hadn't gotten that memo.

Notably, commit d007a9505 caused get_relation_constraints() to apply
canonicalize_qual() to CHECK constraints.  That allowed constraint
exclusion to misoptimize situations in which a CHECK constraint had a
provably-NULL subclause, as seen in the regression test case added here,
in which a child table that should be scanned is not.  (Although this
thinko is ancient, the test case doesn't fail before 9.2, for reasons
I've not bothered to track down in detail.  There may be related cases
that do fail before that.)

More recently, commit f0e44751d added an independent bug by applying
canonicalize_qual() to index expressions, which is even sillier since
those might not even be boolean.  If they are, though, I think this
could lead to making incorrect index entries for affected index
expressions in v10.  I haven't attempted to prove that though.

To fix, add an "is_check" parameter to canonicalize_qual() to specify
whether it should assume WHERE or CHECK semantics, and make it perform
NULL-elimination accordingly.  Adjust the callers to apply the right
semantics, or remove the call entirely in cases where it's not known
that the expression has one or the other semantics.  I also removed
the call in some cases involving partition expressions, where it should
be a no-op because such expressions should be canonical already ...
and was a no-op, independently of whether it could in principle have
done something, because it was being handed the qual in implicit-AND
format which isn't what it expects.  In HEAD, add an Assert to catch
that type of mistake in future.

This represents an API break for external callers of canonicalize_qual().
While that's intentional in HEAD to make such callers think about which
case applies to them, it seems like something we probably wouldn't be
thanked for in released branches.  Hence, in released branches, the
extra parameter is added to a new function canonicalize_qual_ext(),
and canonicalize_qual() is a wrapper that retains its old behavior.

Patch by me with suggestions from Dean Rasheed.  Back-patch to all
supported branches.

Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us
2018-03-11 18:10:42 -04:00
Tom Lane 5748f3a0aa Improve predtest.c's internal docs, and enhance its functionality a bit.
Commit b08df9cab left things rather poorly documented as far as the
exact semantics of "clause_is_check" mode went.  Also, that mode did
not really work correctly for predicate_refuted_by; although given the
lack of specification as to what it should do, as well as the lack
of any actual use-case, that's perhaps not surprising.

Rename "clause_is_check" to "weak" proof mode, and provide specifications
for what it should do.  I defined weak refutation as meaning "truth of A
implies non-truth of B", which makes it possible to use the mode in the
part of relation_excluded_by_constraints that checks for mutually
contradictory WHERE clauses.  Fix up several places that did things wrong
for that definition.  (As far as I can see, these errors would only lead
to failure-to-prove, not incorrect claims of proof, making them not
serious bugs even aside from the fact that v10 contains no use of this
mode.  So there seems no need for back-patching.)

In addition, teach predicate_refuted_by_recurse that it can use
predicate_implied_by_recurse after all when processing a strong NOT-clause,
so long as it asks for the correct proof strength.  This is an optimization
that could have been included in commit b08df9cab, but wasn't.

Also, simplify and generalize the logic that checks for whether nullness of
the argument of IS [NOT] NULL would force overall nullness of the predicate
or clause.  (This results in a change in the partition_prune test's output,
as it is now able to prune an all-nulls partition that it did not recognize
before.)

In passing, in PartConstraintImpliedByRelConstraint, remove bogus
conversion of the constraint list to explicit-AND form and then right back
again; that accomplished nothing except forcing a useless extra level of
recursion inside predicate_implied_by.

Discussion: https://postgr.es/m/5983.1520487191@sss.pgh.pa.us
2018-03-09 16:58:26 -05:00
Alvaro Herrera 1ffb63a2a1 Fix bogus Name assignment in CreateStatistics
Apparently, it doesn't work to use a plain cstring as a Name datum: you
may end up having random bytes because of failing to zero the bytes
after the terminating \0, as indicated by valgrind.  I introduced this
bug in 5564c11815, so backpatch this fix to REL_10_STABLE, like that
commit.

While at it, fix a slightly misleading comment, pointed out by David
Rowley.
2018-03-06 13:20:40 -03:00
Alvaro Herrera 5564c11815 Clone extended stats in CREATE TABLE (LIKE INCLUDING ALL)
The LIKE INCLUDING ALL clause to CREATE TABLE intuitively indicates
cloning of extended statistics on the source table, but it failed to do
so.  Patch it up so that it does.  Also include an INCLUDING STATISTICS
option to the LIKE clause, so that the behavior can be requested
individually, or excluded individually.

While at it, reorder the INCLUDING options, both in code and in docs, in
alphabetical order which makes more sense than feature-implementation
order that was previously used.

Backpatch this to Postgres 10, where extended statistics were
introduced, because this is seen as an oversight in a fresh feature
which is better to get consistent from the get-go instead of changing
only in pg11.

In pg11, comments on statistics objects are cloned too.  In pg10 they
are not, because I (Álvaro) was too coward to change the parse node as
required to support it.  Also, in pg10 I chose not to renumber the
parser symbols for the various INCLUDING options in LIKE, for the same
reason.  Any corresponding user-visible changes (docs) are backpatched,
though.

Reported-by: Stephen Froehlich
Author: David Rowley
Reviewed-by: Álvaro Herrera, Tomas Vondra
Discussion: https://postgr.es/m/CY1PR0601MB1927315B45667A1B679D0FD5E5EF0@CY1PR0601MB1927.namprd06.prod.outlook.com
2018-03-05 19:37:19 -03:00
Peter Eisentraut fd1a421fe6 Add prokind column, replacing proisagg and proiswindow
The new column distinguishes normal functions, procedures, aggregates,
and window functions.  This replaces the existing columns proisagg and
proiswindow, and replaces the convention that procedures are indicated
by prorettype == 0.  Also change prorettype to be VOIDOID for procedures.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-03-02 13:48:33 -05:00
Tom Lane 25b692568f Prevent dangling-pointer access when update trigger returns old tuple.
A before-update row trigger may choose to return the "new" or "old" tuple
unmodified.  ExecBRUpdateTriggers failed to consider the second
possibility, and would proceed to free the "old" tuple even if it was the
one returned, leading to subsequent access to already-deallocated memory.
In debug builds this reliably leads to an "invalid memory alloc request
size" failure; in production builds it might accidentally work, but data
corruption is also possible.

This is a very old bug.  There are probably a couple of reasons it hasn't
been noticed up to now.  It would be more usual to return NULL if one
wanted to suppress the update action; returning "old" is significantly less
efficient since the update will occur anyway.  Also, none of the standard
PLs would ever cause this because they all returned freshly-manufactured
tuples even if they were just copying "old".  But commit 4b93f5799 changed
that for plpgsql, making it possible to see the bug with a plpgsql trigger.
Still, this is certainly legal behavior for a trigger function, so it's
ExecBRUpdateTriggers's fault not plpgsql's.

It seems worth creating a test case that exercises returning "old" directly
with a C-language trigger; testing this through plpgsql seems unreliable
because its behavior might change again.

Report and fix by Rushabh Lathia; regression test case by me.
Back-patch to all supported branches.

Discussion: https://postgr.es/m/CAGPqQf1P4pjiNPrMof=P_16E-DFjt457j+nH2ex3=nBTew7tXw@mail.gmail.com
2018-02-27 13:28:02 -05:00
Peter Eisentraut 76b6aa41f4 Support parameters in CALL
To support parameters in CALL, move the parse analysis of the procedure
and arguments into the global transformation phase, so that the parser
hooks can be applied.  And then at execution time pass the parameters
from ProcessUtility on to ExecuteCallStmt.
2018-02-22 21:36:48 -05:00
Robert Haas edd44738bc Be lazier about partition tuple routing.
It's not necessary to fully initialize the executor data structures
for partitions to which no tuples are ever routed.  Consider, for
example, an INSERT statement that inserts only one row: it only cares
about the partition to which that one row is routed.  The new function
ExecInitPartitionInfo performs the initialization in question only
when a particular partition is about to receive a tuple. This includes
creating, validating, and saving a pointer to the ResultRelInfo,
setting up for speculative insertions, translating WCOs and
initializing the resulting expressions, translating returning lists
and building the appropriate projection information, and setting up a
tuple conversion map.

One thing that's not deferred is locking the child partitions; that
seems desirable but would need more thought.  Still, testing shows
that this makes single-row inserts significantly faster on a table
with many partitions without harming the bulk-insert case.

Amit Langote, reviewed by Etsuro Fujita, with a few changes by me

Discussion: http://postgr.es/m/8975331d-d961-cbdd-f862-fdd3d97dc2d0@lab.ntt.co.jp
2018-02-22 10:55:54 -05:00
Peter Eisentraut c2ff42c6c1 Error message improvement 2018-02-20 17:58:27 -05:00
Alvaro Herrera eb7ed3f306 Allow UNIQUE indexes on partitioned tables
If we restrict unique constraints on partitioned tables so that they
must always include the partition key, then our standard approach to
unique indexes already works --- each unique key is forced to exist
within a single partition, so enforcing the unique restriction in each
index individually is enough to have it enforced globally.  Therefore we
can implement unique indexes on partitions by simply removing a few
restrictions (and adding others.)

Discussion: https://postgr.es/m/20171222212921.hi6hg6pem2w2t36z@alvherre.pgsql
Discussion: https://postgr.es/m/20171229230607.3iib6b62fn3uaf47@alvherre.pgsql
Reviewed-by: Simon Riggs, Jesper Pedersen, Peter Eisentraut, Jaime
	Casanova, Amit Langote
2018-02-19 17:40:00 -03:00
Peter Eisentraut 97a804cb2b Message style fix 2018-02-18 17:16:11 -05:00
Peter Eisentraut 1a1adb215c Move function comment to the right place 2018-02-17 20:45:28 -05:00
Andres Freund ad7dbee368 Allow tupleslots to have a fixed tupledesc, use in executor nodes.
The reason for doing so is that it will allow expression evaluation to
optimize based on the underlying tupledesc. In particular it will
allow to JIT tuple deforming together with the expression itself.

For that expression initialization needs to be moved after the
relevant slots are initialized - mostly unproblematic, except in the
case of nodeWorktablescan.c.

After doing so there's no need for ExecAssignResultType() and
ExecAssignResultTypeFromTL() anymore, as all former callers have been
converted to create a slot with a fixed descriptor.

When creating a slot with a fixed descriptor, tts_values/isnull can be
allocated together with the main slot, reducing allocation overhead
and increasing cache density a bit.

Author: Andres Freund
Discussion: https://postgr.es/m/20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de
2018-02-16 21:17:38 -08:00
Tom Lane d02d4a6d4f Avoid premature free of pass-by-reference CALL arguments.
Prematurely freeing the EState used to evaluate CALL arguments led, in some
cases, to passing dangling pointers to the procedure.  This was masked in
trivial cases because the argument pointers would point to Const nodes in
the original expression tree, and in some other cases because the result
value would end up in the standalone ExprContext rather than in memory
belonging to the EState --- but that wasn't exactly high quality
programming either, because the standalone ExprContext was never
explicitly freed, breaking assorted API contracts.

In addition, using a separate EState for each argument was just silly.

So let's use just one EState, and one ExprContext, and make the latter
belong to the former rather than be standalone, and clean up the EState
(and hence the ExprContext) post-call.

While at it, improve the function's commentary a bit.

Discussion: https://postgr.es/m/29173.1518282748@sss.pgh.pa.us
2018-02-10 13:37:12 -05:00