Commit Graph

3514 Commits

Author SHA1 Message Date
Peter Eisentraut a7a1b44516 Fix crash in multi-insert COPY
A bug introduced in 0d5f05cde0
considered the *previous* partition's triggers when deciding whether
multi-insert can be used.  Rearrange the code so that the current
partition is considered.

Author: Ashutosh Sharma <ashu.coek88@gmail.com>
2018-10-17 20:31:20 +02:00
Andres Freund 62649bad83 Correct constness of a few variables.
This allows the compiler / linker to mark affected pages as read-only.

There's other cases, but they're a bit more invasive, and should go
through some review. These are easy.

They were found with
objdump -j .data -t src/backend/postgres|awk '{print $4, $5, $6}'|sort -r|less

Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya@alap3.anarazel.de
2018-10-15 21:01:14 -07:00
Alvaro Herrera c7d43c4d8a Correct attach/detach logic for FKs in partitions
There was no code to handle foreign key constraints on partitioned
tables in the case of ALTER TABLE DETACH; and if you happened to ATTACH
a partition that already had an equivalent constraint, that one was
ignored and a new constraint was created.  Adding this to the fact that
foreign key cloning reuses the constraint name on the partition instead
of generating a new name (as it probably should, to cater to SQL
standard rules about constraint naming within schemas), the result was a
pretty poor user experience -- the most visible failure was that just
detaching a partition and re-attaching it failed with an error such as

  ERROR:  duplicate key value violates unique constraint "pg_constraint_conrelid_contypid_conname_index"
  DETAIL:  Key (conrelid, contypid, conname)=(26702, 0, test_result_asset_id_fkey) already exists.

because it would try to create an identically-named constraint in the
partition.  To make matters worse, if you tried to drop the constraint
in the now-independent partition, that would fail because the constraint
was still seen as dependent on the constraint in its former parent
partitioned table:
  ERROR:  cannot drop inherited constraint "test_result_asset_id_fkey" of relation "test_result_cbsystem_0001_0050_monthly_2018_09"

This fix attacks the problem from two angles: first, when the partition
is detached, the constraint is also marked as independent, so the drop
now works.  Second, when the partition is re-attached, we scan existing
constraints searching for one matching the FK in the parent, and if one
exists, we link that one to the parent constraint.  So we don't end up
with a duplicate -- and better yet, we don't need to scan the referenced
table to verify that the constraint holds.

To implement this I made a small change to previously planner-only
struct ForeignKeyCacheInfo to contain the constraint OID; also relcache
now maintains the list of FKs for partitioned tables too.

Backpatch to 11.

Reported-by: Michael Vitale (bug #15425)
Discussion: https://postgr.es/m/15425-2dbc9d2aa999f816@postgresql.org
2018-10-12 12:37:37 -03:00
Peter Eisentraut f8c10f616f Turn transaction_isolation into GUC enum
It was previously a string setting that was converted into an enum by
custom code, but using the GUC enum facility seems much simpler and
doesn't change any functionality, except that

    set transaction_isolation='default';

no longer works, but that was never documented and doesn't work with
any other transaction characteristics.  (Note that this is not the
same as RESET or SET TO DEFAULT, which still work.)

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/457db615-e84c-4838-310e-43841eb806e5@iki.fi
2018-10-09 21:26:00 +02:00
Thomas Munro 212fab9926 Relax transactional restrictions on ALTER TYPE ... ADD VALUE (redux).
Originally committed as 15bc038f (plus some follow-ups), this was
reverted in 28e07270 due to a problem discovered in parallel
workers.  This new version corrects that problem by sending the
list of uncommitted enum values to parallel workers.

Here follows the original commit message describing the change:

To prevent possibly breaking indexes on enum columns, we must keep
uncommitted enum values from getting stored in tables, unless we
can be sure that any such column is new in the current transaction.

Formerly, we enforced this by disallowing ALTER TYPE ... ADD VALUE
from being executed at all in a transaction block, unless the target
enum type had been created in the current transaction.  This patch
removes that restriction, and instead insists that an uncommitted enum
value can't be referenced unless it belongs to an enum type created
in the same transaction as the value.  Per discussion, this should be
a bit less onerous.  It does require each function that could possibly
return a new enum value to SQL operations to check this restriction,
but there aren't so many of those that this seems unmaintainable.

Author: Andrew Dunstan and Tom Lane, with parallel query fix by Thomas Munro
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAEepm%3D0Ei7g6PaNTbcmAh9tCRahQrk%3Dr5ZWLD-jr7hXweYX3yg%40mail.gmail.com
Discussion: https://postgr.es/m/4075.1459088427%40sss.pgh.pa.us
2018-10-09 12:51:01 +13:00
Alvaro Herrera eee01d606e Silence compiler warning in Assert()
gcc 6.3 does not whine about this mistake I made in 39808e8868 but
evidently lots of other compilers do, according to Michael Paquier,
Peter Eisentraut, Arthur Zakirov, Tomas Vondra.

Discussion: too many to list
2018-10-08 10:37:21 -03:00
Peter Eisentraut 634b4b79cb Track procedure calls in pg_stat_user_functions
This was forgotten when procedures were implemented.

Reported-by: Lukas Fittl <lukas@fittl.com>
2018-10-08 11:22:53 +02:00
Michael Paquier 9c2a970d1f Improve two error messages related to foreign keys on partitioned tables
Error messages for creating a foreign key on a partitioned table using
ONLY or NOT VALID were wrong in mentioning the objects they worked on.
This commit adds on the way some regression tests missing for those
cases.

Author: Laurenz Albe
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/c11c05810a9ed65e9b2c817a9ef442275a32fe80.camel@cybertec.at
2018-10-08 17:56:13 +09:00
Alvaro Herrera 39808e8868 Fix catalog insertion order for ATTACH PARTITION
Commit 2fbdf1b38b changed the order in which we inserted catalog rows
when creating partitions, so that we could remove an unsightly hack
required for untimely relcache invalidations.  However, that commit only
changed the ordering for CREATE TABLE PARTITION OF, and left ALTER TABLE
ATTACH PARTITION unchanged, so the latter can be affected when catalog
invalidations occur, for instance when the partition key involves an SQL
function.

Reported-by: Rajkumar Raghuwanshi
Author: Amit Langote
Reviewed-by: Michaël Paquier
Discussion: https://postgr.es/m/CAKcux6=nTz9KSfTr_6Z2mpzLJ_09JN-rK6=dWic6gGyTSWueyQ@mail.gmail.com
2018-10-06 22:13:19 -03:00
Alvaro Herrera ad08006ba0 Fix event triggers for partitioned tables
Index DDL cascading on partitioned tables introduced a way for ALTER
TABLE to be called reentrantly.  This caused an an important deficiency
in event trigger support to be exposed: on exiting the reentrant call,
the alter table state object was clobbered, causing a crash when the
outer alter table tries to finalize its processing.  Fix the crash by
creating a stack of event trigger state objects.  There are still ways
to cause things to misbehave (and probably other crashers) with more
elaborate tricks, but at least it now doesn't crash in the obvious
scenario.

Backpatch to 9.5, where DDL deparsing of event triggers was introduced.

Reported-by: Marco Slot
Authors: Michaël Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/CANNhMLCpi+HQ7M36uPfGbJZEQLyTy7XvX=5EFkpR-b1bo0uJew@mail.gmail.com
2018-10-06 19:17:46 -03:00
Michael Paquier 38921d1416 Assign constraint name when cloning FK definition for partitions
This is for example used when attaching a partition to a partitioned
table which includes foreign keys, and in this case the constraint name
has been missing in the data cloned.  This could lead to hard crashes,
as when validating the foreign key constraint, the constraint name is
always expected.  Particularly, when using log_min_messages >= DEBUG1, a
log message would be generated with this unassigned constraint name,
leading to an assertion failure on HEAD.

While on it, rename a variable in ATExecAttachPartition which was
declared twice with the same name.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20181005042236.GG1629@paquier.xyz
Backpatch-through: 11
2018-10-06 14:59:36 +09:00
Tom Lane d73f4c74dd In the executor, use an array of pointers to access the rangetable.
Instead of doing a lot of list_nth() accesses to es_range_table,
create a flattened pointer array during executor startup and index
into that to get at individual RangeTblEntrys.

This eliminates one source of O(N^2) behavior with lots of partitions.
(I'm not exactly convinced that it's the most important source, but
it's an easy one to fix.)

Amit Langote and David Rowley

Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
2018-10-04 15:48:17 -04:00
Tom Lane 9ddef36278 Centralize executor's opening/closing of Relations for rangetable entries.
Create an array estate->es_relations[] paralleling the es_range_table,
and store references to Relations (relcache entries) there, so that any
given RT entry is opened and closed just once per executor run.  Scan
nodes typically still call ExecOpenScanRelation, but ExecCloseScanRelation
is no more; relation closing is now done centrally in ExecEndPlan.

This is slightly more complex than one would expect because of the
interactions with relcache references held in ResultRelInfo nodes.
The general convention is now that ResultRelInfo->ri_RelationDesc does
not represent a separate relcache reference and so does not need to be
explicitly closed; but there is an exception for ResultRelInfos in the
es_trig_target_relations list, which are manufactured by
ExecGetTriggerResultRel and have to be cleaned up by
ExecCleanUpTriggerState.  (That much was true all along, but these
ResultRelInfos are now more different from others than they used to be.)

To allow the partition pruning logic to make use of es_relations[] rather
than having its own relcache references, adjust PartitionedRelPruneInfo
to store an RT index rather than a relation OID.

Amit Langote, reviewed by David Rowley and Jesper Pedersen,
some mods by me

Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
2018-10-04 14:03:42 -04:00
Michael Paquier 803b1301e8 Add option SKIP_LOCKED to VACUUM and ANALYZE
When specified, this option allows VACUUM to skip the work on a relation
if there is a conflicting lock on it when trying to open it at the
beginning of its processing.

Similarly to autovacuum, this comes with a couple of limitations while
the relation is processed which can cause the process to still block:
- when opening the relation indexes.
- when acquiring row samples for table inheritance trees, partition trees
or certain types of foreign tables, and that a lock is taken on some
leaves of such trees.

Author: Nathan Bossart
Reviewed-by: Michael Paquier, Andres Freund, Masahiko Sawada
Discussion: https://postgr.es/m/9EF7EBE4-720D-4CF1-9D0E-4403D7E92990@amazon.com
Discussion: https://postgr.es/m/20171201160907.27110.74730@wrigleys.postgresql.org
2018-10-04 09:00:33 +09:00
Andres Freund c03c1449c0 Fix issues around EXPLAIN with JIT.
I (Andres) was more than a bit hasty in committing 33001fd7a7
after last minute changes, leading to a number of problems (jit output
was only shown for JIT in parallel workers, and just EXPLAIN without
ANALYZE didn't work).  Lukas luckily found these issues quickly.

Instead of combining instrumentation in in standard_ExecutorEnd(), do
so on demand in the new ExplainPrintJITSummary().

Also update a documentation example of the JIT output, changed in
52050ad8eb.

Author: Lukas Fittl, with minor changes by me
Discussion: https://postgr.es/m/CAP53PkxmgJht69pabxBXJBM+0oc6kf3KHMborLP7H2ouJ0CCtQ@mail.gmail.com
Backpatch: 11, where JIT compilation was introduced
2018-10-03 12:48:37 -07:00
Andres Freund cc2905e963 Use slots more widely in tuple mapping code and make naming more consistent.
It's inefficient to use a single slot for mapping between tuple
descriptors for multiple tuples, as previously done when using
ConvertPartitionTupleSlot(), as that means the slot's tuple descriptors
change for every tuple.

Previously we also, via ConvertPartitionTupleSlot(), built new tuples
after the mapping even in cases where we, immediately afterwards,
access individual columns again.

Refactor the code so one slot, on demand, is used for each
partition. That avoids having to change the descriptor (and allows to
use the more efficient "fixed" tuple slots). Then use slot->slot
mapping, to avoid unnecessarily forming a tuple.

As the naming between the tuple and slot mapping functions wasn't
consistent, rename them to execute_attr_map_{tuple,slot}.  It's likely
that we'll also rename convert_tuples_by_* to denote that these
functions "only" build a map, but that's left for later.

Author: Amit Khandekar and Amit Langote, editorialized by me
Reviewed-By: Amit Langote, Amit Khandekar, Andres Freund
Discussion:
    https://postgr.es/m/CAJ3gD9fR0wRNeAE8VqffNTyONS_UfFPRpqxhnD9Q42vZB+Jvpg@mail.gmail.com
    https://postgr.es/m/e4f9d743-cd4b-efb0-7574-da21d86a7f36%40lab.ntt.co.jp
Backpatch: -
2018-10-02 11:14:26 -07:00
Michael Paquier e3a25ab9ea Refactor relation opening for VACUUM and ANALYZE
VACUUM and ANALYZE share similar logic when it comes to opening a
relation to work on in terms of how the relation is opened, in which
order locks are tried and how logs should be generated when something
does not work as expected.

This commit refactors things so as both use the same code path to handle
the way a relation is opened, so as the integration of new options
becomes easier.

Author: Michael Paquier
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/20180927075152.GT1659@paquier.xyz
2018-10-02 08:53:38 +09:00
Tom Lane e27453bd83 Fix ALTER COLUMN TYPE to not open a relation without any lock.
If the column being modified is referenced by a foreign key constraint
of another table, ALTER TABLE would open the other table (to re-parse
the constraint's definition) without having first obtained a lock on it.
This was evidently intentional, but that doesn't mean it's really safe.
It's especially not safe in 9.3, which pre-dates use of MVCC scans for
catalog reads, but even in current releases it doesn't seem like a good
idea.

We know we'll need AccessExclusiveLock shortly to drop the obsoleted
constraint, so just get that a little sooner to close the hole.

Per testing with a patch that complains if we open a relation without
holding any lock on it.  I don't plan to back-patch that patch, but we
should close the holes it identifies in all supported branches.

Discussion: https://postgr.es/m/2038.1538335244@sss.pgh.pa.us
2018-10-01 11:39:13 -04:00
Tom Lane fdba460a26 Create an RTE field to record the query's lock mode for each relation.
Add RangeTblEntry.rellockmode, which records the appropriate lock mode for
each RTE_RELATION rangetable entry (either AccessShareLock, RowShareLock,
or RowExclusiveLock depending on the RTE's role in the query).

This patch creates the field and makes all creators of RTE nodes fill it
in reasonably, but for the moment nothing much is done with it.  The plan
is to replace assorted post-parser logic that re-determines the right
lockmode to use with simple uses of rte->rellockmode.  For now, just add
Asserts in each of those places that the rellockmode matches what they are
computing today.  (In some cases the match isn't perfect, so the Asserts
are weaker than you might expect; but this seems OK, as per discussion.)

This passes check-world for me, but it seems worth pushing in this state
to see if the buildfarm finds any problems in cases I failed to test.

catversion bump due to change of stored rules.

Amit Langote, reviewed by David Rowley and Jesper Pedersen,
and whacked around a bit more by me

Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
2018-09-30 13:55:51 -04:00
Peter Eisentraut 0320ddaf3c Recurse to sequences on ownership change for all relkinds
When a table ownership is changed, we must apply that also to any owned
sequences.  (Otherwise, it would result in a situation that cannot be
restored, because linked sequences must have the same owner as the
table.)  But this was previously only applied to regular tables and
materialized views.  But it should also apply to at least foreign
tables.  This patch removes the relkind check altogether, because it
doesn't save very much and just introduces the possibility of similar
omissions.

Bug: #15238
Reported-by: Christoph Berg <christoph.berg@credativ.de>
2018-09-26 20:19:15 +02:00
Andres Freund 29c94e03c7 Split ExecStoreTuple into ExecStoreHeapTuple and ExecStoreBufferHeapTuple.
Upcoming changes introduce further types of tuple table slots, in
preparation of making table storage pluggable. New storage methods
will have different representation of tuples, therefore the slot
accessor should refer explicitly to heap tuples.

Instead of just renaming the functions, split it into one function
that accepts heap tuples not residing in buffers, and one accepting
ones in buffers.  Previously one function was used for both, but that
was a bit awkward already, and splitting will allow us to represent
slot types for tuples in buffers and normal memory separately.

This is split out from the patch introducing abstract slots, as this
largely consists out of mechanical changes.

Author: Ashutosh Bapat
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
2018-09-25 16:27:48 -07:00
Andres Freund 33001fd7a7 Collect JIT instrumentation from workers.
Previously, when using parallel query, EXPLAIN (ANALYZE)'s JIT
compilation timings did not include the overhead from doing so on the
workers.  Fix that.

We do so by simply aggregating the cost of doing JIT compilation on
workers and the leader together. Arguably that's not quite accurate,
because the total time spend doing so is spent in parallel - but it's
hard to do much better.  For additional detail, when VERBOSE is
specified, the stats for workers are displayed separately.

Author: Amit Khandekar and Andres Freund
Discussion: https://postgr.es/m/CAJ3gD9eLrz51RK_gTkod+71iDcjpB_N8eC6vU2AW-VicsAERpQ@mail.gmail.com
Backpatch: 11-
2018-09-25 13:12:44 -07:00
Andres Freund 52050ad8eb Make EXPLAIN output for JIT compilation more dense.
A discussion about also reporting JIT compilation overhead on workers
brought unhappiness with the verbosity of the current explain format
to light.  Make the text format more dense, and restructure the
structured output to mirror that more closely.

As we're re-jiggering the output format anyway: The denser format
allows us to report all flags for JIT compilation (now also reporting
PGJIT_EXPR and PGJIT_DEFORM), and report the total time in addition to
the individual times.

Per complaint from Tom Lane.

Author: Andres Freund
Discussion: https://postgr.es/m/27812.1537221015@sss.pgh.pa.us
Backpatch: 11-, where JIT compilation was introduced
2018-09-24 13:35:45 -07:00
Andrew Dunstan 7636e5c60f Fast default trigger and expand_tuple fixes
Ensure that triggers get properly filled in tuples for the OLD value.
Also fix the logic of detecting missing null values. The previous logic
failed to detect a missing null column before the first missing column
with a default. Fixing this has simplified the logic a bit.

Regression tests are added to test changes. This should ensure better
coverage of expand_tuple().

Original bug reports, and some code and test scripts from Tomas Vondra

Backpatch to release 11.
2018-09-24 16:11:24 -04:00
Alvaro Herrera 20bef2c311 Fix ALTER/TYPE on columns referenced by FKs in partitioned tables
When ALTER TABLE ... SET DATA TYPE affects a column referenced by
constraints and indexes, it drop those constraints and indexes and
recreates them afterwards, so that the definitions match the new data
type.  The original code did this by dropping one object at a time
(commit 077db40fa1 of May 2004), which worked fine because the
dependencies between the objects were pretty straightforward, and
ordering the objects in a specific way was enough to make this work.
However, when there are foreign key constraints in partitioned tables,
the dependencies are no longer so straightforward, and we were getting
errors when attempted:
  ERROR:  cache lookup failed for constraint 16398

This can be fixed by doing all the drops in one pass instead, using
performMultipleDeletions (introduced by df18c51f29 of Aug 2006).  With
this change we can also remove the code to carefully order the list of
objects to be deleted.

Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAKcux6nWS_m+s=1Udk_U9B+QY7pA-Ac58qR5BdUfOyrwnWHDew@mail.gmail.com
2018-09-14 13:41:20 -03:00
Michael Paquier 28a8fa984c Improve autovacuum logging for aggressive and anti-wraparound runs
A log message was being generated when log_min_duration is reached for
autovacuum on a given relation to indicate if it was an aggressive run,
and missed the point of mentioning if it is doing an anti-wrapround
run.  The log message generated is improved so as one, both or no extra
details are added depending on the option set.

Author: Sergei Kornilov
Reviewed-by: Masahiko Sawada, Michael Paquier
Discussion: https://postgr.es/m/11587951532155118@sas1-19a94364928d.qloud-c.yandex.net
2018-09-14 07:35:39 +09:00
Michael Paquier 514a731ddc Simplify static function in extension.c
An extra argument for the filename defining the extension script
location was present, aimed at being used for error reporting, but has
never been used.  This was around since extensions have been added in
d9572c4.

Author: Yugo Nagata
Reviewed-by: Tatsuo Ishii
Discussion: https://postgr.es/m/20180907180504.1ff19e1675bb44a67e9c7ab1@sraoss.co.jp
2018-09-13 16:56:57 +09:00
Michael Paquier 9226a3b89b Remove duplicated words split across lines in comments
This has been detected using some interesting tricks with sed, and the
method used is mentioned in details in the discussion below.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20180908013109.GB15350@telsasoft.com
2018-09-08 12:24:19 -07:00
Tom Lane 54b01b9293 Remove no-longer-used variable.
Oversight in 2fbdf1b38.  Per buildfarm.
2018-09-05 14:29:58 -04:00
Alvaro Herrera 2fbdf1b38b Simplify partitioned table creation vs. relcache
In the original code, we were storing the pg_inherits row for a
partitioned table too early: enough that we had a hack for relcache to
avoid falling flat on its face while reading such a partial entry.  If
we finish the pg_class creation first and *then* store the pg_inherits
entry, we don't need that hack.

Also recognize that pg_class.relpartbound is not marked NOT NULL and
therefore it's entirely possible to read null values, so having only
Assert() protection isn't enough.  Change those to if/elog tests
instead.  This qualifies as a robustness fix, so backpatch to pg11.

In passing, remove one access that wasn't actually needed, and reword
one message to be like all the others that check for the same thing.

Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/20180903213916.hh6wasnrdg6xv2ud@alvherre.pgsql
2018-09-05 14:36:13 -03:00
Tom Lane 17b7c302b5 Fully enforce uniqueness of constraint names.
It's been true for a long time that we expect names of table and domain
constraints to be unique among the constraints of that table or domain.
However, the enforcement of that has been pretty haphazard, and it missed
some corner cases such as creating a CHECK constraint and then an index
constraint of the same name (as per recent report from André Hänsel).
Also, due to the lack of an actual unique index enforcing this, duplicates
could be created through race conditions.

Moreover, the code that searches pg_constraint has been quite inconsistent
about how to handle duplicate names if one did occur: some places checked
and threw errors if there was more than one match, while others just
processed the first match they came to.

To fix, create a unique index on (conrelid, contypid, conname).  Since
either conrelid or contypid is zero, this will separately enforce
uniqueness of constraint names among constraints of any one table and any
one domain.  (If we ever implement SQL assertions, and put them into this
catalog, more thought might be needed.  But it'd be at least as reasonable
to put them into a new catalog; having overloaded this one catalog with
two kinds of constraints was a mistake already IMO.)  This index can replace
the existing non-unique index on conrelid, though we need to keep the one
on contypid for query performance reasons.

Having done that, we can simplify the logic in various places that either
coped with duplicates or neglected to, as well as potentially improve
lookup performance when searching for a constraint by name.

Also, as per our usual practice, install a preliminary check so that you
get something more friendly than a unique-index violation report in the
case complained of by André.  And teach ChooseIndexName to avoid choosing
autogenerated names that would draw such a failure.

While it's not possible to make such a change in the back branches,
it doesn't seem quite too late to put this into v11, so do so.

Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
2018-09-04 13:45:35 -04:00
Tom Lane 44cac93464 Avoid using potentially-under-aligned page buffers.
There's a project policy against using plain "char buf[BLCKSZ]" local
or static variables as page buffers; preferred style is to palloc or
malloc each buffer to ensure it is MAXALIGN'd.  However, that policy's
been ignored in an increasing number of places.  We've apparently got
away with it so far, probably because (a) relatively few people use
platforms on which misalignment causes core dumps and/or (b) the
variables chance to be sufficiently aligned anyway.  But this is not
something to rely on.  Moreover, even if we don't get a core dump,
we might be paying a lot of cycles for misaligned accesses.

To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock
that the compiler must allocate with sufficient alignment, and use
those in place of plain char arrays.

I used these types even for variables where there's no risk of a
misaligned access, since ensuring proper alignment should make
kernel data transfers faster.  I also changed some places where
we had been palloc'ing short-lived buffers, for coding style
uniformity and to save palloc/pfree overhead.

Since this seems to be a live portability hazard (despite the lack
of field reports), back-patch to all supported versions.

Patch by me; thanks to Michael Paquier for review.

Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de
2018-09-01 15:27:17 -04:00
Peter Eisentraut 1e5e4efd02 Error position support for partition specifications
Add support for error position reporting for partition specifications.

Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
2018-08-30 08:20:23 +02:00
Peter Eisentraut a4a232b1e7 Error position support for defaults and check constraints
Add support for error position reporting for the expressions contained
in defaults and check constraint definitions.  This currently works only
for CREATE TABLE, not ALTER TABLE, because the latter is not set up to
pass around the original query string.

Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
2018-08-30 08:20:23 +02:00
Michael Paquier a556549d7e Improve VACUUM and ANALYZE by avoiding early lock queue
A caller of VACUUM can perform early lookup obtention which can cause
other sessions to block on the request done, causing potentially DOS
attacks as even a non-privileged user can attempt a vacuum fill of a
critical catalog table to block even all incoming connection attempts.

Contrary to TRUNCATE, a client could attempt a system-wide VACUUM after
building the list of relations to VACUUM, which can cause vacuum_rel()
or analyze_rel() to try to lock the relation but the operation would
just block.  When the client specifies a list of relations and the
relation needs to be skipped, ownership checks are done when building
the list of relations to work on, preventing a later lock attempt.

vacuum_rel() already had the sanity checks needed, except that those
were applied too late.  This commit refactors the code so as relation
skips are checked beforehand, making it safer to avoid too early locks,
for both manual VACUUM with and without a list of relations specified.

An isolation test is added emulating the fact that early locks do not
happen anymore, issuing a WARNING message earlier if the user calling
VACUUM is not a relation owner.

When a partitioned table is listed in a manual VACUUM or ANALYZE
command, its full list of partitions is fetched, all partitions get
added to the list to work on, and then each one of them is processed one
by one, with ownership checks happening at the later phase of
vacuum_rel() or analyze_rel().  Trying to do early ownership checks for
each partition is proving to be tedious as this would result in deadlock
risks with lock upgrades, and skipping all partitions if the listed
partitioned table is not owned would result in a behavior change
compared to how Postgres 10 has implemented vacuum for partitioned
tables.  The original problem reported related to early lock queue for
critical relations is fixed anyway, so priority is given to avoiding a
backward-incompatible behavior.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20180812222142.GA6097@paquier.xyz
2018-08-27 09:11:12 +09:00
Peter Eisentraut d12782898e Change PROCEDURE to FUNCTION in CREATE OPERATOR syntax
Since procedures are now a different thing from functions, change the
CREATE OPERATOR syntax to use FUNCTION in the clause that specifies the
function.  PROCEDURE is still accepted for compatibility.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>
2018-08-22 14:44:49 +02:00
Peter Eisentraut b19495772e doc: Update uses of the word "procedure"
Historically, the term procedure was used as a synonym for function in
Postgres/PostgreSQL.  Now we have procedures as separate objects from
functions, so we need to clean up the documentation to not mix those
terms.

In particular, mentions of "trigger procedures" are changed to "trigger
functions", and access method "support procedures" are changed to
"support functions".  (The latter already used FUNCTION in the SQL
syntax anyway.)  Also, the terminology in the SPI chapter has been
cleaned up.

A few tests, examples, and code comments are also adjusted to be
consistent with documentation changes, but not everything.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>
2018-08-22 14:44:49 +02:00
Michael Paquier 72be8c29a1 Fix set of NLS translation issues
While monitoring the code, a couple of issues related to string
translation has showed up:
- Some routines for auto-updatable views return an error string, which
sometimes missed the shot.  A comment regarding string translation is
added for each routine to help with future features.
- GSSAPI authentication missed two translations.
- vacuumdb handles non-translated strings.
- GetConfigOptionByNum should translate strings.  This part is not
back-patched as after a minor upgrade this could be surprising for
users.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20180810.152131.31921918.horiguchi.kyotaro@lab.ntt.co.jp
Backpatch-through: 9.3
2018-08-21 15:17:13 +09:00
Peter Eisentraut e4597ee65d InsertPgAttributeTuple() to set attcacheoff
InsertPgAttributeTuple() is the interface between in-memory tuple
descriptors and on-disk pg_attribute, so it makes sense to give it the
job of resetting attcacheoff.  This avoids having all the callers having
to do so.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
2018-08-17 22:08:21 +02:00
Peter Eisentraut 3ebdd21b79 Remove obsolete comment
The sequence name is no longer stored in the sequence relation, since
1753b1b027.
2018-08-13 21:07:31 +02:00
Michael Paquier f841ceb26d Improve TRUNCATE by avoiding early lock queue
A caller of TRUNCATE could previously queue for an access exclusive lock
on a relation it may not have permission to truncate, potentially
interfering with users authorized to work on it.  This can be very
intrusive depending on the lock attempted to be taken.  For example,
pg_authid could be blocked, preventing any authentication attempt to
happen on a PostgreSQL instance.

This commit fixes the case of TRUNCATE so as RangeVarGetRelidExtended is
used with a callback doing the necessary ACL checks at an earlier stage,
avoiding lock queuing issues, so as an immediate failure happens for
unprivileged users instead of waiting on a lock that would not be
taken.

This is rather similar to the type of work done in cbe24a6 for CLUSTER,
and the code of TRUNCATE is this time refactored so as there is no
user-facing changes.  As the commit for CLUSTER, no back-patch is done.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20180806165816.GA19883@paquier.xyz
2018-08-10 18:26:59 +02:00
Michael Paquier 661dd23950 Restrict access to reindex of shared catalogs for non-privileged users
A database owner running a database-level REINDEX has the possibility to
also do the operation on shared system catalogs without being an owner
of them, which allows him to block resources it should not have access
to.  The same goes for a schema owner.  For example, PostgreSQL would go
unresponsive and even block authentication if a lock is waited for
pg_authid.  This commit makes sure that a user running a REINDEX SYSTEM,
DATABASE or SCHEMA only works on the following relations:
- The user is a superuser
- The user is the table owner
- The user is the database/schema owner, only if the relation worked on
is not shared.

Robert has worded most the documentation changes, and I have coded the
core part.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier, Robert Haas
Reviewed by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20180805211059.GA2185@paquier.xyz
Backpatch-through: 11- as the current behavior has been around for a
very long time and could be disruptive for already released branches.
2018-08-09 09:40:15 +02:00
Tom Lane 9b7c56d6cb Don't record FDW user mappings as members of extensions.
CreateUserMapping has a recordDependencyOnCurrentExtension call that's
been there since extensions were introduced (very possibly my fault).
However, there's no support anywhere else for user mappings as members
of extensions, nor are they listed as a possible member object type in
the documentation.  Nor does it really seem like a good idea for user
mappings to belong to extensions when roles don't.  Hence, remove the
bogus call.

(As we saw in bug #15310, the lack of any pg_dump support for this case
ensures that any such membership record would silently disappear during
pg_upgrade.  So there's probably no need for us to do anything else
about cleaning up after this mistake.)

Discussion: https://postgr.es/m/27952.1533667213@sss.pgh.pa.us
2018-08-07 16:32:50 -04:00
Peter Eisentraut 0d5f05cde0 Allow multi-inserts during COPY into a partitioned table
CopyFrom allows multi-inserts to be used for non-partitioned tables, but
this was disabled for partitioned tables.  The reason for this appeared
to be that the tuple may not belong to the same partition as the
previous tuple did.  Not allowing multi-inserts here greatly slowed down
imports into partitioned tables.  These could take twice as long as a
copy to an equivalent non-partitioned table.  It seems wise to do
something about this, so this change allows the multi-inserts by
flushing the so-far inserted tuples to the partition when the next tuple
does not belong to the same partition, or when the buffer fills.  This
improves performance when the next tuple in the stream commonly belongs
to the same partition as the previous tuple.

In cases where the target partition changes on every tuple, using
multi-inserts slightly slows the performance.  To get around this we
track the average size of the batches that have been inserted and
adaptively enable or disable multi-inserts based on the size of the
batch.  Some testing was done and the regression only seems to exist
when the average size of the insert batch is close to 1, so let's just
enable multi-inserts when the average size is at least 1.3.  More
performance testing might reveal a better number for, this, but since
the slowdown was only 1-2% it does not seem critical enough to spend too
much time calculating it.  In any case it may depend on other factors
rather than just the size of the batch.

Allowing multi-inserts for partitions required a bit of work around the
per-tuple memory contexts as we must flush the tuples when the next
tuple does not belong the same partition.  In which case there is no
good time to reset the per-tuple context, as we've already built the new
tuple by this time.  In order to work around this we maintain two
per-tuple contexts and just switch between them every time the partition
changes and reset the old one.  This does mean that the first of each
batch of tuples is not allocated in the same memory context as the
others, but that does not matter since we only reset the context once
the previous batch has been inserted.

Author: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
2018-08-01 10:23:09 +02:00
Michael Paquier 9f7ba88aa4 Fix two oversights from 9ebe0572 which refactored cluster_rel
The recheck option became a no-op as ClusterOption failed to set proper
values for each element.  There was a second code path where local
options got overwritten.

Both issues have been spotted by Coverity.
2018-07-29 22:00:42 +09:00
Michael Paquier 9ebe0572ce Refactor cluster_rel() to handle more options
This extends cluster_rel() in such a way that more options can be added
in the future, which will reduce the amount of chunk code for an
upcoming SKIP_LOCKED aimed for VACUUM.  As VACUUM FULL is a different
flavor of CLUSTER, we want to make that extensible to ease integration.

This only reworks the API and its callers, without providing anything
user-facing.  Two options are present now: verbose mode and relation
recheck when doing the cluster command work across multiple
transactions.  This could be used as well as a base to extend the
grammar of CLUSTER later on.

Author: Michael Paquier
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/20180723031058.GE2854@paquier.xyz
2018-07-24 11:37:32 +09:00
Tom Lane 2409716755 Remove undocumented restriction against duplicate partition key columns.
transformPartitionSpec rejected duplicate simple partition columns
(e.g., "PARTITION BY RANGE (x,x)") but paid no attention to expression
columns, resulting in inconsistent behavior.  Worse, cases like
"PARTITION BY RANGE (x,(x))") were accepted but would then result in
dump/reload failures, since the expression (x) would get simplified
to a plain column later.

There seems no better reason for this restriction than there was for
the one against duplicate included index columns (cf commit 701fd0bbc),
so let's just remove it.

Back-patch to v10 where this code was added.

Report and patch by Yugo Nagata.

Discussion: https://postgr.es/m/20180712165939.36b12aff.nagata@sraoss.co.jp
2018-07-19 15:41:46 -04:00
Tom Lane 701fd0bbc9 Drop the rule against included index columns duplicating key columns.
The initial version of the included-index-column feature stated that
included columns couldn't be the same as any key column of the index.
While it'd be pretty silly to do that, since the included column would be
entirely redundant, we've never prohibited redundant index columns before
so it's not very consistent to do so here.  Moreover, the prohibition
was itself badly implemented, so that it failed to reject columns that
were effectively identical but not spelled quite alike, as reported by
Aditya Toshniwal.

(Moreover, it's not hard to imagine that for some non-btree index types,
such cases would be non-silly anyhow: the index might use a lossy
representation for key columns but be able to support retrieval of the
original form of included columns.)

Hence, let's just drop the prohibition.

In passing, do some copy-editing on the documentation for the
included-column feature.

Yugo Nagata; documentation and test corrections by me

Discussion: https://postgr.es/m/CAM9w-_mhBCys4fQNfaiQKTRrVWtoFrZ-wXmDuE9Nj5y-Y7aDKQ@mail.gmail.com
2018-07-18 14:43:03 -04:00
Heikki Linnakangas 6b387179ba Fix misc typos, mostly in comments.
A collection of typos I happened to spot while reading code, as well as
grepping for common mistakes.

Backpatch to all supported versions, as applicable, to avoid conflicts
when backporting other commits in the future.
2018-07-18 16:17:32 +03:00
Alvaro Herrera cb9db2ab06 Fix ALTER TABLE...SET STATS error message for included columns
The existing error message was complaining that the column is not an
expression, which is not correct.  Introduce a suitable wording
variation and a test.

Co-authored-by: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/20180628182803.e4632d5a.nagata@sraoss.co.jp
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
2018-07-16 20:00:39 -04:00