Commit Graph

3345 Commits

Author SHA1 Message Date
Andrew Dunstan b6fb534f10 Add IF NOT EXISTS for CREATE SERVER and CREATE USER MAPPING
There is still some inconsistency with the error messages surrounding
foreign servers. Some use the word "foreign" and some don't. My
inclination is to remove all such uses of "foreign" on the basis that
the  CREATE/ALTER/DROP SERVER commands don't use the word. However, that
is left for another day. In this patch I have kept to the existing usage
in the affected commands, which omits "foreign".

Anastasia Lubennikova, reviewed by Arthur Zakirov and Ashtosh Bapat.

Discussion: http://postgr.es/m/7c2ab9b8-388a-1ce0-23a3-7acf2a0ed3c6@postgrespro.ru
2017-03-20 16:40:45 -04:00
Andrew Dunstan 839cb0649a Use a consistent error message style for user mappings.
User mappings are essentially anonymous, so messages referring to "user
mapping foo on server bar" are wrong, and inconsistent with other error
messages referring to user mappings. To be consistent with existing use,
use "user mapping for foo on server bar" instead.

I dropped the noise word "user" from the original suggestion to be
consistent with other uses.

Discussion: http://postgr.es/m/56c6f8ab-b2d6-f1fa-deb0-1d18cf67f7b9@2ndQuadrant.com
2017-03-20 16:01:45 -04:00
Tom Lane e3044f6184 Avoid use of already-closed relcache entry.
Oversight in commit 17f8ffa1e.  Per buildfarm member prion.
2017-03-18 18:43:06 -04:00
Tom Lane 17f8ffa1e3 Fix REFRESH MATERIALIZED VIEW to report activity to the stats collector.
The non-concurrent code path for REFRESH MATERIALIZED VIEW failed to
report its updates to the stats collector.  This is bad since it means
auto-analyze doesn't know there's any work to be done.  Adjust it to
report the refresh as a table truncate followed by insertion of an
appropriate number of rows.

Since a matview could contain more than INT_MAX rows, change the
signature of pgstat_count_heap_insert() to accept an int64 rowcount.
(The accumulator it's adding into is already int64, but existing
callers could not insert more than a small number of rows at once,
so the argument had been declared just "int n".)

This is surely a bug fix, but changing pgstat_count_heap_insert()'s API
seems too risky for the back branches.  Given the lack of previous
complaints, I'm not sure it's a big enough problem to justify a kluge
solution that would avoid that.  So, no back-patch, at least for now.

Jim Mlodgenski, adjusted a bit by me

Discussion: https://postgr.es/m/CAB_5SRchSz7-WmdO5szdiknG8Oj_GGqJytrk1KRd11yhcMs1KQ@mail.gmail.com
2017-03-18 17:49:39 -04:00
Robert Haas 88e66d193f Rename "pg_clog" directory to "pg_xact".
Names containing the letters "log" sometimes confuse users into
believing that only non-critical data is present.  It is hoped
this renaming will discourage ill-considered removals of transaction
status data.

Michael Paquier

Discussion: http://postgr.es/m/CA+Tgmoa9xFQyjRZupbdEFuwUerFTvC6HjZq1ud6GYragGDFFgA@mail.gmail.com
2017-03-17 09:48:38 -04:00
Andrew Gierth 1914c5ea7d Avoid having vacuum set reltuples to 0 on non-empty relations in the
presence of page pins, which leads to serious estimation errors in the
planner.  This particularly affects small heavily-accessed tables,
especially where locking (e.g. from FK constraints) forces frequent
vacuums for mxid cleanup.

Fix by keeping separate track of pages whose live tuples were actually
counted vs. pages that were only scanned for freezing purposes.  Thus,
reltuples can only be set to 0 if all pages of the relation were
actually counted.

Backpatch to all supported versions.

Per bug #14057 from Nicolas Baccelli, analyzed by me.

Discussion: https://postgr.es/m/20160331103739.8956.94469@wrigleys.postgresql.org
2017-03-16 22:28:03 +00:00
Peter Eisentraut eb4da3e380 Add option to control snapshot export to CREATE_REPLICATION_SLOT
We used to export snapshots unconditionally in CREATE_REPLICATION_SLOT
in the replication protocol, but several upcoming patches want more
control over what happens.

Suppress snapshot export in pg_recvlogical, which neither needs nor can
use the exported snapshot.  Since snapshot exporting can fail this
improves reliability.

This also paves the way for allowing the creation of replication slots
on standbys, which cannot export snapshots because they cannot allocate
new XIDs.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-03-14 17:34:22 -04:00
Robert Haas c11453ce0a hash: Add write-ahead logging support.
The warning about hash indexes not being write-ahead logged and their
use being discouraged has been removed.  "snapshot too old" is now
supported for tables with hash indexes.  Most importantly, barring
bugs, hash indexes will now be crash-safe and usable on standbys.

This commit doesn't yet add WAL consistency checking for hash
indexes, as we now have for other index types; a separate patch has
been submitted to cure that lack.

Amit Kapila, reviewed and slightly modified by me.  The larger patch
series of which this is a part has been reviewed and tested by Álvaro
Herrera, Ashutosh Sharma, Mark Kirkwood, Jeff Janes, and Jesper
Pedersen.

Discussion: http://postgr.es/m/CAA4eK1JOBX=YU33631Qh-XivYXtPSALh514+jR8XeD7v+K3r_Q@mail.gmail.com
2017-03-14 13:27:02 -04:00
Peter Eisentraut f97a028d8e Spelling fixes in code comments
From: Josh Soref <jsoref@gmail.com>
2017-03-14 12:58:39 -04:00
Tom Lane 5ed6fff6b7 Make logging about multixact wraparound protection less chatty.
The original messaging design, introduced in commit 068cfadf9, seems too
chatty now that some time has elapsed since the bug fix; most installations
will be in good shape and don't really need a reminder about this on every
postmaster start.

Hence, arrange to suppress the "wraparound protections are now enabled"
message during startup (specifically, during the TrimMultiXact() call).
The message will still appear if protection becomes effective at some
later point.

Discussion: https://postgr.es/m/17211.1489189214@sss.pgh.pa.us
2017-03-14 12:47:53 -04:00
Noah Misch 3a0d473192 Use wrappers of PG_DETOAST_DATUM_PACKED() more.
This makes almost all core code follow the policy introduced in the
previous commit.  Specific decisions:

- Text search support functions with char* and length arguments, such as
  prsstart and lexize, may receive unaligned strings.  I doubt
  maintainers of non-core text search code will notice.

- Use plain VARDATA() on values detoasted or synthesized earlier in the
  same function.  Use VARDATA_ANY() on varlenas sourced outside the
  function, even if they happen to always have four-byte headers.  As an
  exception, retain the universal practice of using VARDATA() on return
  values of SendFunctionCall().

- Retain PG_GETARG_BYTEA_P() in pageinspect.  (Page images are too large
  for a one-byte header, so this misses no optimization.)  Sites that do
  not call get_page_from_raw() typically need the four-byte alignment.

- For now, do not change btree_gist.  Its use of four-byte headers in
  memory is partly entangled with storage of 4-byte headers inside
  GBT_VARKEY, on disk.

- For now, do not change gtrgm_consistent() or gtrgm_distance().  They
  incorporate the varlena header into a cache, and there are multiple
  credible implementation strategies to consider.
2017-03-12 19:35:34 -04:00
Peter Eisentraut cd603a4d6b Use SQL standard error code for nextval 2017-03-09 10:56:44 -05:00
Robert Haas 355d3993c5 Add a Gather Merge executor node.
Like Gather, we spawn multiple workers and run the same plan in each
one; however, Gather Merge is used when each worker produces the same
output ordering and we want to preserve that output ordering while
merging together the streams of tuples from various workers.  (In a
way, Gather Merge is like a hybrid of Gather and MergeAppend.)

This works out to a win if it saves us from having to perform an
expensive Sort.  In cases where only a small amount of data would need
to be sorted, it may actually be faster to use a regular Gather node
and then sort the results afterward, because Gather Merge sometimes
needs to wait synchronously for tuples whereas a pure Gather generally
doesn't.  But if this avoids an expensive sort then it's a win.

Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro,
and Neha Sharma, and reviewed and revised by me.

Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com
2017-03-09 07:49:29 -05:00
Stephen Frost f9b1a0dd40 Expose explain's SUMMARY option
This exposes the existing explain summary option to users to allow them
to choose if they wish to have the planning time and totalled run time
included in the EXPLAIN result.  The existing default behavior is
retained if SUMMARY is not specified- running explain without analyze
will not print the summary lines (just the planning time, currently)
while running explain with analyze will include the summary lines (both
the planning time and the totalled execution time).

Users who wish to see the summary information for plain explain can now
use: EXPLAIN (SUMMARY ON) query;  Users who do not want to have the
summary printed for an analyze run can use:
EXPLAIN (ANALYZE ON, SUMMARY OFF) query;

With this, we can now also have EXPLAIN ANALYZE queries included in our
regression tests by using:
EXPLAIN (ANALYZE ON, TIMING OFF, SUMMARY off) query;

I went ahead and added an example of this, which will hopefully not make
the buildfarm complain.

Author: Ashutosh Bapat
Discussion: https://postgr.es/m/CAFjFpReE5z2h98U2Vuia8hcEkpRRwrauRjHmyE44hNv8-xk+XA@mail.gmail.com
2017-03-08 15:14:03 -05:00
Fujii Masao 4eafdcc276 Prevent logical rep workers with removed subscriptions from starting.
Any logical rep workers must have their subscription entries in
pg_subscription. To ensure this, we need to prevent the launcher
from starting new worker corresponding to the subscription that
DROP SUBSCRIPTION command is removing. To implement this,
previously LogicalRepLauncherLock was introduced and held until
the end of transaction running DROP SUBSCRIPTION. But using
LWLock for that purpose was not valid.

Instead, this commit changes DROP SUBSCRIPTION so that it takes
AccessExclusiveLock on pg_subscription, in order to ensure that
the launcher cannot see any subscriptions being removed. Also this
commit gets rid of LogicalRepLauncherLock.

Patch by me, reviewed by Petr Jelinek

Discussion: https://www.postgresql.org/message-id/CAHGQGwHPi8ky-yANFfe0sgmhKtsYcQLTnKx07bW9S7-Rn1746w@mail.gmail.com
2017-03-09 01:44:23 +09:00
Alvaro Herrera fcec6caafa Support XMLTABLE query expression
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.

This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE.  (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).

The implemented syntax deviates slightly from what the standard
requires.  First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only.  Second, we don't
currently support a default namespace to be specified.

This implementation relies on a new executor node based on a hardcoded
method table.  (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)

Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
2017-03-08 12:40:26 -03:00
Fujii Masao 77d21970ae Fix connection leak in DROP SUBSCRIPTION command, take 2.
Commit 898a792eb8 fixed the connection
leak issue, but it was an unreliable way of bugfix. This bugfix was
assuming that walrcv_command() subroutine cannot throw an error,
but it's untenable assumption. For example, if it will be changed
so that an error is thrown, connection leak issue will happen again.

This patch ensures that the connection is closed even when
walrcv_command() subroutine throws an error.

Patch by me, reviewed by Petr Jelinek and Michael Paquier

Discussion: https://www.postgresql.org/message-id/2058.1487704345@sss.pgh.pa.us
2017-03-08 23:43:38 +09:00
Robert Haas d88d06cd07 Fix relcache reference leak.
Reported by Kevin Grittner.  Faulty commit identified by Tom Lane.
Patch by Amit Langote, reviewed by Michael Paquier.

Discussion: http://postgr.es/m/CACjxUsOHbH1=99u8mGxmLHfy5hov4ENEpvM6=3ARjos7wG7rtQ@mail.gmail.com
2017-03-07 11:27:21 -05:00
Heikki Linnakangas 818fd4a67d Support SCRAM-SHA-256 authentication (RFC 5802 and 7677).
This introduces a new generic SASL authentication method, similar to the
GSS and SSPI methods. The server first tells the client which SASL
authentication mechanism to use, and then the mechanism-specific SASL
messages are exchanged in AuthenticationSASLcontinue and PasswordMessage
messages. Only SCRAM-SHA-256 is supported at the moment, but this allows
adding more SASL mechanisms in the future, without changing the overall
protocol.

Support for channel binding, aka SCRAM-SHA-256-PLUS is left for later.

The SASLPrep algorithm, for pre-processing the password, is not yet
implemented. That could cause trouble, if you use a password with
non-ASCII characters, and a client library that does implement SASLprep.
That will hopefully be added later.

Authorization identities, as specified in the SCRAM-SHA-256 specification,
are ignored. SET SESSION AUTHORIZATION provides more or less the same
functionality, anyway.

If a user doesn't exist, perform a "mock" authentication, by constructing
an authentic-looking challenge on the fly. The challenge is derived from
a new system-wide random value, "mock authentication nonce", which is
created at initdb, and stored in the control file. We go through these
motions, in order to not give away the information on whether the user
exists, to unauthenticated users.

Bumps PG_CONTROL_VERSION, because of the new field in control file.

Patch by Michael Paquier and Heikki Linnakangas, reviewed at different
stages by Robert Haas, Stephen Frost, David Steele, Aleksander Alekseev,
and many others.

Discussion: https://www.postgresql.org/message-id/CAB7nPqRbR3GmFYdedCAhzukfKrgBLTLtMvENOmPrVWREsZkF8g%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAB7nPqSMXU35g%3DW9X74HVeQp0uvgJxvYOuA4A-A3M%2B0wfEBv-w%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/55192AFE.6080106@iki.fi
2017-03-07 14:25:40 +02:00
Tom Lane a8df75b0a4 Avoid dangling pointer to relation name in RLS code path in DoCopy().
With RLS active, "COPY tab TO ..." failed under -DRELCACHE_FORCE_RELEASE,
and would sometimes fail without that, because it used the relation name
directly from the relcache as part of the parsetree it's building.  That
becomes a potentially-dangling pointer as soon as the relcache entry is
closed, a bit further down.  Typical symptom if the relcache entry chanced
to get cleared would be "relation does not exist" error with a garbage
relation name, or possibly a core dump; but if you were really truly
unlucky, the COPY might copy from the wrong table.

Per report from Andrew Dunstan that regression tests fail with
-DRELCACHE_FORCE_RELEASE.  The core tests now pass for me (but have
not tried "make check-world" yet).

Discussion: https://postgr.es/m/7b52f900-0579-cda9-ae2e-de5da17090e6@2ndQuadrant.com
2017-03-06 16:50:47 -05:00
Peter Eisentraut 2ca64c6f71 Replace LookupFuncNameTypeNames() with LookupFuncWithArgs()
The old function took function name and function argument list as
separate arguments.  Now that all function signatures are passed around
as ObjectWithArgs structs, this is no longer necessary and can be
replaced by a function that takes ObjectWithArgs directly.  Similarly
for aggregates and operators.

Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-03-06 13:31:47 -05:00
Peter Eisentraut 8b6d6cf853 Remove objname/objargs split for referring to objects
In simpler times, it might have worked to refer to all kinds of objects
by a list of name components and an optional argument list.  But this
doesn't work for all objects, which has resulted in a collection of
hacks to place various other nodes types into these fields, which have
to be unpacked at the other end.  This makes it also weird to represent
lists of such things in the grammar, because they would have to be lists
of singleton lists, to make the unpacking work consistently.  The other
problem is that keeping separate name and args fields makes it awkward
to deal with lists of functions.

Change that by dropping the objargs field and have objname, renamed to
object, be a generic Node, which can then be flexibly assigned and
managed using the normal Node mechanisms.  In many cases it will still
be a List of names, in some cases it will be a string Value, for types
it will be the existing Typename, for functions it will now use the
existing ObjectWithArgs node type.  Some of the more obscure object
types still use somewhat arbitrary nested lists.

Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-03-06 13:31:47 -05:00
Peter Eisentraut 550214a4ef Add operator_with_argtypes grammar rule
This makes the handling of operators similar to that of functions and
aggregates.

Rename node FuncWithArgs to ObjectWithArgs, to reflect the expanded use.

Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-03-06 13:31:47 -05:00
Peter Eisentraut 63ebd377a6 Use class_args field in opclass_drop
This makes it consistent with the usage in opclass_item.

Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-03-06 13:31:47 -05:00
Simon Riggs 8b4d582d27 Allow partitioned tables to be dropped without CASCADE
Record partitioned table dependencies as DEPENDENCY_AUTO
rather than DEPENDENCY_NORMAL, so that DROP TABLE just works.

Remove all the tests for partitioned tables where earlier
work had deliberately avoided using CASCADE.

Amit Langote, reviewed by Ashutosh Bapat and myself
2017-03-06 15:50:53 +05:30
Tom Lane dbca84f04e In rebuild_relation(), don't access an already-closed relcache entry.
This reliably fails with -DRELCACHE_FORCE_RELEASE, as reported by
Andrew Dunstan, and could sometimes fail in normal operation, resulting
in a wrong persistence value being used for the transient table.
It's not immediately clear to me what effects that might have beyond
the risk of a crash while accessing OldHeap->rd_rel->relpersistence,
but it's probably not good.

Bug introduced by commit f41872d0c, and made substantially worse by
commit 85b506bbf, which added a second such access significantly
later than the heap_close.  I doubt the first reference could fail
in a production scenario, but the second one definitely could.

Discussion: https://postgr.es/m/7b52f900-0579-cda9-ae2e-de5da17090e6@2ndQuadrant.com
2017-03-04 16:09:33 -05:00
Peter Eisentraut 272adf4f9c Disallow CREATE/DROP SUBSCRIPTION in transaction block
Disallow CREATE SUBSCRIPTION and DROP SUBSCRIPTION in a transaction
block when the replication slot is to be created or dropped, since that
cannot be rolled back.

based on patch by Masahiko Sawada <sawada.mshk@gmail.com>
2017-03-03 23:29:13 -05:00
Peter Eisentraut 2357c12b49 Fix typo 2017-03-03 18:21:06 -05:00
Peter Eisentraut 6da9759a03 Add RENAME support for PUBLICATIONs and SUBSCRIPTIONs
From: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-03-03 10:47:04 -05:00
Simon Riggs 9eb344faf5 Allow vacuums to report oldestxmin
Allow VACUUM and Autovacuum to report the oldestxmin value they
used while cleaning tables, helping to make better sense out of
the other statistics we report in various cases.
2017-03-03 19:18:25 +05:30
Robert Haas 3c3bb99330 Don't uselessly rewrite, truncate, VACUUM, or ANALYZE partitioned tables.
Also, recursively perform VACUUM and ANALYZE on partitions when the
command is applied to a partitioned table.  In passing, some related
documentation updates.

Amit Langote, reviewed by Michael Paquier, Ashutosh Bapat, and by me.

Discussion: http://postgr.es/m/47288cf1-f72c-dfc2-5ff0-4af962ae5c1b@lab.ntt.co.jp
2017-03-02 17:23:44 +05:30
Tom Lane 9e3755ecb2 Remove useless duplicate inclusions of system header files.
c.h #includes a number of core libc header files, such as <stdio.h>.
There's no point in re-including these after having read postgres.h,
postgres_fe.h, or c.h; so remove code that did so.

While at it, also fix some places that were ignoring our standard pattern
of "include postgres[_fe].h, then system header files, then other Postgres
header files".  While there's not any great magic in doing it that way
rather than system headers last, it's silly to have just a few files
deviating from the general pattern.  (But I didn't attempt to enforce this
globally, only in files I was touching anyway.)

I'd be the first to say that this is mostly compulsive neatnik-ism,
but over time it might save enough compile cycles to be useful.
2017-02-25 16:12:55 -05:00
Tom Lane c29aff959d Consistently declare timestamp variables as TimestampTz.
Twiddle the replication-related code so that its timestamp variables
are declared TimestampTz, rather than the uninformative "int64" that
was previously used for meant-to-be-always-integer timestamps.
This resolves the int64-vs-TimestampTz declaration inconsistencies
introduced by commit 7c030783a, though in the opposite direction to
what was originally suggested.

This required including datatype/timestamp.h in a couple more places
than before.  I decided it would be a good idea to slim down that
header by not having it pull in <float.h> etc, as those headers are
no longer at all relevant to its purpose.  Unsurprisingly, a small number
of .c files turn out to have been depending on those inclusions, so add
them back in the .c files as needed.

Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
Discussion: https://postgr.es/m/27694.1487456324@sss.pgh.pa.us
2017-02-23 15:57:08 -05:00
Tom Lane b9d092c962 Remove now-dead code for !HAVE_INT64_TIMESTAMP.
This is a basically mechanical removal of #ifdef HAVE_INT64_TIMESTAMP
tests and the negative-case controlled code.

Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
2017-02-23 14:04:43 -05:00
Peter Eisentraut 74321d87fb Fix whitespace 2017-02-21 15:44:07 -05:00
Fujii Masao 898a792eb8 Fix connection leak in DROP SUBSCRIPTION command.
Previously the command forgot to close the connection to the publisher
when it failed to drop the replication slot.
2017-02-22 03:36:02 +09:00
Peter Eisentraut 38d103763d Make more use of castNode() 2017-02-21 11:59:09 -05:00
Robert Haas a3dc8e495b Make partitions automatically inherit OIDs.
Previously, if the parent was specified as WITH OIDS, each child
also had to be explicitly specified as WITH OIDS.

Amit Langote, per a report from Simon Riggs.  Some additional
work on the documentation changes by me.

Discussion: http://postgr.es/m/CANP8+jJBpWocfKrbJcaf3iBt9E3U=WPE_NC8YE6rye+YJ1sYnQ@mail.gmail.com
2017-02-19 21:29:27 +05:30
Peter Eisentraut 6d16ecc646 Add CREATE COLLATION IF NOT EXISTS clause
The core of the functionality was already implemented when
pg_import_system_collations was added.  This just exposes it as an
option in the SQL command.
2017-02-15 10:01:28 -05:00
Robert Haas e28b115612 Don't disallow dropping NOT NULL for a list partition key.
Range partitioning doesn't support nulls in the partitioning columns,
but list partitioning does.

Amit Langote, per a complaint from Amul Sul
2017-02-14 12:13:41 -05:00
Noah Misch f30f34e589 Ignore tablespace ACLs when ignoring schema ACLs.
The ALTER TABLE ALTER TYPE implementation can issue DROP INDEX and
CREATE INDEX to refit existing indexes for the new column type.  Since
this CREATE INDEX is an implementation detail of an index alteration,
the ensuing DefineIndex() should skip ACL checks specific to index
creation.  It already skips the namespace ACL check.  Make it skip the
tablespace ACL check, too.  Back-patch to 9.2 (all supported versions).

Reviewed by Tom Lane.
2017-02-12 16:03:41 -05:00
Peter Eisentraut 2ea5b06c7a Add CREATE SEQUENCE AS <data type> clause
This stores a data type, required to be an integer type, with the
sequence.  The sequences min and max values default to the range
supported by the type, and they cannot be set to values exceeding that
range.  The internal implementation of the sequence is not affected.

Change the serial types to create sequences of the appropriate type.
This makes sure that the min and max values of the sequence for a serial
column match the range of values supported by the table column.  So the
sequence can no longer overflow the table column.

This also makes monitoring for sequence exhaustion/wraparound easier,
which currently requires various contortions to cross-reference the
sequences with the table columns they are used with.

This commit also effectively reverts the pg_sequence column reordering
in f3b421da5f, because the new seqtypid
column allows us to fill the hole in the struct and create a more
natural overall column ordering.

Reviewed-by: Steve Singer <steve@ssinger.info>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-02-10 15:34:35 -05:00
Tom Lane 86d911ec0f Allow index AMs to cache data across aminsert calls within a SQL command.
It's always been possible for index AMs to cache data across successive
amgettuple calls within a single SQL command: the IndexScanDesc.opaque
field is meant for precisely that.  However, no comparable facility
exists for amortizing setup work across successive aminsert calls.
This patch adds such a feature and teaches GIN, GIST, and BRIN to use it
to amortize catalog lookups they'd previously been doing on every call.
(The other standard index AMs keep everything they need in the relcache,
so there's little to improve there.)

For GIN, the overall improvement in a statement that inserts many rows
can be as much as 10%, though it seems a bit less for the other two.
In addition, this makes a really significant difference in runtime
for CLOBBER_CACHE_ALWAYS tests, since in those builds the repeated
catalog lookups are vastly more expensive.

The reason this has been hard up to now is that the aminsert function is
not passed any useful place to cache per-statement data.  What I chose to
do is to add suitable fields to struct IndexInfo and pass that to aminsert.
That's not widening the index AM API very much because IndexInfo is already
within the ken of ambuild; in fact, by passing the same info to aminsert
as to ambuild, this is really removing an inconsistency in the AM API.

Discussion: https://postgr.es/m/27568.1486508680@sss.pgh.pa.us
2017-02-09 11:52:12 -05:00
Robert Haas a507b86900 Add WAL consistency checking facility.
When the new GUC wal_consistency_checking is set to a non-empty value,
it triggers recording of additional full-page images, which are
compared on the standby against the results of applying the WAL record
(without regard to those full-page images).  Allowable differences
such as hints are masked out, and the resulting pages are compared;
any difference results in a FATAL error on the standby.

Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki
Linnakangas.  Extensively reviewed and revised by Michael Paquier and
by me, with additional reviews and comments from Amit Kapila, Álvaro
Herrera, Simon Riggs, and Peter Eisentraut.
2017-02-08 15:45:30 -05:00
Heikki Linnakangas 181bdb90ba Fix typos in comments.
Backpatch to all supported versions, where applicable, to make backpatching
of future fixes go more smoothly.

Josh Soref

Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
2017-02-06 11:33:58 +02:00
Fujii Masao 39b8cc991f Be sure to release LogicalRepLauncherLock in DROP SUBSCRIPTION command.
Previously DROP SUBSCRIPTION command forgot to release the lock at all.

Original patches by Kyotaro Horiguchi and Michael Paquier,
but I didn't use them.
Discussion: http://postgr.es/m/20170201.173623.66249355.horiguchi.kyotaro@lab.ntt.co.jp
2017-02-04 03:18:13 +09:00
Tom Lane aedd554f84 Fix CatalogTupleInsert/Update abstraction for case of shared indstate.
Add CatalogTupleInsertWithInfo and CatalogTupleUpdateWithInfo to let
callers use the CatalogTupleXXX abstraction layer even in cases where
we want to share the results of CatalogOpenIndexes across multiple
inserts/updates for efficiency.  This finishes the job begun in commit
2f5c9d9c9, by allowing some remaining simple_heap_insert/update
calls to be replaced.  The abstraction layer is now complete enough
that we don't have to export CatalogIndexInsert at all anymore.

Also, this fixes several places in which 2f5c9d9c9 introduced performance
regressions by using retail CatalogTupleInsert or CatalogTupleUpdate even
though the previous coding had been able to amortize CatalogOpenIndexes
work across multiple tuples.

A possible future improvement is to arrange for the indexing.c functions
to cache the CatalogIndexState somewhere, maybe in the relcache, in which
case we could get rid of CatalogTupleInsertWithInfo and
CatalogTupleUpdateWithInfo again.  But that's a task for another day.

Discussion: https://postgr.es/m/27502.1485981379@sss.pgh.pa.us
2017-02-01 17:18:36 -05:00
Tom Lane ab02896510 Provide CatalogTupleDelete() as a wrapper around simple_heap_delete().
This extends the work done in commit 2f5c9d9c9 to provide a more nearly
complete abstraction layer hiding the details of index updating for catalog
changes.  That commit only invented abstractions for catalog inserts and
updates, leaving nearby code for catalog deletes still calling the
heap-level routines directly.  That seems rather ugly from here, and it
does little to help if we ever want to shift to a storage system in which
indexing work is needed at delete time.

Hence, create a wrapper function CatalogTupleDelete(), and replace calls
of simple_heap_delete() on catalog tuples with it.  There are now very
few direct calls of [simple_]heap_delete remaining in the tree.

Discussion: https://postgr.es/m/462.1485902736@sss.pgh.pa.us
2017-02-01 16:13:30 -05:00
Heikki Linnakangas dbd69118c0 Replace isMD5() with a more future-proof way to check if pw is encrypted.
The rule is that if pg_authid.rolpassword begins with "md5" and has the
right length, it's an MD5 hash, otherwise it's a plaintext password. The
idiom has been to use isMD5() to check for that, but that gets awkward,
when we add new kinds of verifiers, like the verifiers for SCRAM
authentication in the pending SCRAM patch set. Replace isMD5() with a new
get_password_type() function, so that when new verifier types are added, we
don't need to remember to modify every place that currently calls isMD5(),
to also recognize the new kinds of verifiers.

Also, use the new plain_crypt_verify function in passwordcheck, so that it
doesn't need to know about MD5, or in the future, about other kinds of
hashes or password verifiers.

Reviewed by Michael Paquier and Peter Eisentraut.

Discussion: https://www.postgresql.org/message-id/2d07165c-1793-e243-a2a9-e45b624c7580@iki.fi
2017-02-01 13:11:37 +02:00
Alvaro Herrera 2f5c9d9c9c Tweak catalog indexing abstraction for upcoming WARM
Split the existing CatalogUpdateIndexes into two different routines,
CatalogTupleInsert and CatalogTupleUpdate, which do both the heap
insert/update plus the index update.  This removes over 300 lines of
boilerplate code all over src/backend/catalog/ and src/backend/commands.
The resulting code is much more pleasing to the eye.

Also, by encapsulating what happens in detail during an UPDATE, this
facilitates the upcoming WARM patch, which is going to add a few more
lines to the update case making the boilerplate even more boring.

The original CatalogUpdateIndexes is removed; there was only one use
left, and since it's just three lines, we can as well expand it in place
there.  We could keep it, but WARM is going to break all the UPDATE
out-of-core callsites anyway, so there seems to be no benefit in doing
so.

Author: Pavan Deolasee
Discussion: https://www.postgr.es/m/CABOikdOcFYSZ4vA2gYfs=M2cdXzXX4qGHeEiW3fu9PCfkHLa2A@mail.gmail.com
2017-01-31 18:42:24 -03:00
Stephen Frost e54f75722c Handle ALTER EXTENSION ADD/DROP with pg_init_privs
In commit 6c268df, pg_init_privs was added to track the initial
privileges of catalog objects and extensions.  Unfortunately, that
commit didn't include understanding of ALTER EXTENSION ADD/DROP, which
allows the objects associated with an extension to be changed after the
initial CREATE EXTENSION script has been run.

The result of this meant that ACLs for objects added through
ALTER EXTENSION ADD were not recorded into pg_init_privs and we would
end up including those ACLs in pg_dump when we shouldn't have.

This commit corrects that by making sure to have pg_init_privs updated
when ALTER EXTENSION ADD/DROP is run, recording the permissions as they
are at ALTER EXTENSION ADD time, and removing any if/when ALTER
EXTENSION DROP is called.

This issue was pointed out by Moshe Jacobson as commentary on bug #14456
(which was actually a bug about versions prior to 9.6 not handling
custom ACLs on extensions correctly, an issue now addressed with
pg_init_privs in 9.6).

Back-patch to 9.6 where pg_init_privs was introduced.
2017-01-29 23:05:07 -05:00
Tom Lane 7afd56c3c6 Use castNode() in a bunch of statement-list-related code.
When I wrote commit ab1f0c822, I really missed the castNode() macro that
Peter E. had proposed shortly before.  This back-fills the uses I would
have put it to.  It's probably not all that significant, but there are
more assertions here than there were before, and conceivably they will
help catch any bugs associated with those representation changes.

I left behind a number of usages like "(Query *) copyObject(query_var)".
Those could have been converted as well, but Peter has proposed another
notational improvement that would handle copyObject cases automatically,
so I let that be for now.
2017-01-26 22:09:34 -05:00
Andres Freund 9ba8a9ce45 Use the new castNode() macro in a number of places.
This is far from a pervasive conversion, but it's a good starting
point.

Author: Peter Eisentraut, with some minor changes by me
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/c5d387d9-3440-f5e0-f9d4-71d53b9fbe52@2ndquadrant.com
2017-01-26 16:47:03 -08:00
Peter Eisentraut 3d9e73ea5f Update copyright years in some recently added files 2017-01-25 12:32:05 -05:00
Peter Eisentraut 65df150a18 Close replication connection when slot creation errors
From: Petr Jelinek <pjmodos@pjmodos.net>
2017-01-25 10:47:53 -05:00
Robert Haas 27cdb3414b Reindent table partitioning code.
We've accumulated quite a bit of stuff with which pgindent is not
quite happy in this code; clean it up to provide a less-annoying base
for future pgindent runs.
2017-01-24 10:20:02 -05:00
Robert Haas b1ecb9b3fc Fix interaction of partitioned tables with BulkInsertState.
When copying into a partitioned table, the target heap may change from
one tuple to next.  We must ask ReadBufferBI() to get a new buffer
every time such change occurs.  To do that, use new function
ReleaseBulkInsertStatePin().  This fixes the bug that tuples ended up
being inserted into the wrong partition, which occurred exactly
because the wrong buffer was used.

Amit Langote, per a suggestion from Robert Haas.  Some cosmetic
adjustments by me.

Reports by 高增琦 (Gao Zengqi), Venkata B Nagothi, and
Ragnar Ouchterlony.

Discussion: http://postgr.es/m/CAFmBtr32FDOqofo8yG-4mjzL1HnYHxXK5S9OGFJ%3D%3DcJpgEW4vA%40mail.gmail.com
Discussion: http://postgr.es/m/CAEyp7J9WiX0L3DoiNcRrY-9iyw%3DqP%2Bj%3DDLsAnNFF1xT2J1ggfQ%40mail.gmail.com
Discussion: http://postgr.es/m/16d73804-c9cd-14c5-463e-5caad563ff77%40agama.tv
Discussion: http://postgr.es/m/CA+TgmoaiZpDVUUN8LZ4jv1qFE_QyR+H9ec+79f5vNczYarg5Zg@mail.gmail.com
2017-01-24 08:50:16 -05:00
Peter Eisentraut 0bc1207aeb Fix default minimum value for descending sequences
For some reason that is lost in history, a descending sequence would
default its minimum value to -2^63+1 (-PG_INT64_MAX) instead of
-2^63 (PG_INT64_MIN), even though explicitly specifying a minimum value
of -2^63 would work.  Fix this inconsistency by using the full range by
default.

Reported-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-01-23 14:00:58 -05:00
Peter Eisentraut 46d482814c Don't error when no system locales were found
initdb used to warn about that, but it was changed to an error in
pg_import_system_locales, but some build farm members failed because of
that.  Change it back to a warning.
2017-01-23 13:45:32 -05:00
Alvaro Herrera 7e26e02eec Prefetch blocks during lazy vacuum's truncation scan
Vacuum truncation scan can be sped up on rotating media by prefetching
blocks in forward direction.  That makes the blocks already present in
memory by the time they are needed, while also letting OS read-ahead
kick in.

The truncate scan has been measured to be five times faster than without
this patch (that was on a slow disk, but it shouldn't hurt on fast
disks.)

Author: Álvaro Herrera, loosely based on a submission by Claudio Freire
Discussion: https://postgr.es/m/CAGTBQpa6NFGO_6g_y_7zQx8L9GcHDSQKYdo1tGuh791z6PYgEg@mail.gmail.com
2017-01-23 12:55:18 -03:00
Peter Eisentraut f21a563d25 Move some things from builtins.h to new header files
This avoids that builtins.h has to include additional header files.
2017-01-20 20:29:53 -05:00
Alvaro Herrera 50cf1c80e6 Record dependencies on owners for logical replication objects
This was forgotten in 665d1fad99 and
caused the whole buildfarm to become red for a little while.

Author: Petr Jelínek

Also fix a typo in a nearby error message.
2017-01-20 16:45:02 -03:00
Peter Eisentraut 665d1fad99 Logical replication
- Add PUBLICATION catalogs and DDL
- Add SUBSCRIPTION catalog and DDL
- Define logical replication protocol and output plugin
- Add logical replication workers

From: Petr Jelinek <petr@2ndquadrant.com>
Reviewed-by: Steve Singer <steve@ssinger.info>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Erik Rijkers <er@xs4all.nl>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
2017-01-20 09:04:49 -05:00
Andres Freund ea15e18677 Remove obsoleted code relating to targetlist SRF evaluation.
Since 69f4b9c plain expression evaluation (and thus normal projection)
can't return sets of tuples anymore. Thus remove code dealing with
that possibility.

This will require adjustments in external code using
ExecEvalExpr()/ExecProject() - that should neither be hard nor very
common.

Author: Andres Freund and Tom Lane
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-19 14:40:41 -08:00
Alvaro Herrera 8eace46d34 Fix race condition in reading commit timestamps
If a user requests the commit timestamp for a transaction old enough
that its data is concurrently being truncated away by vacuum at just the
right time, they would receive an ugly internal file-not-found error
message from slru.c rather than the expected NULL return value.

In a primary server, the window for the race is very small: the lookup
has to occur exactly between the two calls by vacuum, and there's not a
lot that happens between them (mostly just a multixact truncate).  In a
standby server, however, the window is larger because the truncation is
executed as soon as the WAL record for it is replayed, but the advance
of the oldest-Xid is not executed until the next checkpoint record.

To fix in the primary, simply reverse the order of operations in
vac_truncate_clog.  To fix in the standby, augment the WAL truncation
record so that the standby is aware of the new oldest-XID value and can
apply the update immediately.  WAL version bumped because of this.

No backpatch, because of the low importance of the bug and its rarity.

Author: Craig Ringer
Reviewed-By: Petr Jelínek, Peter Eisentraut
Discussion: https://postgr.es/m/CAMsr+YFhVtRQT1VAwC+WGbbxZZRzNou=N9Ed-FrCqkwQ8H8oJQ@mail.gmail.com
2017-01-19 18:24:17 -03:00
Robert Haas 05bd889904 Fix RETURNING to work correctly with partition tuple routing.
In ExecInsert(), do not switch back to the root partitioned table
ResultRelInfo until after we finish ExecProcessReturning(), so that
RETURNING projection is done using the partition's descriptor.  For
the projection to work correctly, we must initialize the same for each
leaf partition during ModifyTableState initialization.

Amit Langote
2017-01-19 13:20:11 -05:00
Robert Haas 39162b2030 Fix failure to enforce partitioning contraint for internal partitions.
When a tuple is inherited into a partitioning root, no partition
constraints need to be enforced; when it is inserted into a leaf, the
parent's partitioning quals needed to be enforced.  The previous
coding got both of those cases right.  When a tuple is inserted into
an intermediate level of the partitioning hierarchy (i.e. a table
which is both a partition itself and in turn partitioned), it must
enforce the partitioning qual inherited from its parent.  That case
got overlooked; repair.

Amit Langote
2017-01-19 12:30:27 -05:00
Andres Freund 69f4b9c85f Move targetlist SRF handling from expression evaluation to new executor node.
Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT
generate_series(1,5)) so far was done in the expression evaluation (i.e.
ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code.

This meant that most executor nodes performing projection, and most
expression evaluation functions, had to deal with the possibility that an
evaluated expression could return a set of return values.

That's bad because it leads to repeated code in a lot of places. It also,
and that's my (Andres's) motivation, made it a lot harder to implement a
more efficient way of doing expression evaluation.

To fix this, introduce a new executor node (ProjectSet) that can evaluate
targetlists containing one or more SRFs. To avoid the complexity of the old
way of handling nested expressions returning sets (e.g. having to pass up
ExprDoneCond, and dealing with arguments to functions returning sets etc.),
those SRFs can only be at the top level of the node's targetlist.  The
planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is
only necessary in ProjectSet nodes and that SRFs are only present at the
top level of the node's targetlist. If there are nested SRFs the planner
creates multiple stacked ProjectSet nodes.  The ProjectSet nodes always get
input from an underlying node.

We also discussed and prototyped evaluating targetlist SRFs using ROWS
FROM(), but that turned out to be more complicated than we'd hoped.

While moving SRF evaluation to ProjectSet would allow to retain the old
"least common multiple" behavior when multiple SRFs are present in one
targetlist (i.e.  continue returning rows until all SRFs are at the end of
their input at the same time), we decided to instead only return rows till
all SRFs are exhausted, returning NULL for already exhausted ones.  We
deemed the previous behavior to be too confusing, unexpected and actually
not particularly useful.

As a side effect, the previously prohibited case of multiple set returning
arguments to a function, is now allowed. Not because it's particularly
desirable, but because it ends up working and there seems to be no argument
for adding code to prohibit it.

Currently the behavior for COALESCE and CASE containing SRFs has changed,
returning multiple rows from the expression, even when the SRF containing
"arm" of the expression is not evaluated. That's because the SRFs are
evaluated in a separate ProjectSet node.  As that's quite confusing, we're
likely to instead prohibit SRFs in those places.  But that's still being
discussed, and the code would reside in places not touched here, so that's
a task for later.

There's a lot of, now superfluous, code dealing with set return expressions
around. But as the changes to get rid of those are verbose largely boring,
it seems better for readability to keep the cleanup as a separate commit.

Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2017-01-18 13:40:27 -08:00
Alvaro Herrera 9a34123bc3 Make messages mentioning type names more uniform
This avoids additional translatable strings for each distinct type, as
well as making our quoting style around type names more consistent
(namely, that we don't quote type names).  This continues what started
as f402b99501.

Discussion: https://postgr.es/m/20160401170642.GA57509@alvherre.pgsql
2017-01-18 16:08:20 -03:00
Tom Lane 0333a73400 Avoid conflicts with collation aliases generated by stripping.
This resulted in failures depending on the order of "locale -a" output.
The original coding in initdb sorted the results, but that should be
unnecessary as long as "locale -a" doesn't print duplicate names.  The
original entries will then all be non-dups, and while we might generate
duplicate aliases by stripping, they should be for different encodings and
thus not conflict.  Even if the latter assumption fails somehow, it won't
be fatal because we're using if_not_exists mode for the aliases.

Discussion: https://postgr.es/m/26116.1484751196%40sss.pgh.pa.us
2017-01-18 13:44:19 -05:00
Peter Eisentraut aa17c06fb5 Add function to import operating system collations
Move this logic out of initdb into a user-callable function.  This
simplifies the code and makes it possible to update the standard
collations later on if additional operating system collations appear.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Euler Taveira <euler@timbira.com.br>
2017-01-18 09:35:56 -05:00
Peter Eisentraut 352a24a1f9 Generate fmgr prototypes automatically
Gen_fmgrtab.pl creates a new file fmgrprotos.h, which contains
prototypes for all functions registered in pg_proc.h.  This avoids
having to manually maintain these prototypes across a random variety of
header files.  It also automatically enforces a correct function
signature, and since there are warnings about missing prototypes, it
will detect functions that are defined but not registered in
pg_proc.h (or otherwise used).

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
2017-01-17 14:06:07 -05:00
Tom Lane ab1f0c8225 Change representation of statement lists, and add statement location info.
This patch makes several changes that improve the consistency of
representation of lists of statements.  It's always been the case
that the output of parse analysis is a list of Query nodes, whatever
the types of the individual statements in the list.  This patch brings
similar consistency to the outputs of raw parsing and planning steps:

* The output of raw parsing is now always a list of RawStmt nodes;
the statement-type-dependent nodes are one level down from that.

* The output of pg_plan_queries() is now always a list of PlannedStmt
nodes, even for utility statements.  In the case of a utility statement,
"planning" just consists of wrapping a CMD_UTILITY PlannedStmt around
the utility node.  This list representation is now used in Portal and
CachedPlan plan lists, replacing the former convention of intermixing
PlannedStmts with bare utility-statement nodes.

Now, every list of statements has a consistent head-node type depending
on how far along it is in processing.  This allows changing many places
that formerly used generic "Node *" pointers to use a more specific
pointer type, thus reducing the number of IsA() tests and casts needed,
as well as improving code clarity.

Also, the post-parse-analysis representation of DECLARE CURSOR is changed
so that it looks more like EXPLAIN, PREPARE, etc.  That is, the contained
SELECT remains a child of the DeclareCursorStmt rather than getting flipped
around to be the other way.  It's now true for both Query and PlannedStmt
that utilityStmt is non-null if and only if commandType is CMD_UTILITY.
That allows simplifying a lot of places that were testing both fields.
(I think some of those were just defensive programming, but in many places,
it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.)

Because PlannedStmt carries a canSetTag field, we're also able to get rid
of some ad-hoc rules about how to reconstruct canSetTag for a bare utility
statement; specifically, the assumption that a utility is canSetTag if and
only if it's the only one in its list.  While I see no near-term need for
relaxing that restriction, it's nice to get rid of the ad-hocery.

The API of ProcessUtility() is changed so that what it's passed is the
wrapper PlannedStmt not just the bare utility statement.  This will affect
all users of ProcessUtility_hook, but the changes are pretty trivial; see
the affected contrib modules for examples of the minimum change needed.
(Most compilers should give pointer-type-mismatch warnings for uncorrected
code.)

There's also a change in the API of ExplainOneQuery_hook, to pass through
cursorOptions instead of expecting hook functions to know what to pick.
This is needed because of the DECLARE CURSOR changes, but really should
have been done in 9.6; it's unlikely that any extant hook functions
know about using CURSOR_OPT_PARALLEL_OK.

Finally, teach gram.y to save statement boundary locations in RawStmt
nodes, and pass those through to Query and PlannedStmt nodes.  This allows
more intelligent handling of cases where a source query string contains
multiple statements.  This patch doesn't actually do anything with the
information, but a follow-on patch will.  (Passing this information through
cleanly is the true motivation for these changes; while I think this is all
good cleanup, it's unlikely we'd have bothered without this end goal.)

catversion bump because addition of location fields to struct Query
affects stored rules.

This patch is by me, but it owes a good deal to Fabien Coelho who did
a lot of preliminary work on the problem, and also reviewed the patch.

Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre
2017-01-14 16:02:35 -05:00
Robert Haas 0563a3a8b5 Fix a bug in how we generate partition constraints.
Move the code for doing parent attnos to child attnos mapping for Vars
in partition constraint expressions to a separate function
map_partition_varattnos() and call it from the appropriate places.
Doing it in get_qual_from_partbound(), as is now, would produce wrong
result in certain multi-level partitioning cases, because it only
considers the current pair of parent-child relations.  In certain
multi-level partitioning cases, attnums for the same key attribute(s)
might differ between various levels causing the same attribute to be
numbered differently in different instances of the Var corresponding
to a given attribute.

With this commit, in generate_partition_qual(), we first generate the
the whole partition constraint (considering all levels of partitioning)
and then do the mapping, so that Vars in the final expression are
numbered according the leaf relation (to which it is supposed to apply).

Amit Langote, reviewed by me.
2017-01-13 14:04:35 -05:00
Alvaro Herrera 3957b58b88 Fix ALTER TABLE / SET TYPE for irregular inheritance
If inherited tables don't have exactly the same schema, the USING clause
in an ALTER TABLE / SET DATA TYPE misbehaves when applied to the
children tables since commit 9550e8348b.  Starting with that commit,
the attribute numbers in the USING expression are fixed during parse
analysis.  This can lead to bogus errors being reported during
execution, such as:
   ERROR:  attribute 2 has wrong type
   DETAIL:  Table has type smallint, but query expects integer.

Since it wouldn't do to revert to the original coding, we now apply a
transformation to map the attribute numbers to the correct ones for each
child.

Reported by Justin Pryzby
Analysis by Tom Lane; patch by me.
Discussion: https://postgr.es/m/20170102225618.GA10071@telsasoft.com
2017-01-09 19:26:58 -03:00
Tom Lane 7c3abe3c92 Get rid of ParseState.p_value_substitute; use a columnref hook instead.
I noticed that p_value_substitute, which is a single-purpose kluge I added
in 2002 (commit b0422b215), could be replaced by having domainAddConstraint
install a parser hook that looks for the name "value".  The parser hook
code only dates back to 2009, so it's not surprising that we had to kluge
this in 2002, but we can do it more cleanly now.
2017-01-07 16:02:16 -05:00
Tom Lane c52d37c8b3 Invalidate cached plans on FDW option changes.
This fixes problems where a plan must change but fails to do so,
as seen in a bug report from Rajkumar Raghuwanshi.

For ALTER FOREIGN TABLE OPTIONS, do this through the standard method of
forcing a relcache flush on the table.  For ALTER FOREIGN DATA WRAPPER
and ALTER SERVER, just flush the whole plan cache on any change in
pg_foreign_data_wrapper or pg_foreign_server.  That matches the way
we handle some other low-probability cases such as opclass changes, and
it's unclear that the case arises often enough to be worth working harder.
Besides, that gives a patch that is simple enough to back-patch with
confidence.

Back-patch to 9.3.  In principle we could apply the code change to 9.2 as
well, but (a) we lack postgres_fdw to test it with, (b) it's doubtful that
anyone is doing anything exciting enough with FDWs that far back to need
this desperately, and (c) the patch doesn't apply cleanly.

Patch originally by Amit Langote, reviewed by Etsuro Fujita and Ashutosh
Bapat, who each contributed substantial changes as well.

Discussion: https://postgr.es/m/CAKcux6m5cA6rRPTKkqVdJ-R=KKDfe35Q_ZuUqxDSV_4hwga=og@mail.gmail.com
2017-01-06 14:12:52 -05:00
Tom Lane d86f40009b Handle OID column inheritance correctly in ALTER TABLE ... INHERIT.
Inheritance operations must treat the OID column, if any, much like
regular user columns.  But MergeAttributesIntoExisting() neglected to
do that, leading to weird results after a table with OIDs is associated
to a parent with OIDs via ALTER TABLE ... INHERIT.

Report and patch by Amit Langote, reviewed by Ashutosh Bapat, some
adjustments by me.  It's been broken all along, so back-patch to
all supported branches.

Discussion: https://postgr.es/m/cb13cfe7-a48c-5720-c383-bb843ab28298@lab.ntt.co.jp
2017-01-04 18:00:11 -05:00
Robert Haas 18fc5192a6 Remove unnecessary arguments from partitioning functions.
RelationGetPartitionQual() and generate_partition_qual() are always
called with recurse = true, so we don't need an argument for that.

Extracted by me from a larger patch by Amit Langote.
2017-01-04 14:56:37 -05:00
Robert Haas f1b4c771ea Fix reporting of constraint violations for table partitioning.
After a tuple is routed to a partition, it has been converted from the
root table's row type to the partition's row type.  ExecConstraints
needs to report the failure using the original tuple and the parent's
tuple descriptor rather than the ones for the selected partition.

Amit Langote
2017-01-04 14:36:34 -05:00
Robert Haas 345b2dcf07 Move partition_tuple_slot out of EState.
Commit 2ac3ef7a01 added a TupleTapleSlot
for partition tuple slot to EState (es_partition_tuple_slot) but it's
more logical to have it as part of ModifyTableState
(mt_partition_tuple_slot) and CopyState (partition_tuple_slot).

Discussion: http://postgr.es/m/1bd459d9-4c0c-197a-346e-e5e59e217d97@lab.ntt.co.jp

Amit Langote, per a gripe from me
2017-01-04 13:16:59 -05:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Peter Eisentraut db779d941e Fix typo in comment 2016-12-29 11:27:41 -05:00
Peter Eisentraut 2e254130d1 Make more use of RoleSpec struct
Most code was casting this through a generic Node.  By declaring
everything as RoleSpec appropriately, we can remove a bunch of casts and
ad-hoc node type checking.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
2016-12-29 10:49:39 -05:00
Magnus Hagander 6cfa54e384 Fix typo comments
Erik Rijkers
2016-12-27 10:24:21 +01:00
Tom Lane fe591f8bf6 Replace enum InhOption with simple boolean.
Now that it has only INH_NO and INH_YES values, it's just weird that
it's not a plain bool, so make it that way.

Also rename RangeVar.inhOpt to "inh", to be like RangeTblEntry.inh.
My recollection is that we gave it a different name specifically because
it had a different representation than the derived bool value, but it
no longer does.  And this is a good forcing function to be sure we
catch any places that are affected by the change.

Bump catversion because of possible effect on stored RangeVar nodes.
I'm not exactly convinced that we ever store RangeVar on disk, but
we have a readfuncs function for it, so be cautious.  (If we do do so,
then commit e13486eba was in error not to bump catversion.)

Follow-on to commit e13486eba.

Discussion: http://postgr.es/m/CA+TgmoYe+EG7LdYX6pkcNxr4ygkP4+A=jm9o-CPXyOvRiCNwaQ@mail.gmail.com
2016-12-23 13:35:18 -05:00
Peter Eisentraut 158df30359 Remove unnecessary casts of makeNode() result
makeNode() is already a macro that has the right result pointer type, so
casting it again to the same type is unnecessary.
2016-12-23 13:17:20 -05:00
Robert Haas e13486eba0 Remove sql_inheritance GUC.
This backward-compatibility GUC is long overdue for removal.

Discussion: http://postgr.es/m/CA+TgmoYe+EG7LdYX6pkcNxr4ygkP4+A=jm9o-CPXyOvRiCNwaQ@mail.gmail.com
2016-12-23 07:35:01 -05:00
Robert Haas 2ac3ef7a01 Fix tuple routing in cases where tuple descriptors don't match.
The previous coding failed to work correctly when we have a
multi-level partitioned hierarchy where tables at successive levels
have different attribute numbers for the partition key attributes.  To
fix, have each PartitionDispatch object store a standalone
TupleTableSlot initialized with the TupleDesc of the corresponding
partitioned table, along with a TupleConversionMap to map tuples from
the its parent's rowtype to own rowtype.  After tuple routing chooses
a leaf partition, we must use the leaf partition's tuple descriptor,
not the root table's.  To that end, a dedicated TupleTableSlot for
tuple routing is now allocated in EState.

Amit Langote
2016-12-22 17:36:37 -05:00
Stephen Frost 12bd7dd317 Use TSConfigRelationId in AlterTSConfiguration()
When we are altering a text search configuration, we are getting the
tuple from pg_ts_config and using its OID, so use TSConfigRelationId
when invoking any post-alter hooks and setting the object address.

Further, in the functions called from AlterTSConfiguration(), we're
saving information about the command via
EventTriggerCollectAlterTSConfig(), so we should be setting
commandCollected to true.  Also add a regression test to
test_ddl_deparse for ALTER TEXT SEARCH CONFIGURATION.

Author: Artur Zakirov, a few additional comments by me
Discussion: https://www.postgresql.org/message-id/57a71eba-f2c7-e7fd-6fc0-2126ec0b39bd%40postgrespro.ru

Back-patch the fix for the InvokeObjectPostAlterHook() call to 9.3 where
it was introduced, and the fix for the ObjectAddressSet() call and
setting commandCollected to true to 9.5 where those changes to
ProcessUtilitySlow() were introduced.
2016-12-22 17:08:43 -05:00
Robert Haas 3ee8067284 Code review for ATExecAttachPartition.
Amit Langote.  Most of this reported by Álvaro Herrera.
2016-12-22 12:40:45 -05:00
Dean Rasheed 58b1362642 Fix order of operations in CREATE OR REPLACE VIEW.
When CREATE OR REPLACE VIEW acts on an existing view, don't update the
view options until after the view query has been updated.

This is necessary in the case where CREATE OR REPLACE VIEW is used on
an existing view that is not updatable, and the new view is updatable
and specifies the WITH CHECK OPTION. In this case, attempting to apply
the new options to the view before updating its query fails, because
the options are applied using the ALTER TABLE infrastructure which
checks that WITH CHECK OPTION is only applied to an updatable view.

If new columns are being added to the view, that is also done using
the ALTER TABLE infrastructure, but it is important that that still be
done before updating the view query, because the rules system checks
that the query columns match those on the view relation. Added a
comment to explain that, in case someone is tempted to move that to
where the view options are now being set.

Back-patch to 9.4 where WITH CHECK OPTION was added.

Report: https://postgr.es/m/CAEZATCUp%3Dz%3Ds4SzZjr14bfct_bdJNwMPi-gFi3Xc5k1ntbsAgQ%40mail.gmail.com
2016-12-21 16:58:18 +00:00
Robert Haas cd510f0413 Convert elog() to ereport() and do some wordsmithing.
It's not entirely clear that we should log a message here at all, but
it's certainly wrong to use elog() for a message that should clearly
be translatable.

Amit Langote
2016-12-21 11:47:50 -05:00
Robert Haas 1fc5c49450 Refactor partition tuple routing code to reduce duplication.
Amit Langote
2016-12-21 11:36:10 -05:00
Peter Eisentraut f3b421da5f Reorder pg_sequence columns to avoid alignment issue
On AIX, doubles are aligned at 4 bytes, but int64 is aligned at 8 bytes.
Our code assumes that doubles have alignment that can also be applied to
int64, but that fails in this case.  One effect is that
heap_form_tuple() writes tuples in a different layout than
Form_pg_sequence expects.

Rather than rewrite the whole alignment code, work around the issue by
reordering the columns in pg_sequence so that the first int64 column
naturally comes out at an 8-byte boundary.
2016-12-21 09:06:49 -05:00
Peter Eisentraut 1753b1b027 Add pg_sequence system catalog
Move sequence metadata (start, increment, etc.) into a proper system
catalog instead of storing it in the sequence heap object.  This
separates the metadata from the sequence data.  Sequence metadata is now
operated on transactionally by DDL commands, whereas previously
rollbacks of sequence-related DDL commands would be ignored.

Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2016-12-20 08:28:18 -05:00
Robert Haas 7cd0fd655d Invalid parent's relcache after CREATE TABLE .. PARTITION OF.
Otherwise, subsequent commands in the same transaction see the wrong
partition descriptor.

Amit Langote.  Reported by Tomas Vondra and David Fetter.  Reviewed
by me.

Discussion: http://postgr.es/m/22dd313b-d7fd-22b5-0787-654845c8f849%402ndquadrant.com
Discussion: http://postgr.es/m/20161215090916.GB20659%40fetter.org
2016-12-19 22:53:30 -05:00
Robert Haas 4b9a98e154 Clean up code, comments, and formatting for table partitioning.
Amit Langote, plus pgindent-ing by me.  Inspired in part by review
comments from Tomas Vondra.
2016-12-13 10:59:14 -05:00
Robert Haas 3856cf9607 Remove should_free arguments to tuplesort routines.
Since commit e94568ecc1, the answer is
always "false", and we do not need to complicate the API by arranging
to return a constant value.

Peter Geoghegan

Discussion: http://postgr.es/m/CAM3SWZQWZZ_N=DmmL7tKy_OUjGH_5mN=N=A6h7kHyyDvEhg2DA@mail.gmail.com
2016-12-12 15:57:35 -05:00
Tom Lane 563d575fd7 Fix creative, but unportable, spelling of "ptr != NULL".
Or at least I suppose that's what was really meant here.  But even
aside from the not-per-project-style use of "0" to mean "NULL",
I doubt it's safe to assume that all valid pointers are > NULL.
Per buildfarm member pademelon.
2016-12-12 11:23:23 -05:00
Robert Haas f0e44751d7 Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own.  The children are called
partitions and contain all of the actual data.  Each partition has an
implicit partitioning constraint.  Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed.  Partitions
can't have extra columns and may not allow nulls unless the parent
does.  Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.

Currently, tables can be range-partitioned or list-partitioned.  List
partitioning is limited to a single column, but range partitioning can
involve multiple columns.  A partitioning "column" can be an
expression.

Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations.  The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.

Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others.  Minor revisions by me.
2016-12-07 13:17:55 -05:00
Robert Haas 4212cb7326 Fix interaction of parallel query with prepared statements.
Previously, a prepared statement created via a Parse message could get
a parallel plan, but one created with a PREPARE statement could not.
This state of affairs was due to confusion on my (rhaas) part: I
erroneously believed that a CREATE TABLE .. AS EXECUTE statement could
only be performed with a prepared statement by PREPARE, but in fact
one created by a Prepare message works just as well.  Therefore, it
makes no sense to allow parallel query in one case but not the other.

To fix, allow parallel query with all prepared statements, but run
the parallel plan serially (i.e. without workers) in the case of
CREATE TABLE .. AS EXECUTE.  Also, document this.

Amit Kapila and Tobias Bussman, plus an extra sentence of
documentation by me.
2016-12-06 11:11:54 -05:00
Stephen Frost 093129c9d9 Add support for restrictive RLS policies
We have had support for restrictive RLS policies since 9.5, but they
were only available through extensions which use the appropriate hooks.
This adds support into the grammer, catalog, psql and pg_dump for
restrictive RLS policies, thus reducing the cases where an extension is
necessary.

In passing, also move away from using "AND"d and "OR"d in comments.
As pointed out by Alvaro, it's not really appropriate to attempt
to make verbs out of "AND" and "OR", so reword those comments which
attempted to.

Reviewed By: Jeevan Chalke, Dean Rasheed
Discussion: https://postgr.es/m/20160901063404.GY4028@tamriel.snowman.net
2016-12-05 15:50:55 -05:00
Alvaro Herrera fa2fa99552 Permit dump/reload of not-too-large >1GB tuples
Our documentation states that our maximum field size is 1 GB, and that
our maximum row size of 1.6 TB.  However, while this might be attainable
in theory with enough contortions, it is not workable in practice; for
starters, pg_dump fails to dump tables containing rows larger than 1 GB,
even if individual columns are well below the limit; and even if one
does manage to manufacture a dump file containing a row that large, the
server refuses to load it anyway.

This commit enables dumping and reloading of such tuples, provided two
conditions are met:

1. no single column is larger than 1 GB (in output size -- for bytea
   this includes the formatting overhead)
2. the whole row is not larger than 2 GB

There are three related changes to enable this:

a. StringInfo's API now has two additional functions that allow creating
a string that grows beyond the typical 1GB limit (and "long" string).
ABI compatibility is maintained.  We still limit these strings to 2 GB,
though, for reasons explained below.

b. COPY now uses long StringInfos, so that pg_dump doesn't choke
trying to emit rows longer than 1GB.

c. heap_form_tuple now uses the MCXT_ALLOW_HUGE flag in its allocation
for the input tuple, which means that large tuples are accepted on
input.  Note that at this point we do not apply any further limit to the
input tuple size.

The main reason to limit to 2 GB is that the FE/BE protocol uses 32 bit
length words to describe each row; and because the documentation is
ambiguous on its signedness and libpq does consider it signed, we cannot
use the highest-order bit.  Additionally, the StringInfo API uses "int"
(which is 4 bytes wide in most platforms) in many places, so we'd need
to change that API too in order to improve, which has lots of fallout.

Backpatch to 9.5, which is the oldest that has
MemoryContextAllocExtended, a necessary piece of infrastructure.  We
could apply to 9.4 with very minimal additional effort, but any further
than that would require backpatching "huge" allocations too.

This is the largest set of changes we could find that can be
back-patched without breaking compatibility with existing systems.
Fixing a bigger set of problems (for example, dumping tuples bigger than
2GB, or dumping fields bigger than 1GB) would require changing the FE/BE
protocol and/or changing the StringInfo API in an ABI-incompatible way,
neither of which would be back-patchable.

Authors: Daniel Vérité, Álvaro Herrera
Reviewed by: Tomas Vondra
Discussion: https://postgr.es/m/20160229183023.GA286012@alvherre.pgsql
2016-12-02 00:34:01 -03:00
Tom Lane 4e026b32d4 Check for pending trigger events on far end when dropping an FK constraint.
When dropping a foreign key constraint with ALTER TABLE DROP CONSTRAINT,
we refuse the drop if there are any pending trigger events on the named
table; this ensures that we won't remove the pg_trigger row that will be
consulted by those events.  But we should make the same check for the
referenced relation, else we might remove a due-to-be-referenced pg_trigger
row for that relation too, resulting in "could not find trigger NNN" or
"relation NNN has no triggers" errors at commit.  Per bug #14431 from
Benjie Gillam.  Back-patch to all supported branches.

Report: <20161124114911.6530.31200@wrigleys.postgresql.org>
2016-11-25 13:44:47 -05:00
Peter Eisentraut 67dc4ccbb2 Add pg_sequences view
Like pg_tables, pg_views, and others, this view contains information
about sequences in a way that is independent of the system catalog
layout but more comprehensive than the information schema.

To help implement the view, add a new internal function
pg_sequence_last_value() to return the last value of a sequence.  This
is kept separate from pg_sequence_parameters() to separate querying
run-time state from catalog-like information.

Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2016-11-18 14:59:03 -05:00
Tom Lane 279c439c7f Support "COPY view FROM" for views with INSTEAD OF INSERT triggers.
We just pass the data to the INSTEAD trigger.

Haribabu Kommi, reviewed by Dilip Kumar

Patch: <CAJrrPGcSQkrNkO+4PhLm4B8UQQQmU9YVUuqmtgM=pmzMfxWaWQ@mail.gmail.com>
2016-11-10 14:13:43 -05:00
Robert Haas dce429b117 Fix typo.
Michael Paquier
2016-11-08 15:33:57 -05:00
Kevin Grittner 8c48375e5f Implement syntax for transition tables in AFTER triggers.
This is infrastructure for the complete SQL standard feature.  No
support is included at this point for execution nodes or PLs.  The
intent is to add that soon.

As this patch leaves things, standard syntax can create tuplestores
to contain old and/or new versions of rows affected by a statement.
References to these tuplestores are in the TriggerData structure.
C triggers can access the tuplestores directly, so they are usable,
but they cannot yet be referenced within a SQL statement.
2016-11-04 10:49:50 -05:00
Tom Lane a522fc3d80 Fix incorrect trigger-property updating in ALTER CONSTRAINT.
The code to change the deferrability properties of a foreign-key constraint
updated all the associated triggers to match; but a moment's examination of
the code that creates those triggers in the first place shows that only
some of them should track the constraint's deferrability properties.  This
leads to odd failures in subsequent exercise of the foreign key, as the
triggers are fired at the wrong times.  Fix that, and add a regression test
comparing the trigger properties produced by ALTER CONSTRAINT with those
you get by creating the constraint as-intended to begin with.

Per report from James Parks.  Back-patch to 9.4 where this ALTER
functionality was introduced.

Report: <CAJ3Xv+jzJ8iNNUcp4RKW8b6Qp1xVAxHwSXVpjBNygjKxcVuE9w@mail.gmail.com>
2016-10-26 17:05:06 -04:00
Tom Lane 709e461bef Fix EXPLAIN so that it doesn't emit invalid XML in corner cases.
With track_io_timing = on, EXPLAIN (ANALYZE, BUFFERS) will emit fields
named like "I/O Read Time".  The slash makes that invalid as an XML
element name, so that adding FORMAT XML would produce invalid XML.

We already have code in there to translate spaces to dashes, so let's
generalize that to convert anything that isn't a valid XML name character,
viz letters, digits, hyphens, underscores, and periods.  We could just
reject slashes, which would run a bit faster.  But the fact that this went
unnoticed for so long doesn't give me a warm feeling that we'd notice the
next creative violation, so let's make it a permanent fix.

Reported by Markus Winand, though this isn't his initial patch proposal.

Back-patch to 9.2 where track_io_timing was added.  The problem is only
latent in 9.1, so I don't feel a need to fix it there.

Discussion: <E0BF6A45-68E8-45E6-918F-741FB332C6BB@winand.at>
2016-10-20 17:17:50 -04:00
Tom Lane 2f1eaf87e8 Drop server support for FE/BE protocol version 1.0.
While this isn't a lot of code, it's been essentially untestable for
a very long time, because libpq doesn't support anything older than
protocol 2.0, and has not since release 6.3.  There's no reason to
believe any other client-side code still uses that protocol, either.

Discussion: <2661.1475849167@sss.pgh.pa.us>
2016-10-11 12:19:18 -04:00
Heikki Linnakangas 6fb12cbcd6 Remove some unnecessary #includes.
Amit Langote
2016-10-10 12:22:58 +03:00
Tom Lane e55a946a81 Fix two bugs in merging of inherited CHECK constraints.
Historically, we've allowed users to add a CHECK constraint to a child
table and then add an identical CHECK constraint to the parent.  This
results in "merging" the two constraints so that the pre-existing
child constraint ends up with both conislocal = true and coninhcount > 0.
However, if you tried to do it in the other order, you got a duplicate
constraint error.  This is problematic for pg_dump, which needs to issue
separated ADD CONSTRAINT commands in some cases, but has no good way to
ensure that the constraints will be added in the required order.
And it's more than a bit arbitrary, too.  The goal of complaining about
duplicated ADD CONSTRAINT commands can be served if we reject the case of
adding a constraint when the existing one already has conislocal = true;
but if it has conislocal = false, let's just make the ADD CONSTRAINT set
conislocal = true.  In this way, either order of adding the constraints
has the same end result.

Another problem was that the code allowed creation of a parent constraint
marked convalidated that is merged with a child constraint that is
!convalidated.  In this case, an inheritance scan of the parent table could
emit some rows violating the constraint condition, which would be an
unexpected result given the marking of the parent constraint as validated.
Hence, forbid merging of constraints in this case.  (Note: valid child and
not-valid parent seems fine, so continue to allow that.)

Per report from Benedikt Grundmann.  Back-patch to 9.2 where we introduced
possibly-not-valid check constraints.  The second bug obviously doesn't
apply before that, and I think the first doesn't either, because pg_dump
only gets into this situation when dealing with not-valid constraints.

Report: <CADbMkNPT-Jz5PRSQ4RbUASYAjocV_KHUWapR%2Bg8fNvhUAyRpxA%40mail.gmail.com>
Discussion: <22108.1475874586@sss.pgh.pa.us>
2016-10-08 19:29:27 -04:00
Stephen Frost 814b9e9b8e Fix RLS with COPY (col1, col2) FROM tab
Attempting to COPY a subset of columns from a table with RLS enabled
would fail due to an invalid query being constructed (using a single
ColumnRef with the list of fields to exact in 'fields', but that's for
the different levels of an indirection for a single column, not for
specifying multiple columns).

Correct by building a ColumnRef and then RestTarget for each column
being requested and then adding those to the targetList for the select
query.  Include regression tests to hopefully catch if this is broken
again in the future.

Patch-By: Adam Brightwell
Reviewed-By: Michael Paquier
2016-10-03 16:22:57 -04:00
Peter Eisentraut 967ed9205b Remove dead line of code 2016-09-28 12:00:00 -04:00
Heikki Linnakangas babe05bc2b Turn password_encryption GUC into an enum.
This makes the parameter easier to extend, to support other password-based
authentication protocols than MD5. (SCRAM is being worked on.)

The GUC still accepts on/off as aliases for "md5" and "plain", although
we may want to remove those once we actually add support for another
password hash type.

Michael Paquier, reviewed by David Steele, with some further edits by me.

Discussion: <CAB7nPqSMXU35g=W9X74HVeQp0uvgJxvYOuA4A-A3M+0wfEBv-w@mail.gmail.com>
2016-09-28 12:22:44 +03:00
Robert Haas 5225c66336 Clarify policy on marking inherited constraints as valid.
Amit Langote and Robert Haas
2016-09-15 17:24:54 -04:00
Tom Lane 55c3391d1e Be pickier about converting between Name and Datum.
We were misapplying NameGetDatum() to plain C strings in some places.
This worked, because it was just a pointer cast anyway, but it's a type
cheat in some sense.  Use CStringGetDatum instead, and modify the
NameGetDatum macro so it won't compile if applied to something that's
not a pointer to NameData.  This should result in no changes to
generated code, but it is logically cleaner.

Mark Dilger, tweaked a bit by me

Discussion: <EFD8AC94-4C1F-40C1-A5EA-304080089C1B@gmail.com>
2016-09-13 17:17:48 -04:00
Tom Lane 40b449ae84 Allow CREATE EXTENSION to follow extension update paths.
Previously, to update an extension you had to produce both a version-update
script and a new base installation script.  It's become more and more
obvious that that's tedious, duplicative, and error-prone.  This patch
attempts to improve matters by allowing the new base installation script
to be omitted.  CREATE EXTENSION will install a requested version if it
can find a base script and a chain of update scripts that will get there.
As in the existing update logic, shorter chains are preferred if there's
more than one possibility, with an arbitrary tie-break rule for chains
of equal length.

Also adjust the pg_available_extension_versions view to show such versions
as installable.

While at it, refactor the code so that CASCADE processing works for
extensions requested during ApplyExtensionUpdates().  Without this,
addition of a new requirement in an updated extension would require
creating a new base script, even if there was no other reason to do that.
(It would be easy at this point to add a CASCADE option to ALTER EXTENSION
UPDATE, to allow the same thing to happen during a manually-commanded
version update, but I have not done that here.)

Tom Lane, reviewed by Andres Freund

Discussion: <20160905005919.jz2m2yh3und2dsuy@alap3.anarazel.de>
2016-09-11 14:15:07 -04:00
Tom Lane 0ab9c56d0f Support renaming an existing value of an enum type.
Not much to be said about this patch: it does what it says on the tin.

In passing, rename AlterEnumStmt.skipIfExists to skipIfNewValExists
to clarify what it actually does.  In the discussion of this patch
we considered supporting other similar options, such as IF EXISTS
on the type as a whole or IF NOT EXISTS on the target name.  This
patch doesn't actually add any such feature, but it might happen later.

Dagfinn Ilmari Mannsåker, reviewed by Emre Hasegeli

Discussion: <CAO=2mx6uvgPaPDf-rHqG8=1MZnGyVDMQeh8zS4euRyyg4D35OQ@mail.gmail.com>
2016-09-07 16:11:56 -04:00
Tom Lane 4f405c8ef4 Add a HINT for client-vs-server COPY failure cases.
Users often get confused between COPY and \copy and try to use client-side
paths with COPY.  The server then cannot find the file (if remote), or sees
a permissions problem (if local), or some variant of that.  Emit a hint
about this in the most common cases.

In future we might want to expand the set of errnos for which the hint
gets printed, but be conservative for now.

Craig Ringer, reviewed by Christoph Berg and Tom Lane

Discussion: <CAMsr+YEqtD97qPEzQDqrCt5QiqPbWP_X4hmvy2pQzWC0GWiyPA@mail.gmail.com>
2016-09-06 23:55:55 -04:00
Peter Eisentraut 49eb0fd097 Add location field to DefElem
Add a location field to the DefElem struct, used to parse many utility
commands.  Update various error messages to supply error position
information.

To propogate the error position information in a more systematic way,
create a ParseState in standard_ProcessUtility() and pass that to
interested functions implementing the utility commands.  This seems
better than passing the query string and then reassembling a parse state
ad hoc, which violates the encapsulation of the ParseState type.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
2016-09-06 12:00:00 -04:00
Simon Riggs dcb12ce8d8 Fix VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL
lazy_truncate_heap() was waiting for
VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL, but in microseconds
not milliseconds as originally intended.

Found by code inspection.

Simon Riggs
2016-09-06 15:35:47 +01:00
Tom Lane 25794e841e Cosmetic code cleanup in commands/extension.c.
Some of the comments added by the CREATE EXTENSION CASCADE patch were
a bit sloppy, and I didn't care for redeclaring the same local variable
inside a nested block either.  No functional changes.
2016-09-05 18:53:33 -04:00
Tom Lane 15bc038f9b Relax transactional restrictions on ALTER TYPE ... ADD VALUE.
To prevent possibly breaking indexes on enum columns, we must keep
uncommitted enum values from getting stored in tables, unless we
can be sure that any such column is new in the current transaction.

Formerly, we enforced this by disallowing ALTER TYPE ... ADD VALUE
from being executed at all in a transaction block, unless the target
enum type had been created in the current transaction.  This patch
removes that restriction, and instead insists that an uncommitted enum
value can't be referenced unless it belongs to an enum type created
in the same transaction as the value.  Per discussion, this should be
a bit less onerous.  It does require each function that could possibly
return a new enum value to SQL operations to check this restriction,
but there aren't so many of those that this seems unmaintainable.

Andrew Dunstan and Tom Lane

Discussion: <4075.1459088427@sss.pgh.pa.us>
2016-09-05 12:59:55 -04:00
Heikki Linnakangas ec136d19b2 Move code shared between libpq and backend from backend/libpq/ to common/.
When building libpq, ip.c and md5.c were symlinked or copied from
src/backend/libpq into src/interfaces/libpq, but now that we have a
directory specifically for routines that are shared between the server and
client binaries, src/common/, move them there.

Some routines in ip.c were only used in the backend. Keep those in
src/backend/libpq, but rename to ifaddr.c to avoid confusion with the file
that's now in common.

Fix the comment in src/common/Makefile to reflect how libpq actually links
those files.

There are two more files that libpq symlinks directly from src/backend:
encnames.c and wchar.c. I don't feel compelled to move those right now,
though.

Patch by Michael Paquier, with some changes by me.

Discussion: <69938195-9c76-8523-0af8-eb718ea5b36e@iki.fi>
2016-09-02 13:49:59 +03:00
Tom Lane ea268cdc9a Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters.  While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea.  Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.

While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.

In passing, change TopMemoryContext to use the default allocation
parameters.  Formerly it could only be extended 8K at a time.  That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there.  There seems no good reason
not to let it use growing blocks like most other contexts.

Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain.  The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.

Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
Tom Lane b5bce6c1ec Final pgindent + perltidy run for 9.6. 2016-08-15 13:42:51 -04:00
Tom Lane ed0097e4f9 Add SQL-accessible functions for inspecting index AM properties.
Per discussion, we should provide such functions to replace the lost
ability to discover AM properties by inspecting pg_am (cf commit
65c5fcd35).  The added functionality is also meant to displace any code
that was looking directly at pg_index.indoption, since we'd rather not
believe that the bit meanings in that field are part of any client API
contract.

As future-proofing, define the SQL API to not assume that properties that
are currently AM-wide or index-wide will remain so unless they logically
must be; instead, expose them only when inquiring about a specific index
or even specific index column.  Also provide the ability for an index
AM to override the behavior.

In passing, document pg_am.amtype, overlooked in commit 473b93287.

Andrew Gierth, with kibitzing by me and others

Discussion: <87mvl5on7n.fsf@news-spur.riddles.org.uk>
2016-08-13 18:31:14 -04:00
Tom Lane 4b234fd8bf Fix inappropriate printing of never-measured times in EXPLAIN.
EXPLAIN (ANALYZE, TIMING OFF) would print an elapsed time of zero for
a trigger function, because no measurement has been taken but it printed
the field anyway.  This isn't what EXPLAIN does elsewhere, so suppress it.

In the same vein, EXPLAIN (ANALYZE, BUFFERS) with non-text output format
would print buffer I/O timing numbers even when no measurement has been
taken because track_io_timing is off.  That seems not per policy, either,
so change it.

Back-patch to 9.2 where these features were introduced.

Maksim Milyutin

Discussion: <081c0540-ecaa-bd29-3fd2-6358f3b359a9@postgrespro.ru>
2016-08-12 12:13:04 -04:00
Tom Lane 95bee941be Fix misestimation of n_distinct for a nearly-unique column with many nulls.
If ANALYZE found no repeated non-null entries in its sample, it set the
column's stadistinct value to -1.0, intending to indicate that the entries
are all distinct.  But what this value actually means is that the number
of distinct values is 100% of the table's rowcount, and thus it was
overestimating the number of distinct values by however many nulls there
are.  This could lead to very poor selectivity estimates, as for example
in a recent report from Andreas Joseph Krogh.  We should discount the
stadistinct value by whatever we've estimated the nulls fraction to be.
(That is what will happen if we choose to use a negative stadistinct for
a column that does have repeated entries, so this code path was just
inconsistent.)

In addition to fixing the stadistinct entries stored by several different
ANALYZE code paths, adjust the logic where get_variable_numdistinct()
forces an "all distinct" estimate on the basis of finding a relevant unique
index.  Unique indexes don't reject nulls, so there's no reason to assume
that the null fraction doesn't apply.

Back-patch to all supported branches.  Back-patching is a bit of a judgment
call, but this problem seems to affect only a few users (else we'd have
identified it long ago), and it's bad enough when it does happen that
destabilizing plan choices in a worse direction seems unlikely.

Patch by me, with documentation wording suggested by Dean Rasheed

Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena>
Discussion: <16143.1470350371@sss.pgh.pa.us>
2016-08-07 18:52:02 -04:00
Tom Lane 9ee1cf04ab Fix TOAST access failure in RETURNING queries.
Discussion of commit 3e2f3c2e4 exposed a problem that is of longer
standing: since we don't detoast data while sticking it into a portal's
holdStore for PORTAL_ONE_RETURNING and PORTAL_UTIL_SELECT queries, and we
release the query's snapshot as soon as we're done loading the holdStore,
later readout of the holdStore can do TOAST fetches against data that can
no longer be seen by any of the session's live snapshots.  This means that
a concurrent VACUUM could remove the TOAST data before we can fetch it.
Commit 3e2f3c2e4 exposed the problem by showing that sometimes we had *no*
live snapshots while fetching TOAST data, but we'd be at risk anyway.

I believe this code was all right when written, because our management of a
session's exposed xmin was such that the TOAST references were safe until
end of transaction.  But that's no longer true now that we can advance or
clear our PGXACT.xmin intra-transaction.

To fix, copy the query's snapshot during FillPortalStore() and save it in
the Portal; release it only when the portal is dropped.  This essentially
implements a policy that we must hold a relevant snapshot whenever we
access potentially-toasted data.  We had already come to that conclusion
in other places, cf commits 08e261cbc9 and ec543db77b.

I'd have liked to add a regression test case for this, but I didn't see
a way to make one that's not unreasonably bloated; it seems to require
returning a toasted value to the client, and those will be big.

In passing, improve PortalRunUtility() so that it positively verifies
that its ending PopActiveSnapshot() call will pop the expected snapshot,
removing a rather shaky assumption about which utility commands might
do their own PopActiveSnapshot().  There's no known bug here, but now
that we're actively referencing the snapshot it's almost free to make
this code a bit more bulletproof.

We might want to consider back-patching something like this into older
branches, but it would be prudent to let it prove itself more in HEAD
beforehand.

Discussion: <87vazemeda.fsf@credativ.de>
2016-08-07 17:46:08 -04:00
Peter Eisentraut 40fcfec82c Message style improvements 2016-07-25 22:07:44 -04:00
Kevin Grittner 1c15aac53f Add comment & docs about no vacuum truncation with sto.
Omission noted by Andres Freund.
2016-07-19 16:25:53 -05:00
Andres Freund eca0f1db14 Clear all-frozen visibilitymap status when locking tuples.
Since a892234 & fd31cd265 the visibilitymap's freeze bit is used to
avoid vacuuming the whole relation in anti-wraparound vacuums. Doing so
correctly relies on not adding xids to the heap without also unsetting
the visibilitymap flag.  Tuple locking related code has not done so.

To allow selectively resetting all-frozen - to avoid pessimizing
heap_lock_tuple - allow to selectively reset the all-frozen with
visibilitymap_clear(). To avoid having to use
visibilitymap_get_status (e.g. via VM_ALL_FROZEN) inside a critical
section, have visibilitymap_clear() return whether any bits have been
reset.

There's a remaining issue (denoted by XXX): After the PageIsAllVisible()
check in heap_lock_tuple() and heap_lock_updated_tuple_rec() the page
status could theoretically change. Practically that currently seems
impossible, because updaters will hold a page level pin already.  Due to
the next beta coming up, it seems better to get the required WAL magic
bump done before resolving this issue.

The added flags field fields to xl_heap_lock and xl_heap_lock_updated
require bumping the WAL magic. Since there's already been a catversion
bump since the last beta, that's not an issue.

Reviewed-By: Robert Haas, Amit Kapila and Andres Freund
Author: Masahiko Sawada, heavily revised by Andres Freund
Discussion: CAEepm=3fWAbWryVW9swHyLTY4sXVf0xbLvXqOwUoDiNCx9mBjQ@mail.gmail.com
Backpatch: -
2016-07-18 02:01:13 -07:00
Tom Lane 4d042999f9 Print a given subplan only once in EXPLAIN.
We have, for a very long time, allowed the same subplan (same member of the
PlannedStmt.subplans list) to be referenced by more than one SubPlan node;
this avoids problems for cases such as subplans within an IndexScan's
indxqual and indxqualorig fields.  However, EXPLAIN had not gotten the memo
and would print each reference as though it were an independent identical
subplan.  To fix, track plan_ids of subplans we've printed and don't print
the same plan_id twice.  Per report from Pavel Stehule.

BTW: the particular case of IndexScan didn't cause visible duplication
in a plain EXPLAIN, only EXPLAIN ANALYZE, because in the former case we
short-circuit executor startup before the indxqual field is processed by
ExecInitExpr.  That seems like it could easily lead to other EXPLAIN
problems in future, but it's not clear how to avoid it without breaking
the "EXPLAIN a plan using hypothetical indexes" use-case.  For now I've
left that issue alone.

Although this is a longstanding bug, it's purely cosmetic (no great harm
is done by the repeat printout) and we haven't had field complaints before.
So I'm hesitant to back-patch it, especially since there is some small risk
of ABI problems due to the need to add a new field to ExplainState.

In passing, rearrange order of fields in ExplainState to be less random,
and update some obsolete comments about when/where to initialize them.

Report: <CAFj8pRAimq+NK-menjt+3J4-LFoodDD8Or6=Lc_stcFD+eD4DA@mail.gmail.com>
2016-07-11 18:14:29 -04:00
Robert Haas 10c0558ffe Fix several mistakes around parallel workers and client_encoding.
Previously, workers sent data to the leader using the client encoding.
That mostly worked, but the leader the converted the data back to the
server encoding.  Since not all encoding conversions are reversible,
that could provoke failures.  Fix by using the database encoding for
all communication between worker and leader.

Also, while temporary changes to GUC settings, as from the SET clause
of a function, are in general OK for parallel query, changing
client_encoding this way inside of a parallel worker is not OK.
Previously, that would have confused the leader; with these changes,
it would not confuse the leader, but it wouldn't do anything either.
So refuse such changes in parallel workers.

Also, the previous code naively assumed that when it received a
NotifyResonse from the worker, it could pass that directly back to the
user.  But now that worker-to-leader communication always uses the
database encoding, that's clearly no longer correct - though,
actually, the old way was always broken for V2 clients.  So
disassemble and reconstitute the message instead.

Issues reported by Peter Eisentraut.  Patch by me, reviewed by
Peter Eisentraut.
2016-06-30 18:35:32 -04:00
Tom Lane 8ebb69f854 Fix some infelicities in EXPLAIN output for parallel query plans.
In non-text output formats, parallelized aggregates were reporting
"Partial" or "Finalize" as a field named "Operation", which might be all
right in the absence of any context --- but other plan node types use that
field to report SQL-visible semantics, such as Select/Insert/Update/Delete.
So that naming choice didn't seem good to me.  I changed it to "Partial
Mode".

Also, the field did not appear at all for a non-parallelized Agg plan node,
which is contrary to expectation in non-text formats.  We're notionally
producing objects that conform to a schema, so the set of fields for a
given node type and EXPLAIN mode should be well-defined.  I set it up to
fill in "Simple" in such cases.

Other fields that were added for parallel query, namely "Parallel Aware"
and Gather's "Single Copy", had not gotten the word on that point either.
Make them appear always in non-text output.

Also, the latter two fields were nominally producing boolean output, but
were getting it wrong, because bool values shouldn't be quoted in JSON or
YAML.  Somehow we'd not needed an ExplainPropertyBool formatting subroutine
before 9.6; but now we do, so invent it.

Discussion: <16002.1466972724@sss.pgh.pa.us>
2016-06-29 18:51:20 -04:00
Tom Lane 874fe3aea1 Fix CREATE MATVIEW/CREATE TABLE AS ... WITH NO DATA to not plan the query.
Previously, these commands always planned the given query and went through
executor startup before deciding not to actually run the query if WITH NO
DATA is specified.  This behavior is problematic for pg_dump because it
may cause errors to be raised that we would rather not see before a
REFRESH MATERIALIZED VIEW command is issued.  See for example bug #13907
from Marian Krucina.  This change is not sufficient to fix that particular
bug, because we also need to tweak pg_dump to issue the REFRESH later,
but it's a necessary step on the way.

A user-visible side effect of doing things this way is that the returned
command tag for WITH NO DATA cases will now be "CREATE MATERIALIZED VIEW"
or "CREATE TABLE AS", not "SELECT 0".  We could preserve the old behavior
but it would take more code, and arguably that was just an implementation
artifact not intended behavior anyhow.

In 9.5 and HEAD, also get rid of the static variable CreateAsReladdr, which
was trouble waiting to happen; there is not any prohibition on nested
CREATE commands.

Back-patch to 9.3 where CREATE MATERIALIZED VIEW was introduced.

Michael Paquier and Tom Lane

Report: <20160202161407.2778.24659@wrigleys.postgresql.org>
2016-06-27 15:57:50 -04:00
Tom Lane 19e972d558 Rethink node-level representation of partial-aggregation modes.
The original coding had three separate booleans representing partial
aggregation behavior, which was confusing, unreadable, and error-prone,
not least because the booleans weren't always listed in the same order.
It was also inadequate for the allegedly-desirable future extension to
support intermediate partial aggregation, because we'd need separate
markers for serialization and deserialization in such a case.

Merge these bools into an enum "AggSplit" to provide symbolic names for
the supported operating modes (and document what those are).  By assigning
the values of the enum constants carefully, we can treat AggSplit values
as options bitmasks so that tests of what to do aren't noticeably more
expensive than before.

While at it, get rid of Aggref.aggoutputtype.  That's not needed since
commit 59a3795c2 got rid of setrefs.c's special-purpose Aggref comparison
code, and it likewise seemed more confusing than helpful.

Assorted comment cleanup as well (there's still more that I want to do
in that line).

catversion bump for change in Aggref node contents.  Should be the last
one for partial-aggregation changes.

Discussion: <29309.1466699160@sss.pgh.pa.us>
2016-06-26 14:33:38 -04:00
Tom Lane f8ace5477e Fix type-safety problem with parallel aggregate serial/deserialization.
The original specification for this called for the deserialization function
to have signature "deserialize(serialtype) returns transtype", which is a
security violation if transtype is INTERNAL (which it always would be in
practice) and serialtype is not (which ditto).  The patch blithely overrode
the opr_sanity check for that, which was sloppy-enough work in itself,
but the indisputable reason this cannot be allowed to stand is that CREATE
FUNCTION will reject such a signature and thus it'd be impossible for
extensions to create parallelizable aggregates.

The minimum fix to make the signature type-safe is to add a second, dummy
argument of type INTERNAL.  But to lock it down a bit more and make misuse
of INTERNAL-accepting functions less likely, let's get rid of the ability
to specify a "serialtype" for an aggregate and just say that the only
useful serialtype is BYTEA --- which, in practice, is the only interesting
value anyway, due to the usefulness of the send/recv infrastructure for
this purpose.  That means we only have to allow "serialize(internal)
returns bytea" and "deserialize(bytea, internal) returns internal" as
the signatures for these support functions.

In passing fix bogus signature of int4_avg_combine, which I found thanks
to adding an opr_sanity check on combinefunc signatures.

catversion bump due to removing pg_aggregate.aggserialtype and adjusting
signatures of assorted built-in functions.

David Rowley and Tom Lane

Discussion: <27247.1466185504@sss.pgh.pa.us>
2016-06-22 16:52:41 -04:00
Robert Haas ede62e56fb Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies.
If you really want to vacuum every single page in the relation,
regardless of apparent visibility status or anything else, you can use
this option.  In previous releases, this behavior could be achieved
using VACUUM (FREEZE), but because we can now recognize all-frozen
pages as not needing to be frozen again, that no longer works.  There
should be no need for routine use of this option, but maybe bugs or
disaster recovery will necessitate its use.

Patch by me, reviewed by Andres Freund.
2016-06-17 15:48:57 -04:00
Robert Haas 38e9f90a22 Fix lazy_scan_heap so that it won't mark pages all-frozen too soon.
Commit a892234f83 added a new bit per
page to the visibility map fork indicating whether the page is
all-frozen, but incorrectly assumed that if lazy_scan_heap chose to
freeze a tuple then that tuple would not need to later be frozen
again. This turns out to be false, because xmin and xmax (and
conceivably xvac, if dealing with tuples from very old releases) could
be frozen at separate times.

Thanks to Andres Freund for help in uncovering and tracking down this
issue.
2016-06-15 14:30:06 -04:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Peter Eisentraut 5c6d2a5e7c Message style and wording fixes 2016-06-07 14:18:55 -04:00
Tom Lane f64340e743 Don't reset changes_since_analyze after a selective-columns ANALYZE.
If we ANALYZE only selected columns of a table, we should not postpone
auto-analyze because of that; other columns may well still need stats
updates.  As committed, the counter is left alone if a column list is
given, whether or not it includes all analyzable columns of the table.
Per complaint from Tomasz Ostrowski.

It's been like this a long time, so back-patch to all supported branches.

Report: <ef99c1bd-ff60-5f32-2733-c7b504eb960c@ato.waw.pl>
2016-06-06 17:44:17 -04:00
Robert Haas c6dbf1fe79 Stop the executor if no more tuples can be sent from worker to leader.
If a Gather node has read as many tuples as it needs (for example, due
to Limit) it may detach the queue connecting it to the worker before
reading all of the worker's tuples.  Rather than let the worker
continue to generate and send all of the results, have it stop after
sending the next tuple.

More could be done here to stop the worker even quicker, but this is
about as well as we can hope to do for 9.6.

This is in response to a problem report from Andreas Seltenreich.
Commit 44339b892a should be actually be
sufficient to fix that example even without this change, but it seems
better to do this, too, since we might otherwise waste quite a large
amount of effort in one or more workers.

Discussion: CAA4eK1KOKGqmz9bGu+Z42qhRwMbm4R5rfnqsLCNqFs9j14jzEA@mail.gmail.com

Amit Kapila
2016-06-06 14:52:58 -04:00
Robert Haas 6436a853f1 Fix comment to be more accurate.
Now that we skip vacuuming all-frozen pages, this comment needs
updating.

Masahiko Sawada
2016-06-03 11:56:57 -04:00
Greg Stark e1623c3959 Fix various common mispellings.
Mostly these are just comments but there are a few in documentation
and a handful in code and tests. Hopefully this doesn't cause too much
unnecessary pain for backpatching. I relented from some of the most
common like "thru" for that reason. The rest don't seem numerous
enough to cause problems.

Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings
2016-06-03 16:08:45 +01:00
Robert Haas fdfaccfa79 Cosmetic improvements to freeze map code.
Per post-commit review comments from Andres Freund, improve variable
names, comments, and in one place, slightly improve the code structure.

Masahiko Sawada
2016-06-03 08:43:41 -04:00
Tom Lane 83dbde94f7 Fix DROP ACCESS METHOD IF EXISTS.
The IF EXISTS option was documented, and implemented in the grammar, but
it didn't actually work for lack of support in does_not_exist_skipping().
Per bug #14160.

Report and patch by Kouhei Sutou

Report: <20160527070433.19424.81712@wrigleys.postgresql.org>
2016-05-27 11:03:18 -04:00
Tom Lane 2d2e40e3be Fetch XIDs atomically during vac_truncate_clog().
Because vac_update_datfrozenxid() updates datfrozenxid and datminmxid
in-place, it's unsafe to assume that successive reads of those values will
give consistent results.  Fetch each one just once to ensure sane behavior
in the minimum calculation.  Noted while reviewing Alexander Korotkov's
patch in the same area.

Discussion: <8564.1464116473@sss.pgh.pa.us>
2016-05-24 15:47:51 -04:00
Tom Lane 996d273978 Avoid consuming an XID during vac_truncate_clog().
vac_truncate_clog() uses its own transaction ID as the comparison point in
a sanity check that no database's datfrozenxid has already wrapped around
"into the future".  That was probably fine when written, but in a lazy
vacuum we won't have assigned an XID, so calling GetCurrentTransactionId()
causes an XID to be assigned when otherwise one would not be.  Most of the
time that's not a big problem ... but if we are hard up against the
wraparound limit, consuming XIDs during antiwraparound vacuums is a very
bad thing.

Instead, use ReadNewTransactionId(), which not only avoids this problem
but is in itself a better comparison point to test whether wraparound
has already occurred.

Report and patch by Alexander Korotkov.  Back-patch to all versions.

Report: <CAPpHfdspOkmiQsxh-UZw2chM6dRMwXAJGEmmbmqYR=yvM7-s6A@mail.gmail.com>
2016-05-24 15:20:36 -04:00
Stephen Frost a89505fd21 Remove various special checks around default roles
Default roles really should be like regular roles, for the most part.
This removes a number of checks that were trying to make default roles
extra special by not allowing them to be used as regular roles.

We still prevent users from creating roles in the "pg_" namespace or
from altering roles which exist in that namespace via ALTER ROLE, as
we can't preserve such changes, but otherwise the roles are very much
like regular roles.

Based on discussion with Robert and Tom.
2016-05-06 14:06:50 -04:00
Kevin Grittner a343e223a5 Revert no-op changes to BufferGetPage()
The reverted changes were intended to force a choice of whether any
newly-added BufferGetPage() calls needed to be accompanied by a
test of the snapshot age, to support the "snapshot too old"
feature.  Such an accompanying test is needed in about 7% of the
cases, where the page is being used as part of a scan rather than
positioning for other purposes (such as DML or vacuuming).  The
additional effort required for back-patching, and the doubt whether
the intended benefit would really be there, have indicated it is
best just to rely on developers to do the right thing based on
comments and existing usage, as we do with many other conventions.

This change should have little or no effect on generated executable
code.

Motivated by the back-patching pain of Tom Lane and Robert Haas
2016-04-20 08:31:19 -05:00
Tom Lane c34df8a003 Disallow creation of indexes on system columns (except for OID).
Although OID acts pretty much like user data, the other system columns do
not, so an index on one would likely misbehave.  And it's pretty hard to
see a use-case for one, anyway.  Let's just forbid the case rather than
worry about whether it should be supported.

David Rowley
2016-04-16 12:11:41 -04:00
Tom Lane 8f1911d5e6 Fix possible crash in ALTER TABLE ... REPLICA IDENTITY USING INDEX.
Careless coding added by commit 07cacba983 could result in a crash
or a bizarre error message if someone tried to select an index on the
OID column as the replica identity index for a table.  Back-patch to 9.4
where the feature was introduced.

Discussion: CAKJS1f8TQYgTRDyF1_u9PVCKWRWz+DkieH=U7954HeHVPJKaKg@mail.gmail.com

David Rowley
2016-04-15 12:11:40 -04:00
Robert Haas 5702277ca9 Tweak EXPLAIN for parallel query to show workers launched.
The previous display was sort of confusing, because it didn't
distinguish between the number of workers that we planned to launch
and the number that actually got launched.  This has already confused
several people, so display both numbers and label them clearly.

Julien Rouhaud, reviewed by me.
2016-04-15 11:52:18 -04:00
Tom Lane 92a30a7eb0 Fix broken dependency-mongering for index operator classes/families.
For a long time, opclasscmds.c explained that "we do not create a
dependency link to the AM [for an opclass or opfamily], because we don't
currently support DROP ACCESS METHOD".  Commit 473b932870 invented
DROP ACCESS METHOD, but it batted only 1 for 2 on adding the dependency
links, and 0 for 2 on updating the comments about the topic.

In passing, undo the same commit's entirely inappropriate decision to
blow away an existing index as a side-effect of create_am.sql.
2016-04-13 23:33:31 -04:00
Stephen Frost bfed4ab824 Disallow SET SESSION AUTHORIZATION pg_*
As part of reserving the pg_* namespace for default roles and in line
with SET ROLE and other previous efforts, disallow settings the role
to a default/reserved role using SET SESSION AUTHORIZATION.

These checks and restrictions on what is allowed regarding default /
reserved roles are under debate, but it seems prudent to ensure that
the existing checks at least cover the intended cases while the
debate rages on.  On me to clean it up if the consensus decision is
to remove these checks.
2016-04-13 21:31:24 -04:00
Alvaro Herrera bd905a0d04 Fix possible NULL dereference in ExecAlterObjectDependsStmt
I used the wrong variable here.  Doesn't make a difference today because
the only plausible caller passes a non-NULL variable, but someday it
will be wrong, and even today's correctness is subtle: the caller that
does pass a NULL is never invoked because of object type constraints.
Surely not a condition to rely on.

Noted by Coverity
2016-04-10 11:03:35 -03:00
Stephen Frost 293007898d Reserve the "pg_" namespace for roles
This will prevent users from creating roles which begin with "pg_" and
will check for those roles before allowing an upgrade using pg_upgrade.

This will allow for default roles to be provided at initdb time.

Reviews by José Luis Tallón and Robert Haas
2016-04-08 16:56:27 -04:00
Kevin Grittner 848ef42bb8 Add the "snapshot too old" feature
This feature is controlled by a new old_snapshot_threshold GUC.  A
value of -1 disables the feature, and that is the default.  The
value of 0 is just intended for testing.  Above that it is the
number of minutes a snapshot can reach before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect.  The xmin associated with a transaction ID does
still protect dead tuples.  A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.

This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.
2016-04-08 14:36:30 -05:00
Kevin Grittner 8b65cf4c5e Modify BufferGetPage() to prepare for "snapshot too old" feature
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in.  It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point.  This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third.  The follow-on patch will change the places where the test
needs to be made.
2016-04-08 14:30:10 -05:00
Teodor Sigaev 8b99edefca Revert CREATE INDEX ... INCLUDING ...
It's not ready yet, revert two commits
690c543550 - unstable test output
386e3d7609 - patch itself
2016-04-08 21:52:13 +03:00
Teodor Sigaev 386e3d7609 CREATE INDEX ... INCLUDING (column[, ...])
Now indexes (but only B-tree for now) can contain "extra" column(s) which
doesn't participate in index structure, they are just stored in leaf
tuples. It allows to use index only scan by using single index instead
of two or more indexes.

Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
2016-04-08 19:45:59 +03:00
Peter Eisentraut 339025c68f Replace printf format %i by %d
see also ce8d7bb644
2016-04-08 12:42:58 -04:00
Tom Lane 93c301fc4f Fix multiple bugs in tablespace symlink removal.
Don't try to examine S_ISLNK(st.st_mode) after a failed lstat().
It's undefined.

Also, if the lstat() reported ENOENT, we do not wish that to be a hard
error, but the code might nonetheless treat it as one (giving an entirely
misleading error message, too) depending on luck-of-the-draw as to what
S_ISLNK() returned.

Don't throw error for ENOENT from rmdir(), either.  (We're not really
expecting ENOENT because we just stat'd the file successfully; but
if we're going to allow ENOENT in the symlink code path, surely the
directory code path should too.)

Generate an appropriate errcode for its-the-wrong-type-of-file complaints.
(ERRCODE_SYSTEM_ERROR doesn't seem appropriate, and failing to write
errcode() around it certainly doesn't work, and not writing an errcode
at all is not per project policy.)

Valgrind noticed the undefined S_ISLNK result; the other problems emerged
while reading the code in the area.

All of this appears to have been introduced in 8f15f74a44.
Back-patch to 9.5 where that commit appeared.
2016-04-08 12:31:53 -04:00
Tom Lane de94e2af18 Run pgindent on a batch of (mostly-planner-related) source files.
Getting annoyed at the amount of unrelated chatter I get from pgindent'ing
Rowley's unique-joins patch.  Re-indent all the files it touches.
2016-04-06 11:34:02 -04:00
Alvaro Herrera f2fcad27d5 Support ALTER THING .. DEPENDS ON EXTENSION
This introduces a new dependency type which marks an object as depending
on an extension, such that if the extension is dropped, the object
automatically goes away; and also, if the database is dumped, the object
is included in the dump output.  Currently the grammar supports this for
indexes, triggers, materialized views and functions only, although the
utility code is generic so adding support for more object types is a
matter of touching the parser rules only.

Author: Abhijit Menon-Sen
Reviewed-by: Alexander Korotkov, Álvaro Herrera
Discussion: http://www.postgresql.org/message-id/20160115062649.GA5068@toroid.org
2016-04-05 18:38:54 -03:00
Robert Haas 41ea0c2376 Fix parallel-safety code for parallel aggregation.
has_parallel_hazard() was ignoring the proparallel markings for
aggregates, which is no good.  Fix that.  There was no way to mark
an aggregate as actually being parallel-safe, either, so add a
PARALLEL option to CREATE AGGREGATE.

Patch by me, reviewed by David Rowley.
2016-04-05 16:06:15 -04:00
Tom Lane 3c69b33f45 Add a few comments about ANALYZE's strategy for collecting MCVs.
Alex Shulgin complained that the underlying strategy wasn't all that
apparent, particularly not the fact that we intentionally have two
code paths depending on whether we think the column has a limited set
of possible values or not.  Try to make it clearer.
2016-04-04 17:06:33 -04:00
Tom Lane 391159e03a Partially revert commit 3d3bf62f30.
On reflection, the pre-existing logic in ANALYZE is specifically meant to
compare the frequency of a candidate MCV against the estimated frequency of
a random distinct value across the whole table.  The change to compare it
against the average frequency of values actually seen in the sample doesn't
seem very principled, and if anything it would make us less likely not more
likely to consider a value an MCV.  So revert that, but keep the aspect of
considering only nonnull values, which definitely is correct.

In passing, rename the local variables in these stanzas to
"ndistinct_table", to avoid confusion with the "ndistinct" that appears at
an outer scope in compute_scalar_stats.
2016-04-04 16:48:13 -04:00
Tom Lane 3d3bf62f30 Omit null rows when setting the threshold for what's a most-common value.
As with the previous patch, large numbers of null rows could skew this
calculation unfavorably, causing us to discard values that have a
legitimate claim to be MCVs, since our definition of MCV is that it's
most common among the non-null population of the column.  Hence, make
the numerator of avgcount be the number of non-null sample values not
the number of sample rows; likewise for maxmincount in the
compute_scalar_stats variant.

Also, make the denominator be the number of distinct values actually
observed in the sample, rather than reversing it back out of the computed
stadistinct.  This avoids depending on the accuracy of the Haas-Stokes
approximation, and really it's what we want anyway; the threshold should
depend only on what we see in the sample, not on what we extrapolate
about the contents of the whole column.

Alex Shulgin, reviewed by Tomas Vondra and myself
2016-04-01 17:03:27 -04:00
Tom Lane be4b4dc759 Omit null rows when applying the Haas-Stokes estimator for ndistinct.
Previously, we included null rows in the values of n and N that went
into the formula, which amounts to considering null as a value in its
own right; but the d and f1 values do not include nulls.  This is
inconsistent, and it contributes to significant underestimation of
ndistinct when the column is mostly nulls.  In any case stadistinct
is defined as the number of distinct non-null values, so we should
exclude nulls when doing this computation.

This is an aboriginal bug in our application of the Haas-Stokes formula,
but we'll refrain from back-patching for fear of destabilizing plan
choices in released branches.

While at it, make the code a bit more readable by omitting unnecessary
casts and intermediate variables.

Observation and original patch by Tomas Vondra, adjusted to fix both
uses of the formula by Alex Shulgin, cosmetic improvements by me
2016-04-01 15:48:24 -04:00
Alvaro Herrera f402b99501 Type names should not be quoted
Our actual convention, contrary to what I said in 59a2111b23, is not to
quote type names, as evidenced by unquoted use of format_type_be()
result value in error messages.  Remove quotes from recently tweaked
messages accordingly.

Per note from Tom Lane
2016-04-01 13:35:48 -03:00
Robert Haas 5fe5a2cee9 Allow aggregate transition states to be serialized and deserialized.
This is necessary infrastructure for supporting parallel aggregation
for aggregates whose transition type is "internal".  Such values
can't be passed between cooperating processes, because they are
just pointers.

David Rowley, reviewed by Tomas Vondra and by me.
2016-03-29 15:04:05 -04:00
Robert Haas f9143d102f Rework custom scans to work more like the new extensible node stuff.
Per discussion, the new extensible node framework is thought to be
better designed than the custom path/scan/scanstate stuff we added
in PostgreSQL 9.5.  Rework the latter to be more like the former.

This is not backward-compatible, but we generally don't promise that
for C APIs, and there probably aren't many people using this yet
anyway.

KaiGai Kohei, reviewed by Petr Jelinek and me.  Some further
cosmetic changes by me.
2016-03-29 11:28:04 -04:00
Alvaro Herrera 59a2111b23 Improve internationalization of messages involving type names
Change the slightly different variations of the message
  function FOO must return type BAR
to a single wording, removing the variability in type name so that they
all create a single translation entry; since the type name is not to be
translated, there's no point in it being part of the message anyway.

Also, change them all to use the same quoting convention, namely that
the function name is not to be quoted but the type name is.  (I'm not
quite sure why this is so, but it's the clear majority.)

Some similar messages such as "encoding conversion function FOO must ..."
are also changed.
2016-03-28 14:24:37 -03:00
Tom Lane c94959d411 Fix DROP OPERATOR to reset oprcom/oprnegate links to the dropped operator.
This avoids leaving dangling links in pg_operator; which while fairly
harmless are also unsightly.

While we're at it, simplify OperatorUpd, which went through
heap_modify_tuple for no very good reason considering it had already made
a tuple copy it could just scribble on.

Roma Sokolov, reviewed by Tomas Vondra, additional hacking by Robert Haas
and myself.
2016-03-25 12:33:16 -04:00
Tom Lane a376960c8f Suppress compiler warning for get_am_type_string().
Compilers that don't know that elog(ERROR) doesn't return complained
that this function might fail to return a value.  Per buildfarm.

While at it, const-ify the function's declaration, since the intent
is evidently to always return a constant string.
2016-03-24 17:22:24 -04:00
Alvaro Herrera 473b932870 Support CREATE ACCESS METHOD
This enables external code to create access methods.  This is useful so
that extensions can add their own access methods which can be formally
tracked for dependencies, so that DROP operates correctly.  Also, having
explicit support makes pg_dump work correctly.

Currently only index AMs are supported, but we expect different types to
be added in the future.

Authors: Alexander Korotkov, Petr Jelínek
Reviewed-By: Teodor Sigaev, Petr Jelínek, Jim Nasby
Commitfest-URL: https://commitfest.postgresql.org/9/353/
Discussion: https://www.postgresql.org/message-id/CAPpHfdsXwZmojm6Dx+TJnpYk27kT4o7Ri6X_4OSWcByu1Rm+VA@mail.gmail.com
2016-03-23 23:01:35 -03:00
Simon Riggs 8320c625d9 Change comment to describe correct lock level used 2016-03-23 11:32:34 +00:00
Robert Haas 0bf3ae88af Directly modify foreign tables.
postgres_fdw can now sent an UPDATE or DELETE statement directly to
the foreign server in simple cases, rather than sending a SELECT FOR
UPDATE statement and then updating or deleting rows one-by-one.

Etsuro Fujita, reviewed by Rushabh Lathia, Shigeru Hanada, Kyotaro
Horiguchi, Albe Laurenz, Thom Brown, and me.
2016-03-18 13:55:52 -04:00
Tom Lane bd0ab28912 Remove useless double calls of make_parsestate().
Aleksander Alekseev
2016-03-17 16:46:35 -04:00
Robert Haas bc55cc0b6a Fix problems in commit c16dc1aca5.
Vinayak Pokale provided a patch for a copy-and-paste error in a
comment.  I noticed that I'd use the word "automatically" nearby where
I meant to talk about things being "atomic".  Rahila Syed spotted a
misplaced counter update.  Fix all that stuff.
2016-03-16 13:54:04 -04:00
Robert Haas c16dc1aca5 Add simple VACUUM progress reporting.
There's a lot more that could be done here yet - in particular, this
reports only very coarse-grained information about the index vacuuming
phase - but even as it stands, the new pg_stat_progress_vacuum can
tell you quite a bit about what a long-running vacuum is actually
doing.

Amit Langote and Robert Haas, based on earlier work by Vinayak Pokale
and Rahila Syed.
2016-03-15 13:32:56 -04:00
Robert Haas 270b7daf5c Fix EXPLAIN ANALYZE SELECT INTO not to choose a parallel plan.
We don't support any parallel write operations at present, so choosing
a parallel plan causes us to error out.  Also, add a new regression
test that uses EXPLAIN ANALYZE SELECT INTO; if we'd had this previously,
force_parallel_mode testing would have caught this issue.

Mithun Cy and Robert Haas
2016-03-14 19:48:46 -04:00
Tom Lane 23a27b039d Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout.  Some of these values were declared uint32 before, and
others "long".

I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.

The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command.  Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.

Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long".  It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.

Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-12 16:05:29 -05:00
Robert Haas fd31cd2651 Don't vacuum all-frozen pages.
Commit a892234f83 gave us enough
infrastructure to avoid vacuuming pages where every tuple on the
page is already frozen.  So, replace the notion of a scan_all or
whole-table vacuum with the less onerous notion of an "aggressive"
vacuum, which will pages that are all-visible, but still skip those
that are all-frozen.

This should greatly reduce the cost of anti-wraparound vacuuming
on large clusters where the majority of data is never touched
between one cycle and the next, because we'll no longer have to
read all of those pages only to find out that we don't need to
do anything with them.

Patch by me, reviewed by Masahiko Sawada.
2016-03-10 16:14:42 -05:00
Tom Lane 364a9f47ab Refactor pull_var_clause's API to make it less tedious to extend.
In commit 1d97c19a0f and later c1d9579dd8, we extended
pull_var_clause's API by adding enum-type arguments.  That's sort of a pain
to maintain, though, because it means every time we add a new behavior we
must touch every last one of the call sites, even if there's a reasonable
default behavior that most of them could use.  Let's switch over to using a
bitmask of flags, instead; that seems more maintainable and might save a
nanosecond or two as well.  This commit changes no behavior in itself,
though I'm going to follow it up with one that does add a new behavior.

In passing, remove flatten_tlist(), which has not been used since 9.1
and would otherwise need the same API changes.

Removing these enums means that optimizer/tlist.h no longer needs to
depend on optimizer/var.h.  Changing that caused a number of C files to
need addition of #include "optimizer/var.h" (probably we can thank old
runs of pgrminclude for that); but on balance it seems like a good change
anyway.
2016-03-10 15:53:07 -05:00
Robert Haas be060cbcd4 Re-pgindent vacuumlazy.c. 2016-03-09 13:51:11 -05:00
Robert Haas b6fb6471f6 Add a generic command progress reporting facility.
Using this facility, any utility command can report the target relation
upon which it is operating, if there is one, and up to 10 64-bit
counters; the intent of this is that users should be able to figure out
what a utility command is doing without having to resort to ugly hacks
like attaching strace to a backend.

As a demonstration, this adds very crude reporting to lazy vacuum; we
just report the target relation and nothing else.  A forthcoming patch
will make VACUUM report a bunch of additional data that will make this
much more interesting.  But this gets the basic framework in place.

Vinayak Pokale, Rahila Syed, Amit Langote, Robert Haas, reviewed by
Kyotaro Horiguchi, Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao,
and Masanori Oyama.
2016-03-09 12:08:58 -05:00
Robert Haas dcfecaae9e Fix parallel query on standby servers.
Without this fix, it inevitably bombs out with "ERROR:  failed to
initialize transaction_read_only to 0".  Repair.

Ashutosh Sharma; comments adjusted by me.
2016-03-08 10:27:03 -05:00
Robert Haas 77a1d1e798 Department of second thoughts: remove PD_ALL_FROZEN.
Commit a892234f83 added a second bit per
page to the visibility map, which still seems like a good idea, but it
also added a second page-level bit alongside PD_ALL_VISIBLE to track
whether the visibility map bit was set.  That no longer seems like a
clever plan, because we don't really need that bit for anything.  We
always clear both bits when the page is modified anyway.

Patch by me, reviewed by Kyotaro Horiguchi and Masahiko Sawada.
2016-03-08 08:46:48 -05:00
Robert Haas a892234f83 Change the format of the VM fork to add a second bit per page.
The new bit indicates whether every tuple on the page is already frozen.
It is cleared only when the all-visible bit is cleared, and it can be
set only when we vacuum a page and find that every tuple on that page is
both visible to every transaction and in no need of any future
vacuuming.

A future commit will use this new bit to optimize away full-table scans
that would otherwise be triggered by XID wraparound considerations.  A
page which is merely all-visible must still be scanned in that case, but
a page which is all-frozen need not be.  This commit does not attempt
that optimization, although that optimization is the goal here.  It
seems better to get the basic infrastructure in place first.

Per discussion, it's very desirable for pg_upgrade to automatically
migrate existing VM forks from the old format to the new format.  That,
too, will be handled in a follow-on patch.

Masahiko Sawada, reviewed by Kyotaro Horiguchi, Fujii Masao, Amit
Kapila, Simon Riggs, Andres Freund, and others, and substantially
revised by me.
2016-03-01 21:49:41 -05:00
Robert Haas 7bea19d0a9 On second thought, disable parallelism for prepared statements.
CREATE TABLE .. AS EXECUTE can turn an apparently read-only query into
a write operation, which parallel query can't handle.  It's a bit of a
shame that requires us to avoid parallel query for queries prepared via
PREPARE in all cases, but for right now it does.
2016-02-26 16:33:37 +05:30
Robert Haas 57a6a72b6b Enable parallelism for prepared statements and extended query protocol.
Parallel query can't handle running a query only partially rather than
to completion.  However, there seems to be no way to run a statement
prepared via SQL PREPARE other than to completion, so we can enable it
there without a problem.

The situation is more complicated for the extend query protocol.
libpq seems to provide no way to send an Execute message with a
non-zero rowcount, but some other client might.  If that happens, and
a parallel plan was chosen, we'll execute the parallel plan without
using any workers, which may be somewhat inefficient but should still
work.  Hopefully this won't be a problem; users can always set
max_parallel_degree=0 to avoid choosing parallel plans in the first
place.

Amit Kapila, reviewed by me.
2016-02-25 13:02:18 +05:30
Fujii Masao 31b6606c48 Make concurrent refresh check early that there is a unique index on matview.
In REFRESH MATERIALIZED VIEW command, CONCURRENTLY option is only
allowed if there is at least one unique index with no WHERE clause on
one or more columns of the matview. Previously, concurrent refresh
checked the existence of a unique index on the matview after filling
the data to new snapshot, i.e., after calling refresh_matview_datafill().
So, when there was no unique index, we could need to wait a long time
before we detected that and got the error. It was a waste of time.

To eliminate such wasting time, this commit changes concurrent refresh
so that it checks the existence of a unique index at the beginning of
the refresh operation, i.e., before starting any time-consuming jobs.
If CONCURRENTLY option is not allowed due to lack of a unique index,
concurrent refresh can immediately detect it and emit an error.

Author: Masahiko Sawada
Reviewed-by: Michael Paquier, Fujii Masao
2016-02-16 02:15:44 +09:00
Tom Lane 72eee410d4 Move pg_constraint.h function declarations to new file pg_constraint_fn.h.
A pending patch requires exporting a function returning Bitmapset from
catalog/pg_constraint.c.  As things stand, that would mean including
nodes/bitmapset.h in pg_constraint.h, which might be hazardous for the
client-side includability of that header.  It's not entirely clear whether
any client-side code needs to include pg_constraint.h, but it seems prudent
to assume that there is some such code somewhere.  Therefore, split off the
function definitions into a new file pg_constraint_fn.h, similarly to what
we've done for some other catalog header files.
2016-02-11 15:51:28 -05:00
Robert Haas 7c944bd903 Introduce a new GUC force_parallel_mode for testing purposes.
When force_parallel_mode = true, we enable the parallel mode restrictions
for all queries for which this is believed to be safe.  For the subset of
those queries believed to be safe to run entirely within a worker, we spin
up a worker and run the query there instead of running it in the
original process.  When force_parallel_mode = regress, make additional
changes to allow the regression tests to run cleanly even though parallel
workers have been injected under the hood.

Taken together, this facilitates both better user testing and better
regression testing of the parallelism code.

Robert Haas, with help from Amit Kapila and Rushabh Lathia.
2016-02-07 11:41:33 -05:00
Robert Haas 7191ce8bea Make all built-in lwlock tranche IDs fixed.
This makes the values more stable, which seems like a good thing for
anybody who needs to look at at them.

Alexander Korotkov and Amit Kapila
2016-02-02 06:45:55 -05:00
Robert Haas a7de3dc5c3 Support multi-stage aggregation.
Aggregate nodes now have two new modes: a "partial" mode where they
output the unfinalized transition state, and a "finalize" mode where
they accept unfinalized transition states rather than individual
values as input.

These new modes are not used anywhere yet, but they will be necessary
for parallel aggregation.  The infrastructure also figures to be
useful for cases where we want to aggregate local data and remote
data via the FDW interface, and want to bring back partial aggregates
from the remote side that can then be combined with locally generated
partial aggregates to produce the final value.  It may also be useful
even when neither FDWs nor parallelism are in play, as explained in
the comments in nodeAgg.c.

David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki
Linnakangas, Haribabu Kommi, and me.
2016-01-20 13:46:50 -05:00
Tom Lane 65c5fcd353 Restructure index access method API to hide most of it at the C level.
This patch reduces pg_am to just two columns, a name and a handler
function.  All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function.  This is similar to
the designs we've adopted for FDWs and tablesample methods.  There
are multiple advantages.  For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.

A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL.  We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.

Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
2016-01-17 19:36:59 -05:00
Tom Lane f47b602df8 Fix bogus lock release in RemovePolicyById and RemoveRoleFromObjectPolicy.
Can't release the AccessExclusiveLock on the target table until commit.
Otherwise there is a race condition whereby other backends might service
our cache invalidation signals before they can actually see the updated
catalog rows.

Just to add insult to injury, RemovePolicyById was closing the rel (with
incorrect lock drop) and then passing the now-dangling rel pointer to
CacheInvalidateRelcache.  Probably the reason this doesn't fall over on
CLOBBER_CACHE buildfarm members is that some outer level of the DROP logic
is still holding the rel open ... but it'd have bit us on the arse
eventually, no doubt.
2016-01-03 20:53:35 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Tom Lane 0dab5ef39b Fix ALTER OPERATOR to update dependencies properly.
Fix an oversight in commit 321eed5f0f7563a0: replacing an operator's
selectivity functions needs to result in a corresponding update in
pg_depend.  We have a function that can handle that, but it was not
called by AlterOperator().

To fix this without enlarging pg_operator.h's #include list beyond
what clients can safely include, split off the function definitions
into a new file pg_operator_fn.h, similarly to what we've done for
some other catalog header files.  It's not entirely clear whether
any client-side code needs to include pg_operator.h, but it seems
prudent to assume that there is some such code somewhere.
2015-12-31 17:37:31 -05:00
Tom Lane e5d06f2b12 Dept of second thoughts: the !scan_all exit mustn't increase scanned_pages.
In the extreme edge case where contended pages are the only ones that
escape being scanned, the previous commit would have allowed us to think
that relfrozenxid could be advanced, which is exactly wrong.
2015-12-30 17:32:23 -05:00
Tom Lane e842908233 Avoid useless truncation attempts during VACUUM.
VACUUM can skip heap pages altogether when there's a run of consecutive
pages that are all-visible according to the visibility map.  This causes it
to not update its nonempty_pages count, just as if those pages were empty,
which means that at the end we will think they are candidates for deletion.
Thus, we may take the table's AccessExclusive lock only to find that no
pages are really truncatable.  This usually causes no real problems on a
master server, thanks to the lock being acquired only conditionally; but on
hot-standby servers, the same lock must be acquired unconditionally which
can result in unnecessary query cancellations.

To improve matters, force examination of the table's last page whenever
we reach there with a nonempty_pages count that would allow a truncation
attempt.  If it's not empty, we'll advance nonempty_pages and thereby
prevent the truncation attempt.

If we are unable to acquire cleanup lock on that page, there's no need to
force it, unless we're doing an anti-wraparound vacuum.  We can just check
for tuples with a shared buffer lock and then give up.  (When we are doing
an anti-wraparound vacuum, and decide it's okay to skip the page because it
contains no freezable tuples, this patch still improves matters because
nonempty_pages is properly updated, which it was not before.)

Since only the last page is special-cased in this way, we might attempt a
truncation that will release many fewer pages than the normal heuristic
would suggest; at worst, only one page would be truncated.  But that seems
all right, because the situation won't repeat during the next vacuum.
The real problem with the old logic is that the useless truncation attempt
happens every time we vacuum, so long as the state of the last few dozen
pages doesn't change.

This is a longstanding deficiency, but since the consequences aren't very
severe in most scenarios, I'm not going to risk a back-patch.

Jeff Janes and Tom Lane
2015-12-30 17:13:15 -05:00
Joe Conway 241448b23a Rename (new|old)estCommitTs to (new|old)estCommitTsXid
The variables newestCommitTs and oldestCommitTs sound as if they are
timestamps, but in fact they are the transaction Ids that correspond
to the newest and oldest timestamps rather than the actual timestamps.
Rename these variables to reflect that they are actually xids: to wit
newestCommitTsXid and oldestCommitTsXid respectively. Also modify
related code in a similar fashion, particularly the user facing output
emitted by pg_controldata and pg_resetxlog.

Complaint and patch by me, review by Tom Lane and Alvaro Herrera.
Backpatch to 9.5 where these variables were first introduced.
2015-12-28 12:34:11 -08:00
Tom Lane fec1ad94df Include typmod when complaining about inherited column type mismatches.
MergeAttributes() rejects cases where columns to be merged have the same
type but different typmod, which is correct; but the error message it
printed didn't show either typmod, which is unhelpful.  Changing this
requires using format_type_with_typemod() in place of TypeNameToString(),
which will have some minor side effects on the way some type names are
printed, but on balance this is an improvement: the old code sometimes
printed one type according to one set of rules and the other type according
to the other set, which could be confusing in its own way.

Oddly, there were no regression test cases covering any of this behavior,
so add some.

Complaint and fix by Amit Langote
2015-12-26 13:41:29 -05:00
Alvaro Herrera 756e7b4c9d Rework internals of changing a type's ownership
This is necessary so that REASSIGN OWNED does the right thing with
composite types, to wit, that it also alters ownership of the type's
pg_class entry -- previously, the pg_class entry remained owned by the
original user, which caused later other failures such as the new owner's
inability to use ALTER TYPE to rename an attribute of the affected
composite.  Also, if the original owner is later dropped, the pg_class
entry becomes owned by a non-existant user which is bogus.

To fix, create a new routine AlterTypeOwner_oid which knows whether to
pass the request to ATExecChangeOwner or deal with it directly, and use
that in shdepReassignOwner rather than calling AlterTypeOwnerInternal
directly.  AlterTypeOwnerInternal is now simpler in that it only
modifies the pg_type entry and recurses to handle a possible array type;
higher-level tasks are handled by either AlterTypeOwner directly or
AlterTypeOwner_oid.

I took the opportunity to add a few more objects to the test rig for
REASSIGN OWNED, so that more cases are exercised.  Additional ones could
be added for superuser-only-ownable objects (such as FDWs and event
triggers) but I didn't want to push my luck by adding a new superuser to
the tests on a backpatchable bug fix.

Per bug #13666 reported by Chris Pacejo.

Backpatch to 9.5.

(I would back-patch this all the way back, except that it doesn't apply
cleanly in 9.4 and earlier because 59367fdf9 wasn't backpatched.  If we
decide that we need this in earlier branches too, we should backpatch
both.)
2015-12-17 14:25:41 -03:00
Andres Freund f54d0629ec Fix ALTER TABLE ... SET TABLESPACE for unlogged relations.
Changing the tablespace of an unlogged relation did not WAL log the
creation and content of the init fork. Thus, after a standby is
promoted, unlogged relation cannot be accessed anymore, with errors
like:
ERROR:  58P01: could not open file "pg_tblspc/...": No such file or directory
Additionally the init fork was not synced to disk, independent of the
configured wal_level, a relatively small durability risk.

Investigation of that problem also brought to light that, even for
permanent relations, the creation of !main forks was not WAL logged,
i.e. no XLOG_SMGR_CREATE record were emitted. That mostly turns out not
to be a problem, because these files were created when the actual
relation data is copied; nonexistent files are not treated as an error
condition during replay. But that doesn't work for empty files, and
generally feels a bit haphazard. Luckily, outside init and main forks,
empty forks don't occur often or are not a problem.

Add the required WAL logging and syncing to disk.

Reported-By: Michael Paquier
Author: Michael Paquier and Andres Freund
Discussion: 20151210163230.GA11331@alap3.anarazel.de
Backpatch: 9.1, where unlogged relations were introduced
2015-12-12 14:17:39 +01:00
Stephen Frost 833728d4c8 Handle policies during DROP OWNED BY
DROP OWNED BY handled GRANT-based ACLs but was not removing roles from
policies.  Fix that by having DROP OWNED BY remove the role specified
from the list of roles the policy (or policies) apply to, or the entire
policy (or policies) if it only applied to the role specified.

As with ACLs, the DROP OWNED BY caller must have permission to modify
the policy or a WARNING is thrown and no change is made to the policy.
2015-12-11 16:12:25 -05:00
Stephen Frost ed8bec915e Handle dependencies properly in ALTER POLICY
ALTER POLICY hadn't fully considered partial policy alternation
(eg: change just the roles on the policy, or just change one of
the expressions) when rebuilding the dependencies.  Instead, it
would happily remove all dependencies which existed for the
policy and then only recreate the dependencies for the objects
referred to in the specific ALTER POLICY command.

Correct that by extracting and building the dependencies for all
objects referenced by the policy, regardless of if they were
provided as part of the ALTER POLICY command or were already in
place as part of the pre-existing policy.
2015-12-11 15:43:03 -05:00
Peter Eisentraut a351705d8a Improve some messages 2015-12-10 22:05:27 -05:00
Robert Haas b287df70e4 Allow EXPLAIN (ANALYZE, VERBOSE) to display per-worker statistics.
The original parallel sequential scan commit included only very limited
changes to the EXPLAIN output.  Aggregated totals from all workers were
displayed, but there was no way to see what each individual worker did
or to distinguish the effort made by the workers from the effort made by
the leader.

Per a gripe by Thom Brown (and maybe others).  Patch by me, reviewed
by Amit Kapila.
2015-12-09 13:21:19 -05:00
Teodor Sigaev 92e38182d7 COPY (INSERT/UPDATE/DELETE .. RETURNING ..)
Attached is a patch for being able to do COPY (query) without a CTE.

Author: Marko Tiikkaja
Review: Michael Paquier
2015-11-27 19:11:22 +03:00
Tom Lane 074c5cfbfb Fix handling of inherited check constraints in ALTER COLUMN TYPE (again).
The previous way of reconstructing check constraints was to do a separate
"ALTER TABLE ONLY tab ADD CONSTRAINT" for each table in an inheritance
hierarchy.  However, that way has no hope of reconstructing the check
constraints' own inheritance properties correctly, as pointed out in
bug #13779 from Jan Dirk Zijlstra.  What we should do instead is to do
a regular "ALTER TABLE", allowing recursion, at the topmost table that
has a particular constraint, and then suppress the work queue entries
for inherited instances of the constraint.

Annoyingly, we'd tried to fix this behavior before, in commit 5ed6546cf,
but we failed to notice that it wasn't reconstructing the pg_constraint
field values correctly.

As long as I'm touching pg_get_constraintdef_worker anyway, tweak it to
always schema-qualify the target table name; this seems like useful backup
to the protections installed by commit 5f173040.

In HEAD/9.5, get rid of get_constraint_relation_oids, which is now unused.
(I could alternatively have modified it to also return conislocal, but that
seemed like a pretty single-purpose API, so let's not pretend it has some
other use.)  It's unused in the back branches as well, but I left it in
place just in case some third-party code has decided to use it.

In HEAD/9.5, also rename pg_get_constraintdef_string to
pg_get_constraintdef_command, as the previous name did nothing to explain
what that entry point did differently from others (and its comment was
equally useless).  Again, that change doesn't seem like material for
back-patching.

I did a bit of re-pgindenting in tablecmds.c in HEAD/9.5, as well.

Otherwise, back-patch to all supported branches.
2015-11-20 14:55:47 -05:00
Robert Haas bc4996e61b Make ALTER .. SET SCHEMA do nothing, instead of throwing an ERROR.
This was already true for CREATE EXTENSION, but historically has not
been true for other object types.  Therefore, this is a backward
incompatibility.  Per discussion on pgsql-hackers, everyone seems to
agree that the new behavior is better.

Marti Raudsepp, reviewed by Haribabu Kommi and myself
2015-11-19 10:49:25 -05:00
Peter Eisentraut c5ec406412 Message style fix
from Euler Taveira
2015-11-17 06:53:07 -05:00
Peter Eisentraut 5db837d3f2 Message improvements 2015-11-16 21:39:23 -05:00
Robert Haas fe702a7b3f Move each SLRU's lwlocks to a separate tranche.
This makes it significantly easier to identify these lwlocks in
LWLOCK_STATS or Trace_lwlocks output.  It's also arguably better
from a modularity standpoint, since lwlock.c no longer needs to
know anything about the LWLock needs of the higher-level SLRU
facility.

Ildus Kurbangaliev, reviewd by Álvaro Herrera and by me.
2015-11-12 14:59:09 -05:00
Robert Haas f0661c4e8c Make sequential scans parallel-aware.
In addition, this path fills in a number of missing bits and pieces in
the parallel infrastructure.  Paths and plans now have a parallel_aware
flag indicating whether whatever parallel-aware logic they have should
be engaged.  It is believed that we will need this flag for a number of
path/plan types, not just sequential scans, which is why the flag is
generic rather than part of the SeqScan structures specifically.
Also, execParallel.c now gives parallel nodes a chance to initialize
their PlanState nodes from the DSM during parallel worker startup.

Amit Kapila, with a fair amount of adjustment by me.  Review of previous
patch versions by Haribabu Kommi and others.
2015-11-11 08:57:52 -05:00
Peter Eisentraut 7bd099d511 Update spelling of COPY options
The preferred spelling was changed from FORCE QUOTE to FORCE_QUOTE and
the like, but some code was still referring to the old spellings.
2015-11-04 21:01:26 -05:00
Robert Haas 1efc7e5382 Fix problems with ParamListInfo serialization mechanism.
Commit d1b7c1ffe7 introduced a mechanism
for serializing a ParamListInfo structure to be passed to a parallel
worker.  However, this mechanism failed to handle external expanded
values, as pointed out by Noah Misch.  Repair.

Moreover, plpgsql_param_fetch requires adjustment because the
serialization mechanism needs it to skip evaluating unused parameters
just as we would do when it is called from copyParamList, but params
== estate->paramLI in that case.  To fix, make the bms_is_member test
in that function unconditional.

Finally, have setup_param_list set a new ParamListInfo field,
paramMask, to the parameters actually used in the expression, so that
we don't try to fetch those that are not needed when serializing a
parameter list.  This isn't necessary for correctness, but it makes
the performance of the parallel executor code comparable to what we
do for cases involving cursors.

Design suggestions and extensive review by Noah Misch.  Patch by me.
2015-11-02 18:11:29 -05:00
Peter Eisentraut a8d585c091 Message style improvements
Message style, plurals, quoting, spelling, consistency with similar
messages
2015-10-28 20:38:36 -04:00
Robert Haas d455651624 Add missing serial comma, for consistency.
Amit Langote, per Etsuro Fujita
2015-10-28 12:19:14 +01:00
Robert Haas 9dcce7123e Fix incorrect message in ATWrongRelkindError.
Mistake introduced by commit 3bf3ab8c56.

Etsuro Fujita
2015-10-28 11:47:19 +01:00
Alvaro Herrera 531d21b75f Cleanup commit timestamp module activaction, again
Further tweak commit_ts.c so that on a standby the state is completely
consistent with what that in the master, rather than behaving
differently in the cases that the settings differ.  Now in standby and
master the module should always be active or inactive in lockstep.

Author: Petr Jelínek, with some further tweaks by Álvaro Herrera.

Backpatch to 9.5, where commit timestamps were introduced.

Discussion: http://www.postgresql.org/message-id/5622BF9D.2010409@2ndquadrant.com
2015-10-27 15:06:50 -03:00
Robert Haas 872101bede Add two missing cases to ATWrongRelkindError.
This way, we produce a better error message if someone tries to do
something like ALTER INDEX .. ALTER COLUMN .. SET STORAGE.

Amit Langote
2015-10-22 17:00:53 -04:00
Robert Haas bde39eed0c Fix a couple of bugs in recent parallelism-related commits.
Commit 816e336f12 added the wrong error
check to async.c; sending restrictions is restricted to the leader,
not altogether unsafe.

Commit 3bd909b220 added ExecShutdownNode
to traverse the planstate tree and call shutdown functions, but made
a Gather node, the only node that actually has such a function, abort
the tree traversal, which is wrong.
2015-10-22 10:49:20 -04:00
Robert Haas 816e336f12 Mark more functions parallel-restricted or parallel-unsafe.
Commit 7aea8e4f2d was overoptimistic
about the degree of safety associated with running various functions
in parallel mode.  Functions that take a table name or OID as an
argument are at least parallel-restricted, because the table might be
temporary, and we currently don't allow parallel workers to touch
temporary tables.  Functions that take a query as an argument are
outright unsafe, because the query could be anything, including a
parallel-unsafe query.

Also, the queue of pending notifications is backend-private, so adding
to it from a worker doesn't behave correctly.  We could fix this by
transferring the worker's queue of pending notifications to the master
during worker cleanup, but that seems like more trouble than it's
worth for now.  In addition to adjusting the pg_proc.h markings, also
add an explicit check for this in async.c.
2015-10-16 11:49:31 -04:00
Robert Haas 82b37765c7 Fix a problem with parallel workers being unable to restore role.
check_role() tries to verify that the user has permission to become the
requested role, but this is inappropriate in a parallel worker, which
needs to exactly recreate the master's authorization settings.  So skip
the check in that case.

This fixes a bug in commit 924bcf4f16.
2015-10-16 11:37:19 -04:00
Alvaro Herrera 817588bc2b Fix bogus comments
Author: Amit Langote
2015-10-15 12:20:11 -03:00
Stephen Frost 088c83363a ALTER TABLE .. FORCE ROW LEVEL SECURITY
To allow users to force RLS to always be applied, even for table owners,
add ALTER TABLE .. FORCE ROW LEVEL SECURITY.

row_security=off overrides FORCE ROW LEVEL SECURITY, to ensure pg_dump
output is complete (by default).

Also add SECURITY_NOFORCE_RLS context to avoid data corruption when
ALTER TABLE .. FORCE ROW SECURITY is being used. The
SECURITY_NOFORCE_RLS security context is used only during referential
integrity checks and is only considered in check_enable_rls() after we
have already checked that the current user is the owner of the relation
(which should always be the case during referential integrity checks).

Back-patch to 9.5 where RLS was added.
2015-10-04 21:05:08 -04:00
Andres Freund b67aaf21e8 Add CASCADE support for CREATE EXTENSION.
Without CASCADE, if an extension has an unfullfilled dependency on
another extension, CREATE EXTENSION ERRORs out with "required extension
... is not installed". That is annoying, especially when that dependency
is an implementation detail of the extension, rather than something the
extension's user can make sense of.

In addition to CASCADE this also includes a small set of regression
tests around CREATE EXTENSION.

Author: Petr Jelinek, editorialized by Michael Paquier, Andres Freund
Reviewed-By: Michael Paquier, Andres Freund, Jeff Janes
Discussion: 557E0520.3040800@2ndquadrant.com
2015-10-03 18:23:40 +02:00
Tom Lane 5884b92a84 Fix errors in commit a04bb65f70.
Not a lot of commentary needed here really.
2015-09-30 23:37:26 -04:00
Tom Lane 07e4d03fb4 Improve LISTEN startup time when there are many unread notifications.
If some existing listener is far behind, incoming new listener sessions
would start from that session's read pointer and then need to advance over
many already-committed notification messages, which they have no interest
in.  This was expensive in itself and also thrashed the pg_notify SLRU
buffers a lot more than necessary.  We can improve matters considerably
in typical scenarios, without much added cost, by starting from the
furthest-ahead read pointer, not the furthest-behind one.  We do have to
consider only sessions in our own database when doing this, which requires
an extra field in the data structure, but that's a pretty small cost.

Back-patch to 9.0 where the current LISTEN/NOTIFY logic was introduced.

Matt Newell, slightly adjusted by me
2015-09-30 23:32:43 -04:00
Robert Haas 3bd909b220 Add a Gather executor node.
A Gather executor node runs any number of copies of a plan in an equal
number of workers and merges all of the results into a single tuple
stream.  It can also run the plan itself, if the workers are
unavailable or haven't started up yet.  It is intended to work with
the Partial Seq Scan node which will be added in future commits.

It could also be used to implement parallel query of a different sort
by itself, without help from Partial Seq Scan, if the single_copy mode
is used.  In that mode, a worker executes the plan, and the parallel
leader does not, merely collecting the worker's results.  So, a Gather
node could be inserted into a plan to split the execution of that plan
across two processes.  Nested Gather nodes aren't currently supported,
but we might want to add support for that in the future.

There's nothing in the planner to actually generate Gather nodes yet,
so it's not quite time to break out the champagne.  But we're getting
close.

Amit Kapila.  Some designs suggestions were provided by me, and I also
reviewed the patch.  Single-copy mode, documentation, and other minor
changes also by me.
2015-09-30 19:23:36 -04:00
Tom Lane 6057f61b4d Small improvements in comments in async.c.
We seem to have lost a line somewhere along the way in the comment block
that discusses async.c's locks, because it suddenly refers to "both locks"
without previously having mentioned more than one.  Add a sentence to make
that read more sanely.  Also, refer to the "pos of the slowest backend"
not the "tail of the slowest backend", since we have no per-backend value
called "tail".
2015-09-29 22:07:16 -04:00
Alvaro Herrera 590e2d12f0 COPY: use pg_plan_query() instead of planner()
While at it, trim the includes list in copy.c.  The planner headers
cannot be removed, but there are a few others that are not of any use.
2015-09-28 15:14:08 -03:00
Andres Freund aa29c1ccd9 Remove legacy multixact truncation support.
In 9.5 and master there is no need to support legacy truncation. This is
just committed separately to make it easier to backpatch the WAL logged
multixact truncation to 9.3 and 9.4 if we later decide to do so.

I bumped master's magic from 0xD086 to 0xD088 and 9.5's from 0xD085 to
0xD087 to avoid 9.5 reusing a value that has been in use on master while
keeping the numbers increasing between major versions.

Discussion: 20150621192409.GA4797@alap3.anarazel.de
Backpatch: 9.5
2015-09-26 19:04:25 +02:00
Andres Freund 4f627f8973 Rework the way multixact truncations work.
The fact that multixact truncations are not WAL logged has caused a fair
share of problems. Amongst others it requires to do computations during
recovery while the database is not in a consistent state, delaying
truncations till checkpoints, and handling members being truncated, but
offset not.

We tried to put bandaids on lots of these issues over the last years,
but it seems time to change course. Thus this patch introduces WAL
logging for multixact truncations.

This allows:
1) to perform the truncation directly during VACUUM, instead of delaying it
   to the checkpoint.
2) to avoid looking at the offsets SLRU for truncation during recovery,
   we can just use the master's values.
3) simplify a fair amount of logic to keep in memory limits straight,
   this has gotten much easier

During the course of fixing this a bunch of additional bugs had to be
fixed:
1) Data was not purged from memory the member's SLRU before deleting
   segments. This happened to be hard or impossible to hit due to the
   interlock between checkpoints and truncation.
2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
   that doesn't work for offsets that haven't yet been flushed to
   disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
   slightly safer to only make decisions based on actual on-disk state.
3) find_multixact_start() could be called concurrently with a truncation
   and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
   of emergency vacuuming. The problem remains in
   pg_get_multixact_members(), but that's quite harmless.

For now this is going to only get applied to 9.5+, leaving the issues in
the older branches in place. It is quite possible that we need to
backpatch at a later point though.

For the case this gets backpatched we need to handle that an updated
standby may be replaying WAL from a not-yet upgraded primary. We have to
recognize that situation and use "old style" truncation (i.e. looking at
the SLRUs) during WAL replay. In contrast to before, this now happens in
the startup process, when replaying a checkpoint record, instead of the
checkpointer. Doing truncation in the restartpoint is incorrect, they
can happen much later than the original checkpoint, thereby leading to
wraparound.  To avoid "multixact_redo: unknown op code 48" errors
standbys would have to be upgraded before primaries.

A later patch will bump the WAL page magic, and remove the legacy
truncation codepaths. Legacy truncation support is just included to make
a possible future backpatch easier.

Discussion: 20150621192409.GA4797@alap3.anarazel.de
Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
Backpatch: 9.5 for now
2015-09-26 19:04:25 +02:00
Tom Lane 82e1ba7fd6 Make ANALYZE compute basic statistics even for types with no "=" operator.
Previously, ANALYZE simply ignored columns of datatypes that have neither
a btree nor hash opclass (which means they have no recognized equality
operator).  Without a notion of equality, we can't identify most-common
values nor estimate the number of distinct values.  But we can still
count nulls and compute the average physical column width, and those
stats might be of value.  Moreover there are some tools out there that
don't work so well if rows are missing from pg_statistic.  So let's
add suitable logic for this case.

While this is arguably a bug fix, it also has the potential to change
query plans, and the gain seems not worth taking a risk of that in
stable branches.  So back-patch into 9.5 but not further.

Oleksandr Shulgin, rewritten a bit by me.
2015-09-23 18:26:49 -04:00
Robert Haas 8dd401aa07 Add new function planstate_tree_walker.
ExplainPreScanNode knows how to iterate over a generic tree of plan
states; factor that logic out into a separate walker function so that
other code, such as upcoming patches for parallel query, can also use
it.

Patch by me, reviewed by Tom Lane.
2015-09-17 11:27:06 -04:00
Robert Haas 7aea8e4f2d Determine whether it's safe to attempt a parallel plan for a query.
Commit 924bcf4f16 introduced a framework
for parallel computation in PostgreSQL that makes most but not all
built-in functions safe to execute in parallel mode.  In order to have
parallel query, we'll need to be able to determine whether that query
contains functions (either built-in or user-defined) that cannot be
safely executed in parallel mode.  This requires those functions to be
labeled, so this patch introduces an infrastructure for that.  Some
functions currently labeled as safe may need to be revised depending on
how pending issues related to heavyweight locking under paralllelism
are resolved.

Parallel plans can't be used except for the case where the query will
run to completion.  If portal execution were suspended, the parallel
mode restrictions would need to remain in effect during that time, but
that might make other queries fail.  Therefore, this patch introduces
a framework that enables consideration of parallel plans only when it
is known that the plan will be run to completion.  This probably needs
some refinement; for example, at bind time, we do not know whether a
query run via the extended protocol will be execution to completion or
run with a limited fetch count.  Having the client indicate its
intentions at bind time would constitute a wire protocol break.  Some
contexts in which parallel mode would be safe are not adjusted by this
patch; the default is not to try parallel plans except from call sites
that have been updated to say that such plans are OK.

This commit doesn't introduce any parallel paths or plans; it just
provides a way to determine whether they could potentially be used.
I'm committing it on the theory that the remaining parallel sequential
scan patches will also get committed to this release, hopefully in the
not-too-distant future.

Robert Haas and Amit Kapila.  Reviewed (in earlier versions) by Noah
Misch.
2015-09-16 15:38:47 -04:00
Stephen Frost 22eaf35c1d RLS refactoring
This refactors rewrite/rowsecurity.c to simplify the handling of the
default deny case (reducing the number of places where we check for and
add the default deny policy from three to one) by splitting up the
retrival of the policies from the application of them.

This also allowed us to do away with the policy_id field.  A policy_name
field was added for WithCheckOption policies and is used in error
reporting, when available.

Patch by Dean Rasheed, with various mostly cosmetic changes by me.

Back-patch to 9.5 where RLS was introduced to avoid unnecessary
differences, since we're still in alpha, per discussion with Robert.
2015-09-15 15:49:31 -04:00
Tom Lane 9270d8db9a Fix CreateTableSpace() so it will compile without HAVE_SYMLINK.
This has been broken since 9.3 (commit 82b1b213ca to be exact),
which suggests that nobody is any longer using a Windows build system that
doesn't provide a symlink emulation.  Still, it's wrong on its own terms,
so repair.

YUriy Zhuravlev
2015-09-05 16:15:38 -04:00