28568 Commits

Author SHA1 Message Date
Tom Lane c7a141a986 Fix PL/Python ereport() test to work on Python 2.3.
Per buildfarm.

Pavel Stehule
2016-04-09 16:44:54 -04:00
Tom Lane 08e785436f Get rid of GenericXLogUnregister().
This routine is unsafe as implemented, because it invalidates the page
image pointers returned by previous GenericXLogRegister() calls.

Rather than complicate the API or the implementation to avoid that,
let's just get rid of it; the use-case for having it seems much
too thin to justify a lot of work here.

While at it, do some wordsmithing on the SGML docs for generic WAL.
2016-04-09 16:39:30 -04:00
Tom Lane db03cf375d Code review/prettification for generic_xlog.c.
Improve commentary, use more specific names for the delta fields,
const-ify pointer arguments where possible, avoid assuming that
initializing only the first element of a local array will guarantee
that the remaining elements end up as we need them.  (I think that
code in generic_redo actually worked, but only because InvalidBuffer
is zero; this is a particularly ugly way of depending on that ...)
2016-04-09 15:02:19 -04:00
Tom Lane 2dd318d277 Run pgindent on generic_xlog.c.
This code desperately needs some micro-optimization, and I'd like it
to be formatted a bit more nicely while I work on it.
2016-04-09 13:33:33 -04:00
Kevin Grittner 381200be4b Fix typo in C comment. 2016-04-09 09:07:42 -05:00
Kevin Grittner 56dffb5a73 Turn special page pointer validation to static inline function
Inclusion of multiple macros inside another macro was pushing MSVC
past its size limit.  Reported by buildfarm.
2016-04-09 08:17:22 -05:00
Alvaro Herrera 1ff3f420d4 Move \crosstabview regression tests to a separate file
It cannot run in the same parallel group as misc, because it creates a
table which is unpredictably visible in that test.

Per buildfarm member crake.
2016-04-08 23:42:24 -03:00
Alvaro Herrera c09b18f21c Support \crosstabview in psql
\crosstabview is a completely different way to display results from a
query: instead of a vertical display of rows, the data values are placed
in a grid where the column and row headers come from the data itself,
similar to a spreadsheet.

The sort order of the horizontal header can be specified by using
another column in the query, and the vertical header determines its
ordering from the order in which they appear in the query.

This only allows displaying a single value in each cell.  If more than
one value correspond to the same cell, an error is thrown.  Merging of
values can be done in the query itself, if necessary.  This may be
revisited in the future.
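
The pivot described above can be sketched roughly as follows. This is a Python illustration, not psql's implementation: the function name `crosstab` and the `(row, column, value)` tuple layout are assumptions, headers are taken in order of first appearance, and the optional sort-by-another-column feature is left out.

```python
def crosstab(rows):
    """Pivot (row_key, col_key, value) tuples into a grid.

    Hypothetical helper: headers come from the data itself, in order of
    first appearance, and a second value for the same cell is an error.
    """
    row_keys, col_keys, cells = [], [], {}
    for row_key, col_key, value in rows:
        if row_key not in row_keys:
            row_keys.append(row_key)
        if col_key not in col_keys:
            col_keys.append(col_key)
        if (row_key, col_key) in cells:
            # Only a single value per cell is allowed.
            raise ValueError(f"multiple values for cell ({row_key!r}, {col_key!r})")
        cells[(row_key, col_key)] = value
    header = [""] + col_keys
    body = [[r] + [cells.get((r, c)) for c in col_keys] for r in row_keys]
    return [header] + body
```

Cells with no matching input row are left as None; merging duplicate values would have to happen before calling the helper, just as the message says it should happen in the query.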

Author: Daniel Verité
Reviewed-by: Pavel Stehule, Dean Rasheed
2016-04-08 20:23:18 -03:00
Kevin Grittner 279d86afdb Add snapshot_too_old to MSVC @contrib_excludes
The buildfarm showed failure for Windows MSVC builds due to this
omission.  This might not be the only problem with the Makefile for
this feature, but hopefully this will get it past the immediate
problem.

Fix suggested by Tom Lane
2016-04-08 17:22:21 -05:00
Andres Freund c1ddd2361f Expose more out/readfuncs support functions.
Previously bcac23d exposed a subset of support functions, namely the
ones Kaigai found useful. In
20160304193704.elq773pyg5fyl3mi@alap3.anarazel.de I mentioned that
there's some functions missing to use the facility in an external
project.

To avoid having to add functions piecemeal, add all the functions which
are used to define READ_* and WRITE_* macros; users of the extensible
node functionality are likely to need these. Additionally expose
outDatum(), which doesn't have its own WRITE_ macro, as it needs
information from the embedding struct.

Discussion: 20160304193704.elq773pyg5fyl3mi@alap3.anarazel.de
2016-04-08 14:26:36 -07:00
Stephen Frost 7a542700df Create default roles
This creates an initial set of default roles which administrators may
use to grant access to, historically, superuser-only functions.  Using
these roles instead of granting superuser access reduces the number of
superuser roles required for a system.  Documentation for each of the
default roles has been added to user-manag.sgml.

Bump catversion to 201604082, as we had a commit that bumped it to
201604081 and another that set it back to 201604071...

Reviews by José Luis Tallón and Robert Haas
2016-04-08 16:56:27 -04:00
Stephen Frost 293007898d Reserve the "pg_" namespace for roles
This will prevent users from creating roles which begin with "pg_" and
will check for those roles before allowing an upgrade using pg_upgrade.

This will allow for default roles to be provided at initdb time.

Reviews by José Luis Tallón and Robert Haas
2016-04-08 16:56:27 -04:00
Stephen Frost fa6075e551 Fix improper usage of 'dump' bitmap
Now that 'dump' is a bitmap, we can't simply set it to 'true'.

Noticed while debugging the prior issue.
2016-04-08 16:30:02 -04:00
Kevin Grittner 848ef42bb8 Add the "snapshot too old" feature
This feature is controlled by a new old_snapshot_threshold GUC.  A
value of -1 disables the feature, and that is the default.  The
value of 0 is just intended for testing.  Above that it is the
number of minutes a snapshot can reach before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect.  The xmin associated with a transaction ID does
still protect dead tuples.  A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.

This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.
2016-04-08 14:36:30 -05:00
Kevin Grittner 8b65cf4c5e Modify BufferGetPage() to prepare for "snapshot too old" feature
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in.  It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point.  This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third.  The follow-on patch will change the places where the test
needs to be made.
2016-04-08 14:30:10 -05:00
Stephen Frost 689f9a0588 In dumpTable, re-instate the skipping logic
Pretty sure I removed this based on some incorrect thinking that it was
no longer possible to reach this point for a table which will not be
dumped, but that's clearly wrong.

Pointed out on IRC by Erik Rijkers.
2016-04-08 15:00:44 -04:00
Teodor Sigaev 8b99edefca Revert CREATE INDEX ... INCLUDING ...
It's not ready yet, revert two commits
690c543550 - unstable test output
386e3d7609 - patch itself
2016-04-08 21:52:13 +03:00
Magnus Hagander 35e2e357cb Add authentication parameters compat_realm and upn_username for SSPI
These parameters are available for SSPI authentication only, to make
it possible to make it behave more like "normal gssapi", while
making it possible to maintain compatibility.

compat_realm is on by default, but can be turned off to make the
authentication use the full Kerberos realm instead of the NetBIOS name.

upn_username is off by default, and can be turned on to return the user's
Kerberos UPN rather than the SAM-compatible name.  (A user in Active
Directory can have both a legacy SAM-compatible username and a new
Kerberos one.  Normally they are the same, but not always.)

Author: Christian Ullrich
Reviewed by: Robbie Harwood, Alvaro Herrera, me
2016-04-08 20:28:38 +02:00
Teodor Sigaev cb0c8cbf31 Fix possible use of uninitialised value in ts_headline()
Found during investigation of failure of skink buildfarm member and its
valgrind report.

Backpatch to all supported branches
2016-04-08 21:25:14 +03:00
Tom Lane 690c543550 Fix unstable regression test output.
Output order from the pg_indexes view might vary depending on the
phase of the moon, so add ORDER BY to ensure stable results of tests
added by commit 386e3d7609.
Per buildfarm.
2016-04-08 14:15:20 -04:00
Peter Eisentraut 7c7d4fddab Distrust external OpenSSL clients; clear err queue
OpenSSL has an unfortunate tendency to mix per-session state error
handling with per-thread error handling.  This can cause problems when
programs that link to libpq with OpenSSL enabled have some other use of
OpenSSL; without care, one caller of OpenSSL may cause problems for the
other caller.  Backend code might similarly be affected, for example
when a third party extension independently uses OpenSSL without taking
the appropriate precautions.

To fix, don't trust other users of OpenSSL to clear the per-thread error
queue.  Instead, clear the entire per-thread queue ahead of certain I/O
operations when it appears that there might be trouble (these I/O
operations mostly need to call SSL_get_error() to check for success,
which relies on the queue being empty).  This is slightly aggressive,
but it's pretty clear that the other callers have a very dubious claim
to ownership of the per-thread queue.  Do this in both frontend and
backend code.

Finally, be more careful about clearing our own error queue, so as to
not cause these problems ourself.  It's possible that control previously
did not always reach SSLerrmessage(), where ERR_get_error() was supposed
to be called to clear the queue's earliest code.  Make sure
ERR_get_error() is always called, so as to spare other users of OpenSSL
the possibility of similar problems caused by libpq (as opposed to
problems caused by a third party OpenSSL library like PHP's OpenSSL
extension).  Again, do this in both frontend and backend code.

See bug #12799 and https://bugs.php.net/bug.php?id=68276

Based on patches by Dave Vitek and Peter Eisentraut.

From: Peter Geoghegan <pg@bowt.ie>
2016-04-08 14:11:56 -04:00
Tom Lane 34c33a1f00 Add BSD authentication method.
Create a "bsd" auth method that works the same as "password" so far as
clients are concerned, but calls the BSD Authentication service to
check the password.  This is currently only available on OpenBSD.

Marisa Emerson, reviewed by Thomas Munro
2016-04-08 13:52:06 -04:00
Robert Haas af025eed53 Add combine functions for various floating-point aggregates.
This allows parallel aggregation to use them.  It may seem surprising
that we use float8_combine for both float4_accum and float8_accum
transition functions, but that's because those functions differ only
in the type of the non-transition-state argument.
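
A minimal sketch of why one combine function can serve both accumulators. It assumes the transition state is a triple (count, sum of x, sum of x squared) and that combining two partial states is element-wise addition; the Python names `accum`, `combine` and `final_avg` are illustrative and are not the C functions themselves.

```python
def accum(state, x):
    """Transition step: fold one input value into (n, sum_x, sum_x2)."""
    n, sum_x, sum_x2 = state
    return (n + 1, sum_x + x, sum_x2 + x * x)


def combine(a, b):
    """Merge two partial states produced by different workers."""
    return tuple(p + q for p, q in zip(a, b))


def final_avg(state):
    """Final step for an average; None when no rows were seen."""
    n, sum_x, _ = state
    return sum_x / n if n else None


def partial_state(values):
    """Run the transition step over one worker's share of the input."""
    state = (0, 0.0, 0.0)
    for value in values:
        state = accum(state, value)
    return state
```

The state has the same shape whatever the input type was, so only the transition step depends on the type of the value being folded in.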

Haribabu Kommi, reviewed by David Rowley and Tomas Vondra
2016-04-08 13:47:06 -04:00
Teodor Sigaev 1ec4c7c055 Restore original tsquery operation numbering.
As noticed by Tom Lane, changing the operations' numbers in commit
bb140506df causes an on-disk format incompatibility.
Revert to the previous numbering, which requires adding a special array to
store operation priorities. This also restores the previous tsquery order.

Author: Dmitry Ivanov
2016-04-08 20:11:30 +03:00
Andrew Dunstan 76a1c97bf2 Silence warning from modern perl about unescaped braces 2016-04-08 12:50:30 -04:00
Teodor Sigaev 386e3d7609 CREATE INDEX ... INCLUDING (column[, ...])
Indexes (only B-tree for now) can now contain "extra" column(s) which
do not participate in the index structure; they are just stored in leaf
tuples. This allows an index-only scan to use a single index instead
of two or more indexes.

Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
2016-04-08 19:45:59 +03:00
Peter Eisentraut 339025c68f Replace printf format %i by %d
see also ce8d7bb644
2016-04-08 12:42:58 -04:00
Andrew Dunstan 01a07e6c11 Turn down MSVC compiler verbosity
Most of what is produced by the detailed verbosity level is of no
interest at all, so switch to the normal level for more usable output.

Christian Ullrich

Backpatch to all live branches
2016-04-08 12:37:20 -04:00
Tom Lane 93c301fc4f Fix multiple bugs in tablespace symlink removal.
Don't try to examine S_ISLNK(st.st_mode) after a failed lstat().
It's undefined.

Also, if the lstat() reported ENOENT, we do not wish that to be a hard
error, but the code might nonetheless treat it as one (giving an entirely
misleading error message, too) depending on luck-of-the-draw as to what
S_ISLNK() returned.

Don't throw error for ENOENT from rmdir(), either.  (We're not really
expecting ENOENT because we just stat'd the file successfully; but
if we're going to allow ENOENT in the symlink code path, surely the
directory code path should too.)
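
The control flow being described can be sketched as follows. This is a Python analogy, not the C code in question: the helper name `remove_link_or_dir` and its return values are made up for illustration. The point is that the stat result is inspected only after a successful lstat, and that a missing entry is tolerated on both the symlink and the directory path.

```python
import errno
import os
import stat


def remove_link_or_dir(path):
    """Remove a symlink or an empty directory, tolerating a missing path.

    Returns "missing", "symlink" or "directory"; raises for other file types.
    """
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        # lstat failed with ENOENT: there is no stat result to inspect,
        # and a missing entry is not treated as a hard error.
        return "missing"
    if stat.S_ISLNK(st.st_mode):
        os.unlink(path)
        return "symlink"
    if stat.S_ISDIR(st.st_mode):
        try:
            os.rmdir(path)
        except FileNotFoundError:
            # Unlikely right after a successful lstat, but allowed as well.
            return "missing"
        return "directory"
    # Wrong type of file: report it with an explicit error code.
    raise OSError(errno.EINVAL, "not a directory or symbolic link", path)
```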

Generate an appropriate errcode for its-the-wrong-type-of-file complaints.
(ERRCODE_SYSTEM_ERROR doesn't seem appropriate, and failing to write
errcode() around it certainly doesn't work, and not writing an errcode
at all is not per project policy.)

Valgrind noticed the undefined S_ISLNK result; the other problems emerged
while reading the code in the area.

All of this appears to have been introduced in 8f15f74a44.
Back-patch to 9.5 where that commit appeared.
2016-04-08 12:31:53 -04:00
Teodor Sigaev 5c3c3cd0a3 Enhanced custom error in PLPythonu
The patch adds a new, richer way to emit an error message or exception
from PL/Pythonu code.

Author: Pavel Stehule
Reviewers: Catalin Iacob, Peter Eisentraut, Jim Nasby
2016-04-08 18:33:06 +03:00
Andres Freund 5364b357fb Increase maximum number of clog buffers.
Benchmarking has shown that the current number of clog buffers limits
scalability. We've previously increased the number in 33aaa139, but
that's not sufficient with a large number of clients.

We've benchmarked the cost of increasing the limit by benchmarking worst
case scenarios; testing showed that 128 buffers don't cause a
regression, even in contrived scenarios, whereas 256 does.

There are a number of more complex patches flying around to address
various clog scalability problems, but this is simple enough that we can
get it into 9.6; and is beneficial even after those patches have been
applied.

It is a bit unsatisfactory to increase this in small steps every few
releases, but a better solution seems to require a rewrite of slru.c;
not something done quickly.

Author: Amit Kapila and Andres Freund
Discussion: CAA4eK1+-=18HOrdqtLXqOMwZDbC_15WTyHiFruz7BvVArZPaAw@mail.gmail.com
2016-04-08 08:25:59 -07:00
Robert Haas 25fe8b5f1a Add a 'parallel_degree' reloption.
The code that estimates what parallel degree should be used for the
scan of a relation is currently rather stupid, so add a parallel_degree
reloption that can be used to override the planner's rather limited
judgement.

Julien Rouhaud, reviewed by David Rowley, James Sewell, Amit Kapila,
and me.  Some further hacking by me.
2016-04-08 11:14:56 -04:00
Robert Haas b0b64f6505 Attempt to fix breakage due to declaration following code.
Per Tom Lane and the buildfarm.
2016-04-08 10:52:56 -04:00
Peter Eisentraut 2f1d2b7a75 Set PAM_RHOST item for PAM authentication
The PAM_RHOST item is set to the remote IP address or host name and can
be used by PAM modules.  A pg_hba.conf option is provided to choose
between IP address and resolved host name.

From: Grzegorz Sampolski <grzsmp@gmail.com>
Reviewed-by: Haribabu Kommi <kommi.haribabu@gmail.com>
2016-04-08 10:48:44 -04:00
Teodor Sigaev 4e55b3f033 Rename comparePos() to compareWordEntryPos()
Rename comparePos() to compareWordEntryPos() to prevent export of too
generic name.

Per gripe from Tom Lane.
2016-04-08 12:04:15 +03:00
Fujii Masao 196b72fb9a Add regression tests for multiple synchronous standbys.
Authors: Suraj Kharage, Michael Paquier, Masahiko Sawada, refactored by me
Reviewed-By: Kyotaro Horiguchi
2016-04-08 16:48:53 +09:00
Robert Haas 0711803775 Use quicksort, not replacement selection, for external sorting.
We still use replacement selection for the first run of the sort only
and only when the number of tuples is relatively small.  Otherwise,
the first run, and subsequent runs in all cases, are produced using
quicksort.  This tends to be faster except perhaps for very small
amounts of working memory.
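
As a rough illustration of the run-then-merge shape of an external sort, here is a Python sketch. It is not tuplesort.c: the function name and the `work_mem` parameter (counted in items, and assumed to be at least 1) are assumptions, Python's built-in sort stands in for quicksort, and the replacement-selection special case for the first run is omitted.

```python
import heapq


def external_sort(items, work_mem):
    """Sort by producing sorted runs of at most work_mem items, then merging."""
    runs = []
    for start in range(0, len(items), work_mem):
        # Each run fits in "memory" and is sorted on its own
        # (sorted() is a stand-in for quicksort here).
        runs.append(sorted(items[start:start + work_mem]))
    # Merge all runs; heapq.merge consumes the sorted runs lazily.
    return list(heapq.merge(*runs))
```

In a real external sort the runs would be written to temporary files rather than kept in a list.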

Peter Geoghegan, reviewed by Tomas Vondra, Jeff Janes, Mithun Cy,
Greg Stark, and me.
2016-04-08 02:36:26 -04:00
Robert Haas 719c84c1be Extend relations multiple blocks at a time to improve scalability.
Contention on the relation extension lock can become quite fierce when
multiple processes are inserting data into the same relation at the same
time at a high rate.  Experimentation shows that extending the relation
multiple blocks at a time improves scalability.

Dilip Kumar, reviewed by Petr Jelinek, Amit Kapila, and me.
2016-04-08 02:04:46 -04:00
Simon Riggs 137805f89a Use Foreign Key relationships to infer multi-column join selectivity
In cases where joins use multiple columns we currently assess each join
separately causing gross mis-estimates for join cardinality.

This patch adds use of FK information for the first time into the
planner. When FKs are present and we have multi-column join information,
plan estimates will be drastically improved. Cases with multiple FKs
are handled, though partial matches are ignored currently.

Net effect is substantial performance improvements for joins in many
common cases. Additional planning time is isolated to cases that are
currently performing poorly, measured at 0.08 - 0.15 ms.
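
To see why assessing each join column separately can mis-estimate badly, consider this simplified arithmetic. It is not the planner's code: the function names are hypothetical, the per-column selectivity is taken as 1/ndistinct, and the foreign-key case assumes every referencing row matches exactly one referenced row.

```python
def rows_independent(outer_rows, inner_rows, ndistincts):
    """Estimate join rows by multiplying one selectivity per join column."""
    selectivity = 1.0
    for ndistinct in ndistincts:
        selectivity /= ndistinct
    return outer_rows * inner_rows * selectivity


def rows_with_fk(outer_rows, inner_rows):
    """Estimate join rows when the join columns match a foreign key:
    the whole column set has selectivity 1 / inner_rows."""
    return outer_rows * inner_rows * (1.0 / inner_rows)
```

With 1,000,000 referencing rows, 10,000 referenced rows, and two perfectly correlated key columns of 10,000 distinct values each, the per-column estimate is about 100 rows while the foreign-key estimate is about 1,000,000.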

Please watch for planner performance regressions; circumstances seem
unlikely but the law of unintended consequences may apply somewhen.
Additional complex tests welcome to prove this before release.

Tests can be performed using SET enable_fkey_estimates = on | off
using scripts provided during Hackers discussions, message id:
552335D9.3090707@2ndquadrant.com

Authors: Tomas Vondra and David Rowley
Reviewed and tested by Simon Riggs, adding comments only
2016-04-08 02:51:09 +01:00
Stephen Frost 6928484bda GRANT rights to CURRENT_USER instead of adding roles
We shouldn't be adding roles during the regression tests as that can
cause back-to-back installcheck runs to fail and users running the
regression tests likely don't want those extra roles.

Pointed out by Tom
2016-04-07 14:40:23 -04:00
Teodor Sigaev 3308467905 Zeroing unused parts during tsquery construction.
Per investigation of the skink buildfarm member failure, with
RANDOMIZE_ALLOCATED_MEMORY's help
2016-04-07 20:45:24 +03:00
Tom Lane f338dd7585 Refactor join_is_removable() to separate out distinctness-proving logic.
Extracted from pending unique-join patch, since this is a rather large
delta but it's simply moving code out into separately-accessible
subroutines.

I (tgl) did choose to add a bit more logic to rel_supports_distinctness,
so that it verifies that there's at least one potentially usable unique
index rather than just checking indexlist != NIL.  Otherwise there's
no functional change here.

David Rowley
2016-04-07 13:12:31 -04:00
Teodor Sigaev a7ace3b6d9 Make testing of phraseto_tsquery independent of the value of the
default_text_search_config variable.

Per skink buildfarm member
2016-04-07 19:33:23 +03:00
Kevin Grittner fcff8a5751 Detect SSI conflicts before reporting constraint violations
While prior to this patch the user-visible effect on the database
of any set of successfully committed serializable transactions was
always consistent with some one-at-a-time order of execution of
those transactions, the presence of declarative constraints could
allow errors to occur which were not possible in any such ordering,
and developers had no good workarounds to prevent user-facing
errors where they were not necessary or desired.  This patch adds
a check for serialization failure ahead of duplicate key checking
so that if a developer explicitly (redundantly) checks for the
pre-existing value they will get the desired serialization failure
where the problem is caused by a concurrent serializable
transaction; otherwise they will get a duplicate key error.

While it would be better if the reads performed by the constraints
could count as part of the work of the transaction for
serialization failure checking, and we will hopefully get there
some day, this patch allows a clean and reliable way for developers
to work around the issue.  In many cases existing code will already
be doing the right thing for this to "just work".

Author: Thomas Munro, with minor editing of docs by me
Reviewed-by: Marko Tiikkaja, Kevin Grittner
2016-04-07 11:12:35 -05:00
Teodor Sigaev bb140506df Phrase full text search.
The patch introduces a new text search operator (<-> or <DISTANCE>) into tsquery.
The on-disk and binary in/out formats of tsquery are backward compatible.
It has two side effects:
- the ordering of tsquery changes, so users who have a btree index over
  tsquery should reindex it
- fewer parentheses appear in tsquery output, so tsquery becomes more
  readable

Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov
2016-04-07 18:44:18 +03:00
Simon Riggs 015e88942a Load FK defs into relcache for use by planner
Fastpath ignores this if no triggers are defined.

Author: Tomas Vondra, with fastpath and comments added by me
Reviewers: David Rowley, Simon Riggs
2016-04-07 12:08:33 +01:00
Noah Misch f2b1b3079c Standardize GetTokenInformation() error reporting.
Commit c22650cd64 sparked a discussion
about diverse interpretations of "token user" in error messages.  Expel
old and new specimens of that phrase by making all GetTokenInformation()
callers report errors the way GetTokenUser() has been reporting them.
These error conditions almost can't happen, so users are unlikely to
observe this change.

Reviewed by Tom Lane and Stephen Frost.
2016-04-06 23:41:43 -04:00
Noah Misch 33d3fc5e2a Remove redundant message in AddUserToTokenDacl().
GetTokenUser() will have reported an adequate error message.  These
error conditions almost can't happen, so users are unlikely to observe
this change.

Reviewed by Tom Lane and Stephen Frost.
2016-04-06 23:40:51 -04:00
Stephen Frost 29dd1504a1 Bump catversion for pg_dump dump catalog ACL patches
Pointed out by Tom.
2016-04-06 23:04:48 -04:00
Stephen Frost 1574783b4c Use GRANT system to manage access to sensitive functions
Now that pg_dump will properly dump out any ACL changes made to
functions which exist in pg_catalog, switch to using the GRANT system
to manage access to those functions.

This means removing 'if (!superuser()) ereport()' checks from the
functions themselves and then REVOKEing EXECUTE right from 'public' for
these functions in system_views.sql.

Reviews by Alexander Korotkov, Jose Luis Tallon
2016-04-06 21:45:32 -04:00
Stephen Frost 23f34fa4ba In pg_dump, include pg_catalog and extension ACLs, if changed
Now that all of the infrastructure exists, add in the ability to
dump out the ACLs of the objects inside of pg_catalog or the ACLs
for objects which are members of extensions, but only if they have
been changed from their original values.

The original values are tracked in pg_init_privs.  When pg_dump'ing
9.6-and-above databases, we will dump out the ACLs for all objects
in pg_catalog and the ACLs for all extension members, where the ACL
has been changed from the original value which was set during either
initdb or CREATE EXTENSION.

This should not change dumps against pre-9.6 databases.

Reviews by Alexander Korotkov, Jose Luis Tallon
2016-04-06 21:45:32 -04:00
Stephen Frost d217b2c360 In pg_dump, split "dump" into "dump" and "dump_contains"
Historically, the "dump" component of the namespace has been used
to decide if the objects inside of the namespace should be dumped
also.  Given that "dump" is now a bitmask and may be partial, and
we may want to dump out all components of the namespace object but
only some of the components of objects contained in the namespace,
create a "dump_contains" bitmask which will represent what components
of the objects inside of a namespace should be dumped out.

No behavior change here, but in preparation for a change where we
will dump out just the ACLs of objects in pg_catalog, but we might
not dump out the ACL of the pg_catalog namespace itself (for instance,
when it hasn't been changed from the value set at initdb time).

Reviews by Alexander Korotkov, Jose Luis Tallon
2016-04-06 21:45:32 -04:00
Stephen Frost a9f0e8e5a2 In pg_dump, use a bitmap to represent what to include
pg_dump has historically used a simple boolean 'dump' value to indicate
if a given object should be included in the dump or not.  Instead, use
a bitmap which breaks down the components of an object into their
distinct pieces and use that bitmap to only include the components
requested.
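
The difference between a boolean and a component bitmap can be sketched like this. The component names below are illustrative only and are not claimed to match pg_dump's actual flags.

```python
from enum import IntFlag


class DumpComponent(IntFlag):
    """Hypothetical per-object components that a dump may include."""
    NONE = 0
    DEFINITION = 1
    DATA = 2
    ACL = 4
    ALL = DEFINITION | DATA | ACL


def should_dump(mask, component):
    """True when the requested component bit is set in the object's mask."""
    return bool(mask & component)
```

With a bitmap, "dump only the ACL" is simply `DumpComponent.ACL`. It also shows why assigning a plain true no longer works: the integer 1 selects only the lowest component bit rather than everything.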

This does not include any behavioral change, but is in preparation for
the change to dump out just ACLs for objects in pg_catalog.

Reviews by Alexander Korotkov, Jose Luis Tallon
2016-04-06 21:45:32 -04:00
Stephen Frost 6c268df127 Add new catalog called pg_init_privs
This new catalog holds the privileges which the system was
initialized with at initdb time, along with any permissions set
by extensions at CREATE EXTENSION time.  This allows pg_dump
(and any other similar use-cases) to detect when the privileges
set on initdb-created or extension-created objects have been
changed from what they were set to at initdb/extension-creation
time and handle those changes appropriately.

Reviews by Alexander Korotkov, Jose Luis Tallon
2016-04-06 21:45:32 -04:00
Teodor Sigaev 0b62fd036e Add jsonb_insert
It inserts a new value into a jsonb array at an arbitrary position, or
a new key into a jsonb object.
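
A Python analogy of the operation on nested lists and dicts. This is only a sketch: the helper name `json_insert`, the path-as-list convention, and the choice to raise an error when the key already exists are assumptions for illustration, not a description of the SQL function's exact behavior.

```python
import copy


def json_insert(target, path, new_value):
    """Return a copy of target with new_value inserted at path.

    The last path element is a list position or a new dict key.
    """
    result = copy.deepcopy(target)
    container = result
    for step in path[:-1]:
        container = container[step]
    last = path[-1]
    if isinstance(container, list):
        # Existing elements at and after this position shift right.
        container.insert(last, new_value)
    elif isinstance(container, dict):
        if last in container:
            raise KeyError(f"key {last!r} already exists")
        container[last] = new_value
    else:
        raise TypeError("path does not lead to an array or object")
    return result
```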

Author: Dmitry Dolgov
Reviewers: Petr Jelinek, Vitaly Burovoy, Andrew Dunstan
2016-04-06 19:25:00 +03:00
Peter Eisentraut 3b3fcc4eea pg_dump: Add table qualifications to some tags
Some object types have names that are only unique for one table.  But
for those we generally didn't put the table name into the dump TOC tag.
So it was impossible to identify these objects if the same name was used
for multiple tables.  This affects policies, column defaults,
constraints, triggers, and rules.

Fix by adding the table name to the TOC tag, so that it now reads
"$schema $table $object".

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2016-04-06 12:13:11 -04:00
Tom Lane de94e2af18 Run pgindent on a batch of (mostly-planner-related) source files.
Getting annoyed at the amount of unrelated chatter I get from pgindent'ing
Rowley's unique-joins patch.  Re-indent all the files it touches.
2016-04-06 11:34:02 -04:00
Fujii Masao ead9963c47 Use proper format specifier %X/%X for LSN, again.
Commit cee31f5 fixed this problem, but commit 989be08 accidentally
reverted the fix.

Thomas Munro
2016-04-06 22:20:52 +09:00
Simon Riggs cac0e36682 Revert bf08f2292f
Remove recent changes to logging XLOG_RUNNING_XACTS by request.
2016-04-06 14:03:46 +01:00
Simon Riggs 3fe3511d05 Generic Messages for Logical Decoding
API and mechanism to allow generic messages to be inserted into WAL that are
intended to be read by logical decoding plugins. This commit adds an optional
new callback to the logical decoding API.

Messages are either text or bytea. Messages can be transactional, or not, and
are identified by a prefix to allow multiple concurrent decoding plugins.

(Not to be confused with Generic WAL records, which are intended to allow crash
recovery of extensible objects.)

Author: Petr Jelinek and Andres Freund
Reviewers: Artur Zakirov, Tomas Vondra, Simon Riggs
Discussion: 5685F999.6010202@2ndquadrant.com
2016-04-06 10:05:41 +01:00
Fujii Masao 989be0810d Support multiple synchronous standby servers.
Previously synchronous replication offered only the ability to confirm
that all changes made by a transaction had been transferred to at most
one synchronous standby server.

This commit extends synchronous replication so that it supports multiple
synchronous standby servers. It enables users to consider one or more
standby servers as synchronous, and increase the level of transaction
durability by ensuring that transaction commits wait for replies from
all of those synchronous standbys.

Multiple synchronous standby servers are configured in
synchronous_standby_names which is extended to support new syntax of
'num_sync ( standby_name [ , ... ] )', where num_sync specifies
the number of synchronous standbys that transaction commits need to
wait for replies from and standby_name is the name of a standby
server.

The syntax of 'standby_name [ , ... ]' which was used in 9.5 or before
is also still supported. It's the same as new syntax with num_sync=1.

This commit doesn't include "quorum commit" feature which was discussed
in pgsql-hackers. Synchronous standbys are chosen based on their priorities.
synchronous_standby_names determines the priority of each standby for
being chosen as a synchronous standby. The standbys whose names appear
earlier in the list are given higher priority and will be considered as
synchronous. Other standby servers appearing later in this list
represent potential synchronous standbys.
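
The two syntaxes and the priority rule can be sketched as below. This is a simplified Python illustration, not the server's parser: the function names are hypothetical, and quoting and any special names are not handled.

```python
import re


def parse_standby_names(value):
    """Return (num_sync, names) for 'N (a, b)' or the older 'a, b' form."""
    match = re.fullmatch(r"\s*(\d+)\s*\((.*)\)\s*", value)
    if match:
        num_sync, name_list = int(match.group(1)), match.group(2)
    else:
        # The old syntax behaves like the new one with num_sync = 1.
        num_sync, name_list = 1, value
    names = [name.strip() for name in name_list.split(",") if name.strip()]
    return num_sync, names


def choose_sync_standbys(value, connected):
    """Split connected standbys into (synchronous, potential) by list order."""
    num_sync, names = parse_standby_names(value)
    candidates = [name for name in names if name in connected]
    return candidates[:num_sync], candidates[num_sync:]
```

For example, with '2 (s1, s2, s3)' and all three standbys connected, s1 and s2 are synchronous and s3 is a potential synchronous standby; if s1 disconnects, s2 and s3 take over.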

The regression test for multiple synchronous standbys is not included
in this commit. It should come later.

Authors: Sawada Masahiko, Beena Emerson, Michael Paquier, Fujii Masao
Reviewed-By: Kyotaro Horiguchi, Amit Kapila, Robert Haas, Simon Riggs,
Amit Langote, Thomas Munro, Sameer Thakur, Suraj Kharage, Abhijit Menon-Sen,
Rajeev Rastogi

Many thanks to the various individuals who were involved in
discussing and developing this feature.
2016-04-06 17:18:25 +09:00
Alvaro Herrera f2fcad27d5 Support ALTER THING .. DEPENDS ON EXTENSION
This introduces a new dependency type which marks an object as depending
on an extension, such that if the extension is dropped, the object
automatically goes away; and also, if the database is dumped, the object
is included in the dump output.  Currently the grammar supports this for
indexes, triggers, materialized views and functions only, although the
utility code is generic so adding support for more object types is a
matter of touching the parser rules only.

Author: Abhijit Menon-Sen
Reviewed-by: Alexander Korotkov, Álvaro Herrera
Discussion: http://www.postgresql.org/message-id/20160115062649.GA5068@toroid.org
2016-04-05 18:38:54 -03:00
Robert Haas 41ea0c2376 Fix parallel-safety code for parallel aggregation.
has_parallel_hazard() was ignoring the proparallel markings for
aggregates, which is no good.  Fix that.  There was no way to mark
an aggregate as actually being parallel-safe, either, so add a
PARALLEL option to CREATE AGGREGATE.

Patch by me, reviewed by David Rowley.
2016-04-05 16:06:15 -04:00
Robert Haas 09adc9a8c0 Align all shared memory allocations to cache line boundaries.
Experimentation shows this only costs about 6kB, which seems well
worth it given the major performance effects that can be caused
by insufficient alignment, especially on larger systems.
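
The rounding involved in such alignment can be sketched as follows. The 128-byte line size is an assumption for illustration, and the helper name is hypothetical; the trick requires the line size to be a power of two.

```python
CACHE_LINE_SIZE = 128  # assumed value, must be a power of two


def cacheline_align(size, line=CACHE_LINE_SIZE):
    """Round size up to the next multiple of the cache line size."""
    return (size + line - 1) & ~(line - 1)
```

Each allocation wastes at most one line minus one byte, which is consistent with the total cost being small when the number of allocations is modest.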

Discussion: 14166.1458924422@sss.pgh.pa.us
2016-04-05 15:47:49 -04:00
Tom Lane 1d2fe56e42 Fix PL/Python for recursion and interleaved set-returning functions.
PL/Python failed if a PL/Python function was invoked recursively via SPI,
since arguments are passed to the function in its global dictionary
(a horrible decision that's far too ancient to undo) and it would delete
those dictionary entries on function exit, leaving the outer recursion
level(s) without any arguments.  Not deleting them would be little better,
since the outer levels would then see the innermost level's arguments.

Since PL/Python uses ValuePerCall mode for evaluating set-returning
functions, it's possible for multiple executions of the same SRF to be
interleaved within a query.  PL/Python failed in such a case, because
it stored only one iterator per function, directly in the function's
PLyProcedure struct.  Moreover, one interleaved instance of the SRF
would see argument values that should belong to another.

Hence, invent code for saving and restoring the argument entries.  To fix
the recursion case, we only need to save at recursive entry and restore
at recursive exit, so the overhead in non-recursive cases is negligible.
To fix the SRF case, we have to save when suspending a SRF and restore
when resuming it, which is potentially not negligible; but fortunately
this is mostly a matter of manipulating Python object refcounts and
should not involve much physical data copying.

Also, store the Python iterator and saved argument values in a structure
associated with the SRF call site rather than the function itself.  This
requires adding a memory context deletion callback to ensure that the SRF
state is cleaned up if the calling query exits before running the SRF to
completion.  Without that we'd leak a refcount to the iterator object in
such a case, resulting in session-lifespan memory leakage.  (In the
pre-existing code, there was no memory leak because there was only one
iterator pointer, but what would happen is that the previous iterator
would be resumed by the next query attempting to use the SRF.  Hardly the
semantics we want.)

We can buy back some of whatever overhead we've added by getting rid of
PLy_function_delete_args(), which seems a useless activity: there is no
need to delete argument entries from the global dictionary on exit,
since the next time anyone would see the global dict is on the next
fresh call of the PL/Python function, at which time we'd overwrite those
entries with new arg values anyway.

Also clean up some really ugly coding in the SRF implementation, including
such gems as returning directly out of a PG_TRY block.  (The only reason
that failed to crash hard was that all existing call sites immediately
exited their own PG_TRY blocks, popping the dangling longjmp pointer before
there was any chance of it being used.)

In principle this is a bug fix; but it seems a bit too invasive relative to
its value for a back-patch, and besides the fix depends on memory context
callbacks so it could not go back further than 9.5 anyway.

Alexey Grishchenko and Tom Lane
2016-04-05 14:51:19 -04:00
Robert Haas 11c8669c0c Add parallel query support functions for assorted aggregates.
This lets us use parallel aggregate for a variety of useful cases
that didn't work before, like sum(int8), sum(numeric), several
versions of avg(), and various other functions.

Add some regression tests, as well, testing the general sanity of
these and future catalog entries.

David Rowley, reviewed by Tomas Vondra, with a few further changes
by me.
2016-04-05 14:32:53 -04:00
Magnus Hagander 7117685461 Implement backup API functions for non-exclusive backups
Previously non-exclusive backups had to be done using the replication protocol
and pg_basebackup. With this commit it's now possible to make them using
pg_start_backup/pg_stop_backup as well, as long as the backup program can
maintain a persistent connection to the database.

Doing this, backup_label and tablespace_map are returned as results from
pg_stop_backup() instead of being written to the data directory. This makes
the server safe from a crash during an ongoing backup, which can be a problem
with exclusive backups.

The old syntax of the functions remains and works exactly as before, but since the
new syntax is safer, the old one should eventually be deprecated and removed.

Only reference documentation is included. The main section on backup still needs
to be rewritten to cover this, but since that is already scheduled for a separate
large rewrite, it's not included in this patch.

Reviewed by David Steele and Amit Kapila
2016-04-05 20:03:49 +02:00
Magnus Hagander 9457b591b9 Fix typo
Etsuro Fujita
2016-04-05 11:05:01 +02:00
Peter Eisentraut 4dcd4da98c Fix error message from wal_level value renaming
found by Ian Barwick
2016-04-04 21:17:54 -04:00
Tom Lane 99f3b5613b Disallow newlines in parameter values to be set in ALTER SYSTEM.
As noted by Julian Schauder in bug #14063, the configuration-file parser
doesn't support embedded newlines in string literals.  While there might
someday be a good reason to remove that restriction, there doesn't seem
to be one right now.  However, ALTER SYSTEM SET could accept strings
containing newlines, since many of the variable-specific value-checking
routines would just see a newline as whitespace.  This led to writing a
postgresql.auto.conf file that was broken and had to be removed manually.

Pending a reason to work harder, just throw an error if someone tries this.

In passing, fix several places in the ALTER SYSTEM logic that failed to
provide an errcode() for an ereport(), and thus would falsely log the
failure as an internal XX000 error.

Back-patch to 9.4 where ALTER SYSTEM was introduced.
2016-04-04 18:05:23 -04:00
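A minimal sketch of the newline check described in 99f3b5613b above. The helper name `sketch_value_is_writable` is hypothetical; the real ALTER SYSTEM code reports the problem through ereport() with an errcode(), which is left out here.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the value can be written to a
 * configuration file whose parser does not support embedded newlines
 * in string literals.
 */
bool
sketch_value_is_writable(const char *value)
{
    /* Any newline would produce a file the parser cannot read back. */
    return strchr(value, '\n') == NULL;
}
```

A caller would reject the ALTER SYSTEM SET request when this returns false, instead of writing a broken postgresql.auto.conf.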
Alvaro Herrera 890614d2b3 Display WAL pointer in rm_redo error callback
This makes it easier to identify the source of a recovery problem
in case of a bug or data corruption.
2016-04-04 18:12:12 -03:00
Tom Lane 3c69b33f45 Add a few comments about ANALYZE's strategy for collecting MCVs.
Alex Shulgin complained that the underlying strategy wasn't all that
apparent, particularly not the fact that we intentionally have two
code paths depending on whether we think the column has a limited set
of possible values or not.  Try to make it clearer.
2016-04-04 17:06:33 -04:00
Tom Lane 391159e03a Partially revert commit 3d3bf62f30.
On reflection, the pre-existing logic in ANALYZE is specifically meant to
compare the frequency of a candidate MCV against the estimated frequency of
a random distinct value across the whole table.  The change to compare it
against the average frequency of values actually seen in the sample doesn't
seem very principled, and if anything it would make us less likely not more
likely to consider a value an MCV.  So revert that, but keep the aspect of
considering only nonnull values, which definitely is correct.

In passing, rename the local variables in these stanzas to
"ndistinct_table", to avoid confusion with the "ndistinct" that appears at
an outer scope in compute_scalar_stats.
2016-04-04 16:48:13 -04:00
Alvaro Herrera c9ff752a85 Silence compiler warning
Reported by Peter Eisentraut to occur on 32bit systems
2016-04-04 17:07:23 -03:00
Tom Lane 2bbe9112ae Add a \gexec command to psql for evaluation of computed queries.
\gexec executes the just-entered query, like \g, but instead of printing
the results it takes each field as a SQL command to send to the server.
Computing a series of queries to be executed is a fairly common thing,
but up to now you always had to resort to kluges like writing the queries
to a file and then inputting the file.  Now it can be done with no
intermediate step.

The implementation is fairly straightforward except for its interaction
with FETCH_COUNT.  ExecQueryUsingCursor isn't capable of being called
recursively, and even if it were, its need to create a transaction
block interferes unpleasantly with the desired behavior of \gexec after
a failure of a generated query (i.e., that it can continue).  Therefore,
disable use of ExecQueryUsingCursor when doing the master \gexec query.
We can still apply it to individual generated queries, however, and there
might be some value in doing so.

While testing this feature's interaction with single-step mode, I (tgl) was
led to conclude that SendQuery needs to recognize SIGINT (cancel_pressed)
as a negative response to the single-step prompt.  Perhaps that's a
back-patchable bug fix, but for now I just included it here.

Corey Huinker, reviewed by Jim Nasby, Daniel Vérité, and myself
2016-04-04 15:25:16 -04:00
Tom Lane 66229ac004 Introduce a LOG_SERVER_ONLY ereport level, which is never sent to client.
This elevel is useful for logging audit messages and similar information
that should not be passed to the client.  It's equivalent to LOG in terms
of decisions about logging priority in the postmaster log, but messages
with this elevel will never be sent to the client.

In the current implementation, it's just an alias for the longstanding
COMMERROR elevel (or more accurately, we've made COMMERROR an alias for
this).  At some point it might be interesting to allow a LOG_ONLY flag to
be attached to any elevel, but that would be considerably more complicated,
and it's not clear there are enough use-cases to justify the extra work.
For now, let's just take the easy 90% solution.

David Steele, reviewed by Fabien Coelho, Petr Jelínek, and myself
2016-04-04 12:32:42 -04:00
Tom Lane 58666ed28a Fix latent portability issue in pgwin32_dispatch_queued_signals().
The first iteration of the signal-checking loop would compute sigmask(0)
which expands to 1<<(-1) which is undefined behavior according to the
C standard.  The lack of field reports of trouble suggests that it
evaluates to 0 on all existing Windows compilers, but that's hardly
something to rely on.  Since signal 0 isn't a queueable signal anyway,
we can just make the loop iterate from 1 instead, and save a few cycles
as well as avoiding the undefined behavior.

In passing, avoid evaluating the volatile expression UNBLOCKED_SIGNAL_QUEUE
twice in a row; there's no reason to waste cycles like that.

Noted by Aleksander Alekseev, though this isn't his proposed fix.
Back-patch to all supported branches.
2016-04-04 11:13:17 -04:00
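The loop change in 58666ed28a above can be sketched as follows. This is an illustration only: the macro and function names are hypothetical, the mask is assumed to use bit (sig - 1) as the commit message describes, and unsigned arithmetic is used here for simplicity.

```c
/* Assumed shape of the mask macro: signal N occupies bit N - 1. */
#define SKETCH_SIGMASK(sig)   (1u << ((sig) - 1))
#define SKETCH_SIGNAL_COUNT   32

/*
 * Count queued signals that are not blocked.  The loop starts at 1:
 * sig = 0 would shift by -1, which the C standard leaves undefined,
 * and signal 0 is not a queueable signal anyway.
 */
int
sketch_count_deliverable(unsigned queue, unsigned blocked_mask)
{
    /* Evaluate the combined mask once instead of on every test. */
    unsigned exec_mask = queue & ~blocked_mask;
    int      count = 0;

    for (int sig = 1; sig < SKETCH_SIGNAL_COUNT; sig++)
    {
        if (exec_mask & SKETCH_SIGMASK(sig))
            count++;
    }
    return count;
}
```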
Dean Rasheed 84f9a35e39 Improve estimate of distinct values in estimate_num_groups().
When adjusting the estimate for the number of distinct values from a
rel in a grouped query to take into account the selectivity of the
rel's restrictions, use a formula that is less likely to produce
under-estimates.

The old formula simply multiplied the number of distinct values in the
rel by the restriction selectivity, which would be correct if the
restrictions were fully correlated with the grouping expressions, but
can produce significant under-estimates in cases where they are not
well correlated.

The new formula is based on the random selection probability, and so
assumes that the restrictions are not correlated with the grouping
expressions. This is guaranteed to produce larger estimates, and of
course risks over-estimating in cases where the restrictions are
correlated, but that has less severe consequences than
under-estimating, which might lead to a HashAgg that consumes an
excessive amount of memory.

This could possibly be improved upon in the future by identifying
correlated restrictions and using a hybrid of the old and new
formulae.

Author: Tomas Vondra, with some hacking by me
Reviewed-by: Mark Dilger, Alexander Korotkov, Dean Rasheed and Tom Lane
Discussion: http://www.postgresql.org/message-id/flat/56CD0381.5060502@2ndquadrant.com
2016-04-04 12:41:56 +01:00
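The two formulae discussed in 84f9a35e39 above can be written out as below. This is my reading of "based on the random selection probability", not a quotation of the committed code: it assumes each of the d distinct values has N/d rows and that every row passes the restrictions independently with probability s.

```latex
% d = distinct values in the rel, N = tuples in the rel,
% s = selectivity of the rel's restrictions (all symbols are assumptions)
\begin{align*}
d_{\text{old}} &= d \cdot s \\
d_{\text{new}} &= d \left( 1 - (1 - s)^{N/d} \right)
\end{align*}
% Worked example: d = 100, N = 10000, s = 0.1
%   d_old = 100 * 0.1 = 10
%   d_new = 100 * (1 - 0.9^{100}) \approx 99.997
```

Under these assumptions the new estimate is never smaller than the old one when N/d >= 1, which matches the commit's remark that it is guaranteed to produce larger estimates.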
Simon Riggs bf08f2292f Avoid archiving XLOG_RUNNING_XACTS on idle server
If archive_timeout > 0 we should avoid logging XLOG_RUNNING_XACTS if idle.

Bug 13685 reported by Laurence Rowe, investigated in detail by Michael Paquier,
though this is not his proposed fix.
20151016203031.3019.72930@wrigleys.postgresql.org

Simple non-invasive patch to allow later backpatch to 9.4 and 9.5
2016-04-04 07:18:05 +01:00
Simon Riggs 3e4b7d8798 Avoid pin scan for replay of XLOG_BTREE_VACUUM in all cases
Replay of XLOG_BTREE_VACUUM during Hot Standby was previously thought to require
complex interlocking that matched the requirements on the master. This required
an O(N) operation that became a significant problem with large indexes, causing
replication delays of seconds or in some cases minutes while the
XLOG_BTREE_VACUUM was replayed.

This commit skips the pin scan that was previously required, by observing in
detail when and how it is safe to do so, with full documentation. The pin
scan is skipped only in replay; the VACUUM code path on master is not
touched here and WAL is identical.

The current commit applies in all cases, effectively replacing commit
687f2cd7a0.
2016-04-03 17:46:09 +01:00
Tom Lane 3cc38ca7d2 Add psql \errverbose command to see last server error at full verbosity.
Often, upon getting an unexpected error in psql, one's first wish is that
the verbosity setting had been higher; for example, to be able to see the
schema-name field or the server code location info.  Up to now the only way
has been to adjust the VERBOSITY variable and repeat the failing query.
That's a pain, and it doesn't work if the error isn't reproducible.

This commit adds a psql feature that redisplays the most recent server
error at full verbosity, without needing to make any variable changes or
re-execute the failed command.  We just need to hang onto the latest error
PGresult in case the user executes \errverbose, and then apply libpq's
new PQresultVerboseErrorMessage() function to it.  This will consume
some trivial amount of psql memory, but otherwise the cost when the
feature isn't used should be negligible.

Alex Shulgin, reviewed by Daniel Vérité, some improvements by me
2016-04-03 12:29:55 -04:00
Tom Lane e3161b231c Add libpq support for recreating an error message with different verbosity.
Often, upon getting an unexpected error in psql, one's first wish is that
the verbosity setting had been higher; for example, to be able to see the
schema-name field or the server code location info.  Up to now the only way
has been to adjust the VERBOSITY variable and repeat the failing query.
That's a pain, and it doesn't work if the error isn't reproducible.

This commit adds support in libpq for regenerating the error message for
an existing error PGresult at any desired verbosity level.  This is almost
just a matter of refactoring the existing code into a subroutine, but there
is one bit of possibly-needed information that was not getting put into
PGresults: the text of the last query sent to the server.  We must add that
string to the contents of an error PGresult.  But we only need to save it
if it might be used, which with the existing error-formatting code only
happens if there is a PG_DIAG_STATEMENT_POSITION error field, which is
probably pretty rare for errors in production situations.  So really the
overhead when the feature isn't used should be negligible.

Alex Shulgin, reviewed by Daniel Vérité, some improvements by me
2016-04-03 12:24:54 -04:00
Tom Lane a1953f3a60 Make all the declarations of WaitEventSetWaitBlock be marked "inline".
The inconsistency here triggered compiler warnings on some buildfarm
members, and it's surely pretty pointless.
2016-04-02 13:55:44 -04:00
Tom Lane 45aae8e789 Suppress compiler warning.
Some buildfarm members are showing "comparison is always false due to
limited range of data type" complaints on this test, so #ifdef it out
on machines with 32-bit int.
2016-04-02 13:49:17 -04:00
Stephen Frost 62b5cd234b Fix typo in pg_regress.c
s/afer/after

Pointed out by Andreas 'ads' Scherbaum
2016-04-02 11:12:17 -04:00
Noah Misch c22650cd64 Refer to a TOKEN_USER payload as a "token user," not as a "user token".
This corrects messages for can't-happen errors.  The corresponding "user
token" appears in the HANDLE argument of GetTokenInformation().
2016-04-01 21:53:18 -04:00
Noah Misch 4ad6f13500 Copyedit comments and documentation. 2016-04-01 21:53:10 -04:00
Alvaro Herrera f07d18b6e9 test_slot_timelines: Fix alternate expected output 2016-04-01 18:36:07 -03:00
Tom Lane 3d3bf62f30 Omit null rows when setting the threshold for what's a most-common value.
As with the previous patch, large numbers of null rows could skew this
calculation unfavorably, causing us to discard values that have a
legitimate claim to be MCVs, since our definition of MCV is that it's
most common among the non-null population of the column.  Hence, make
the numerator of avgcount be the number of non-null sample values not
the number of sample rows; likewise for maxmincount in the
compute_scalar_stats variant.

Also, make the denominator be the number of distinct values actually
observed in the sample, rather than reversing it back out of the computed
stadistinct.  This avoids depending on the accuracy of the Haas-Stokes
approximation, and really it's what we want anyway; the threshold should
depend only on what we see in the sample, not on what we extrapolate
about the contents of the whole column.

Alex Shulgin, reviewed by Tomas Vondra and myself
2016-04-01 17:03:27 -04:00
Alvaro Herrera 5cb882675a pgbench: Remove unused parameter
For some reason this parameter was introduced as unused in 3da0dfb4b1,
and has never been used for anything.  Remove it.

Author: Fabien Coelho
2016-04-01 17:11:18 -03:00
Tom Lane be4b4dc759 Omit null rows when applying the Haas-Stokes estimator for ndistinct.
Previously, we included null rows in the values of n and N that went
into the formula, which amounts to considering null as a value in its
own right; but the d and f1 values do not include nulls.  This is
inconsistent, and it contributes to significant underestimation of
ndistinct when the column is mostly nulls.  In any case stadistinct
is defined as the number of distinct non-null values, so we should
exclude nulls when doing this computation.

This is an aboriginal bug in our application of the Haas-Stokes formula,
but we'll refrain from back-patching for fear of destabilizing plan
choices in released branches.

While at it, make the code a bit more readable by omitting unnecessary
casts and intermediate variables.

Observation and original patch by Tomas Vondra, adjusted to fix both
uses of the formula by Alex Shulgin, cosmetic improvements by me
2016-04-01 15:48:24 -04:00
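For reference, the Haas-Stokes estimator mentioned in be4b4dc759 above is commonly written as n*d / (n - f1 + f1*n/N). The sketch below assumes that form; the function name is hypothetical, and any clamping of the result that ANALYZE applies is omitted.

```c
/*
 * Hypothetical sketch of the Haas-Stokes ndistinct estimate.
 *   n  = rows in the sample      N  = rows in the table
 *   d  = distinct values seen    f1 = values seen exactly once
 * Per the commit, n and N should count non-null rows only, so that they
 * are consistent with d and f1, which never include nulls.
 */
double
sketch_haas_stokes(double n, double N, double d, double f1)
{
    double numer = n * d;
    double denom = (n - f1) + f1 * n / N;

    return numer / denom;
}
```

As an illustration with made-up numbers: for a 1,000,000-row table that is 90% null, a 30,000-row sample with d = 2000 and f1 = 1500 gives a noticeably larger estimate when called with the non-null counts (n = 3000, N = 100000) than with the total counts (n = 30000, N = 1000000), which is the underestimation the commit describes.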
Alvaro Herrera 82c83b3372 Fix logical_decoding_timelines test crashes
In the test_slot_timelines test module, we were abusing the passing of NULL
values, which were received as zeroes on x86, but this breaks on ARM
(buildfarm member hamster) by crashing instead.  Fix the breakage by
marking these functions as STRICT; the InvalidXid value that was
previously implicit in NULL values (on x86 at least) can now be passed
as 0.  Failing to follow the fmgr protocol to check for NULLs beforehand
was causing ARM to fail, as evidenced by segmentation faults in
buildfarm member hamster.

In order to use the new functionality in the test script, use COALESCE
in the right spot to avoid forwarding NULL values.

This was diagnosed from the hamster crash by Craig Ringer, who also
proposed a different patch (checking for NULL values explicitly in the
C function code, and keeping the non-strictness in the C functions).
I decided to go with this approach instead.
2016-04-01 16:47:00 -03:00
Alvaro Herrera f402b99501 Type names should not be quoted
Our actual convention, contrary to what I said in 59a2111b23, is not to
quote type names, as evidenced by unquoted use of format_type_be()
result value in error messages.  Remove quotes from recently tweaked
messages accordingly.

Per note from Tom Lane
2016-04-01 13:35:48 -03:00
Tom Lane a067b50470 Get rid of minus zero in box regression test.
Commit acdf2a8b added a test case involving minus zero as a box endpoint.
This is not very portable, as evidenced by the several older buildfarm
members that are failing on the test because they print minus zero as
just "0".  If there were any significant reason to test this behavior,
we could consider carrying a separate expected-file; but it doesn't look
to me like there's adequate justification to accept such a maintenance
burden.  Just change the test to use plain zero, instead.
2016-04-01 12:25:17 -04:00
Tom Lane 2306696004 Fix oversight in getParamDescriptions(), and improve comments.
When getParamDescriptions was changed to handle out-of-memory better
by cribbing error recovery logic from getRowDescriptions/getAnotherTuple,
somebody omitted to copy the stanza about checking for excess data in
the message.  But you need to do that, since continue'ing out of the
switch in pqParseInput3 means no such check gets applied there anymore.
Noted while looking at Michael Paquier's patch that made yet another
copy of this advance_and_error logic.

(This whole business desperately needs refactoring, because I sure don't
want to see a dozen copies of this code, but that's where we seem to be
headed.  What's more, the "suspend parsing on EOF return" convention is a
holdover from protocol 2 and shouldn't exist at all in protocol 3, because
we don't process partial messages anymore.  But for now, just fix the
obvious bug.)

Also, fix some wrong/missing comments about what the API spec is
for these three functions.

This doesn't seem worthy of back-patching, even though it's a bug;
the case shouldn't ever arise in the field.
2016-04-01 12:14:16 -04:00
Teodor Sigaev 65578341af Add Generic WAL interface
This interface is designed to give extensions access to WAL, so that they
can, for example, implement new access methods. Previously this was
impossible because replaying custom WAL records would need to access the system
catalog to find a custom redo function. This patch offers a generic way
to describe changes to pages with the standard layout.

Bump XLOG_PAGE_MAGIC because of new record type.

Author: Alexander Korotkov with help from Petr Jelinek, Markus Nullmeier and
	minor editing by me
Reviewers: Petr Jelinek, Alvaro Herrera, Teodor Sigaev, Jim Nasby,
	Michael Paquier
2016-04-01 12:21:48 +03:00
Tom Lane c202ecf902 Another zic portability fix.
I should have remembered that we can't use INT64_MODIFIER with sscanf():
configure chooses that to work with snprintf(), but it might be for our
src/port/snprintf.c implementation and so not compatible with the
platform's sscanf().  This appears to be the explanation for buildfarm
member frogmouth's continuing unhappiness with the tzcode update.

Fortunately, in all of the places where zic is attempting to read into
an int64 variable, it's reading a year which certainly will fit just fine
into an int.  So make it read into an int with %d, and then cast or copy
as necessary.
2016-03-31 16:14:55 -04:00
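The approach described in c202ecf902 above (read with %d into an int, then widen) might look like this. The function name is hypothetical and the sketch is not taken from zic.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical sketch: parse a year into a 64-bit variable without using
 * a 64-bit format modifier in sscanf().  A year fits comfortably in an
 * int, so read it with plain %d and copy the result.
 */
bool
sketch_parse_year(const char *text, int64_t *year)
{
    int tmp;

    if (sscanf(text, "%d", &tmp) != 1)
        return false;
    *year = tmp;                /* implicit widening int -> int64_t */
    return true;
}
```

This sidesteps the question of whether the platform's sscanf() understands the modifier that configure chose for snprintf().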
Alvaro Herrera 61608d3836 Fix recovery_min_apply_delay test
Previously this test was relying too much on WAL replay to occur in the
exact configured interval, which was unreliable on slow or overly busy
servers.  Use a custom loop instead of poll_query_until, which is
hopefully more reliable.

Per continued failures on buildfarm member hamster (which is probably
the only one running this test suite)

Author: Michaël Paquier
2016-03-31 16:06:32 -03:00
Tom Lane f9aefcb91f Support using index-only scans with partial indexes in more cases.
Previously, the planner would reject an index-only scan if any restriction
clause for its table used a column not available from the index, even
if that restriction clause would later be dropped from the plan entirely
because it's implied by the index's predicate.  This is a fairly common
situation for partial indexes because predicates using columns not included
in the index are often the most useful kind of predicate, and we have to
duplicate (or at least imply) the predicate in the WHERE clause in order
to get the index to be considered at all.  So index-only scans were
essentially unavailable with such partial indexes.

To fix, we have to do detection of implied-by-predicate clauses much
earlier in the planner.  This patch puts it in check_index_predicates
(nee check_partial_indexes), meaning it gets done for every partial index,
whereas we previously only considered this issue at createplan time,
so that the work was only done for an index actually selected for use.
That could result in a noticeable planning slowdown for queries against
tables with many partial indexes.  However, testing suggested that there
isn't really a significant cost, especially not with reasonable numbers
of partial indexes.  We do get a small additional benefit, which is that
cost_index is more accurate since it correctly discounts the evaluation
cost of clauses that will be removed.  We can also avoid considering such
clauses as potential indexquals, which saves useless matching cycles in
the case where the predicate columns aren't in the index, and prevents
generating bogus plans that double-count the clause's selectivity when
the columns are in the index.

Tomas Vondra and Kyotaro Horiguchi, reviewed by Kevin Grittner and
Konstantin Knizhnik, and whacked around a little by me
2016-03-31 14:49:10 -04:00
Alvaro Herrera 3501f71c21 Fix broken variable declaration
Author: Konstantin Knizhnik
2016-03-30 23:39:15 -03:00
Alvaro Herrera 3dd0792ae0 Blind attempt at fixing Win32 issue on 24c5f1a103
As best as I can tell, MyReplicationSlot needs to be PGDLLIMPORT in
order for the new test_slot_timelines test module to compile.

Per buildfarm
2016-03-30 23:12:20 -03:00
Fujii Masao cee31f5fee Use proper format specifier %X/%X for LSN. 2016-03-31 11:03:40 +09:00
Alvaro Herrera 3a3b309041 I forgot the alternate expected file in previous commit
Without this, the test_slot_timelines module fails "make installcheck"
because the required feature is not enabled in a stock server.

Per buildfarm
2016-03-30 20:48:24 -03:00
Alvaro Herrera 24c5f1a103 Enable logical slots to follow timeline switches
When decoding from a logical slot, it's necessary for xlog reading to be
able to read xlog from historical (i.e. not current) timelines;
otherwise, decoding fails after failover, because the archives are in
the historical timeline.  This is required to make "failover logical
slots" possible; it currently has no other use, although theoretically
it could be used by an extension that creates a slot on a standby and
continues to replay from the slot when the standby is promoted.

This commit includes a module in src/test/modules with functions to
manipulate the slots (which is not otherwise possible in SQL code) in
order to enable testing, and a new test in src/test/recovery to ensure
that the behavior is as expected.

Author: Craig Ringer
Reviewed-By: Oleksii Kliukin, Andres Freund, Petr Jelínek
2016-03-30 20:07:05 -03:00
Alvaro Herrera 3b02ea4f07 XLogReader general code cleanup
Some minor tweaks and comment additions, for cleanliness' sake and to
avoid having the upcoming timeline-following patch be polluted with
unrelated cleanup.

Extracted from a larger patch by Craig Ringer, reviewed by Andres
Freund, with some additions by myself.
2016-03-30 18:56:13 -03:00
Tom Lane 50861cd683 Improve portability of I/O behavior for the geometric types.
Formerly, the geometric I/O routines such as box_in and point_out relied
directly on strtod() and sprintf() for conversion of the float8 component
values of their data types.  However, the behavior of those functions is
pretty platform-dependent, especially for edge-case values such as
infinities and NaNs.  This was exposed by commit acdf2a8b37, which
added test cases involving boxes with infinity endpoints, and immediately
failed on Windows and AIX buildfarm members.  We solved these problems
years ago in the main float8in and float8out functions, so let's fix it
by making the geometric types use that code instead of depending directly
on the platform-supplied functions.

To do this, refactor the float8in code so that it can be used to parse
just part of a string, and as a convenience make the guts of float8out
usable without going through DirectFunctionCall.

While at it, get rid of geo_ops.c's fairly shaky assumptions about the
maximum output string length for a double, by having it build results in
StringInfo buffers instead of fixed-length strings.

In passing, convert all the "invalid input syntax for type foo" messages
in this area of the code into "invalid input syntax for type %s" to reduce
the number of distinct translatable strings, per recent discussion.
We would have needed a fair number of the latter anyway for code-sharing
reasons, so we might as well just go whole hog.

Note: this patch is by no means intended to guarantee that the geometric
types uniformly behave sanely for infinity or NaN component values.
But any bugs we have in that line were there all along, they were just
harder to reach in a platform-independent way.
2016-03-30 17:25:03 -04:00
Tom Lane 818e593736 Suppress uninitialized-variable warnings.
My compiler doesn't like the lack of initialization of "flag", and
I think it's right: if there were zero keys we'd have an undefined
result.  The AND of zero items is TRUE, so initialize to TRUE.
2016-03-30 13:36:18 -04:00
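The reasoning in 818e593736 above (the AND of zero items is TRUE) can be illustrated with a small sketch. The function and parameter names are hypothetical and unrelated to the code the commit touched.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: AND together the match results for nkeys keys.
 * Initializing flag to true makes the zero-key case well defined,
 * because true is the identity element for AND.
 */
bool
sketch_all_keys_match(const bool *key_matches, int nkeys)
{
    bool flag = true;

    for (int i = 0; i < nkeys; i++)
        flag = flag && key_matches[i];
    return flag;
}
```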
Teodor Sigaev 2d02a856e8 Bump catalog version, forgotten in acdf2a8b37 2016-03-30 18:56:21 +03:00
Teodor Sigaev acdf2a8b37 Introduce SP-GiST operator class over box.
The patch implements a quad-tree over boxes. The naive approach of a 2D quad-tree does not
work for non-point objects because splitting space at a node is not
efficient. The idea of the patch is to treat 2D boxes as 4D points, so that
objects do not overlap (in 4D space).

The performance tests reveal that this technique is especially beneficial
with heavily overlapping objects, so-called "spaghetti data".

Author: Alexander Lebedev with editing by Emre Hasegeli and me
2016-03-30 18:42:36 +03:00
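To illustrate the "2D boxes as 4D points" idea from acdf2a8b37 above: a box has four coordinates, so comparing each of them with the corresponding coordinate of a centroid box yields one of 16 quadrants. The struct, the function name and the bit order below are assumptions made for this sketch, not the committed operator class.

```c
/* Hypothetical box type: four numbers, i.e. one point in 4D space. */
typedef struct
{
    double xlow;
    double ylow;
    double xhigh;
    double yhigh;
} SketchBox;

/*
 * Return a 4-bit quadrant number (0..15) for box relative to centroid.
 * Each bit records whether one coordinate lies above the centroid's.
 */
unsigned
sketch_quadrant(const SketchBox *centroid, const SketchBox *box)
{
    unsigned quadrant = 0;

    if (box->xlow > centroid->xlow)
        quadrant |= 0x8;
    if (box->xhigh > centroid->xhigh)
        quadrant |= 0x4;
    if (box->ylow > centroid->ylow)
        quadrant |= 0x2;
    if (box->yhigh > centroid->yhigh)
        quadrant |= 0x1;
    return quadrant;
}
```

Two boxes that overlap in the plane are still distinct points in this 4D space, so each box falls into exactly one quadrant.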
Teodor Sigaev 87545f5412 Use traversalValue in SP-GiST range opclass.
Author: Alexander Lebedev
2016-03-30 18:38:53 +03:00
Teodor Sigaev ccd6eb49a4 Introduce traversalValue for SP-GiST scan
During a scan it is sometimes very helpful to know some information about the
parent node or all ancestor nodes. Right now reconstructedValue could be used,
but that is not a proper use of it (the range opclass does that).

traversalValue is an arbitrary piece of memory in a separate MemoryContext, while
reconstructedValue should have the same type as the indexed column.

Subsequent patches for the range opclass and the quad4d tree will use it.

Author: Alexander Lebedev, Teodor Sigaev
2016-03-30 18:29:28 +03:00
Magnus Hagander 3063e7a840 Add missing gss option to msvc config template
Michael Paquier
2016-03-30 10:49:44 +02:00
Tom Lane c53ab8a3af Remove just-added tests for to_timestamp(float8) with out-of-range inputs.
Reporting the specific out-of-range input value produces platform-dependent
results.  We could skip reporting the value, but that's contrary to our
message style guidelines and unhelpful to users.  Or we could add a
separate expected-output file for Windows, but that would be a substantial
maintenance burden, and these test cases seem unlikely to be worth it.

Per buildfarm.
2016-03-29 22:23:32 -04:00
Robert Haas 314cbfc5da Add new replication mode synchronous_commit = 'remote_apply'.
In this mode, the master waits for the transaction to be applied on
the remote side, not just written to disk.  That means that you can
count on a transaction started on the standby to see all commits
previously acknowledged by the master.

To make this work, the standby sends a reply after replaying each
commit record generated with synchronous_commit >= 'remote_apply'.
This introduces a small inefficiency: the extra replies will be sent
even by standbys that aren't the current synchronous standby.  But
previously-existing synchronous_commit levels make no attempt at all
to optimize which replies are sent based on what the primary cares
about, so this is no worse, and at least avoids any extra replies for
people not using the feature at all.

Thomas Munro, reviewed by Michael Paquier and by me.  Some additional
tweaks by me.
2016-03-29 21:29:49 -04:00
Tom Lane a898b409f6 Fix interval_mul() to not produce insane results.
interval_mul() attempts to prevent its calculations from producing silly
results, but it forgot that zero times infinity yields NaN in IEEE
arithmetic.  Hence, a case like '1 second'::interval * 'infinity'::float8
produced a NaN for the months product, which didn't trigger the range
check, resulting in bogus and possibly platform-dependent output.

This isn't terribly obvious to the naked eye because if you try that
exact case, you get "interval out of range" which is what you expect
--- but if you look closer, the error is coming from interval_out not
interval_mul.  interval_mul has allowed a bogus value into the system.

Fix by adding isnan tests.

Noted while testing Vitaly Burovoy's fix for infinity input to
to_timestamp().  Given the lack of field complaints, I doubt this
is worth a back-patch.
2016-03-29 17:21:12 -04:00
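The failure mode fixed in a898b409f6 above can be sketched as follows, assuming a simplified, hypothetical helper that scales only the months field. Both range comparisons are false for NaN, so without the isnan() test a NaN product would slip through.

```c
#include <limits.h>
#include <math.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: multiply a months count by a float factor and
 * reject results that cannot be stored.  Zero times infinity is NaN in
 * IEEE arithmetic, and NaN fails both range comparisons, so it has to be
 * tested for explicitly.
 */
bool
sketch_scale_months(int months, double factor, int *result)
{
    double product = (double) months * factor;

    if (isnan(product) || product > INT_MAX || product < INT_MIN)
        return false;           /* caller reports "interval out of range" */
    *result = (int) product;
    return true;
}
```

With months = 0 and an infinite factor, as in the commit's '1 second' example, the product is NaN and the function reports failure instead of storing a bogus value.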
Tom Lane e511d878f3 Allow to_timestamp(float8) to convert float infinity to timestamp infinity.
With the original SQL-function implementation, such cases failed because
we don't support infinite intervals.  Converting the function to C lets
us bypass the interval representation, which should be a bit faster as
well as more flexible.

Vitaly Burovoy, reviewed by Anastasia Lubennikova
2016-03-29 17:09:29 -04:00
Robert Haas 96f8373cad Fix bug in aggregate (de)serialization commit.
resulttypeLen and resulttypeByVal must be set correctly when serializing
aggregates, not just when finalizing them.  This was in David's final
patch but I downloaded the wrong version by mistake and failed to spot
the error.

David Rowley
2016-03-29 15:21:57 -04:00
Robert Haas 5fe5a2cee9 Allow aggregate transition states to be serialized and deserialized.
This is necessary infrastructure for supporting parallel aggregation
for aggregates whose transition type is "internal".  Such values
can't be passed between cooperating processes, because they are
just pointers.

David Rowley, reviewed by Tomas Vondra and by me.
2016-03-29 15:04:05 -04:00
Alvaro Herrera a1c935d3b7 pgbench: allow a script weight of zero
This refines the previous weight range and allows a script to be "turned
off" by passing a zero weight, which is useful when scripting multiple
pgbench runs.

I did not apply the suggested warning when a script uses zero weight; we
use the principle elsewhere that if there's nothing to be done, do
nothing quietly.

Adjust docs accordingly.

Author: Jeff Janes, Fabien Coelho
2016-03-29 14:47:10 -03:00
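One way to picture why a zero weight "turns off" a script is cumulative-weight selection. The sketch below is an assumption for illustration only; choose_script() is a hypothetical helper and is not taken from pgbench's source.

```c
/*
 * Pick a script index from integer weights, given a draw in
 * [0, sum of weights).  A script with weight zero adds nothing to the
 * cumulative total, so no draw can ever select it.
 * Returns -1 if the draw is negative or not below the total weight.
 */
int
choose_script(const int *weights, int nscripts, long draw)
{
    long        cumulative = 0;

    if (draw < 0)
        return -1;

    for (int i = 0; i < nscripts; i++)
    {
        cumulative += weights[i];
        if (draw < cumulative)
            return i;
    }
    return -1;
}
```

With weights {2, 0, 1}, draws 0 and 1 select the first script, draw 2 selects the third, and the second script is silently skipped, in line with the "do nothing quietly" principle mentioned above.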
Robert Haas ad9566470b pgbench: Remove \setrandom.
You can now do the same thing via \set using the appropriate function,
either random(), random_gaussian(), or random_exponential(), depending
on the desired distribution.  This is not backward-compatible, but per
discussion, it's worth it to avoid having the old syntax hang around
forever.

Fabien Coelho, reviewed by Michael Paquier, and adjusted by me.
2016-03-29 12:08:49 -04:00
Tom Lane 7abc157165 Avoid possibly-unsafe use of Windows' FormatMessage() function.
Whenever this function is used with the FORMAT_MESSAGE_FROM_SYSTEM flag,
it's good practice to include FORMAT_MESSAGE_IGNORE_INSERTS as well.
Otherwise, if the message contains any %n insertion markers, the function
will try to fetch argument strings to substitute --- which we are not
passing, possibly leading to a crash.  This is exactly analogous to the
rule about not giving printf() a format string you're not in control of.

Noted and patched by Christian Ullrich.
Back-patch to all supported branches.
2016-03-29 11:55:19 -04:00
Teodor Sigaev 61d66c44f1 Fix support of digits in email/hostnames.
When tsearch was implemented I made several mistakes in the hostname/email
definition rules:
1) allowed underscore in hostname, which is prohibited by RFC
2) forgot to allow leading digits separated by hyphen (like 123-x.com)
   in hostname
3) did not allow underscore/hyphen after leading digits in localpart of email

Artur's patch resolves the last two issues, but as a side effect allows
hostnames like 123_x.com together with 123-x.com. The RFC forbids underscores
in hostnames, but pg has allowed them since the initial tsearch version in
core, although only for non-digits. The patch syncs support for digits and
non-digits in both hostname and email.

Forbidding underscore in hostname may break existing usage of tsearch and,
anyhow, it should be done by a separate patch.

Author: Artur Zakirov
BUG: #13964
2016-03-29 18:28:49 +03:00
Robert Haas f9143d102f Rework custom scans to work more like the new extensible node stuff.
Per discussion, the new extensible node framework is thought to be
better designed than the custom path/scan/scanstate stuff we added
in PostgreSQL 9.5.  Rework the latter to be more like the former.

This is not backward-compatible, but we generally don't promise that
for C APIs, and there probably aren't many people using this yet
anyway.

KaiGai Kohei, reviewed by Petr Jelinek and me.  Some further
cosmetic changes by me.
2016-03-29 11:28:04 -04:00
Tom Lane 534da37927 Protect zic's symlink() call with #ifdef HAVE_SYMLINK.
The IANA crew seem to think that symlink() exists everywhere nowadays,
and they may well be right.  But we use #ifdef HAVE_SYMLINK elsewhere
so for consistency we should do it here too.  Noted by Michael Paquier.
2016-03-29 11:06:44 -04:00
Tom Lane 6d257e732b Fix zic for Windows.
The new coding of dolink() is dependent on link() returning an on-point
errno when it fails; but the quick-hack implementation of link() that
we'd put in for Windows didn't bother with setting errno.  Fix that.

Analysis and patch by Christian Ullrich.
2016-03-29 10:40:08 -04:00
Tom Lane 656ee84890 Fix portability issues in 86c43f4e22.
INT64_MIN/MAX should be spelled PG_INT64_MIN/MAX, per well established
convention in our sources.  Less obviously, a symbol named DOUBLE causes
problems on Windows builds, so rename that to DOUBLE_CONST; and rename
INTEGER to INTEGER_CONST for consistency.

Also, get rid of incorrect/obsolete hand-munging of yycolumn, and fix
the grammar for float constants to handle expected cases such as ".1".

First two items by Michael Paquier, second two by me.
2016-03-29 00:53:53 -04:00
Robert Haas 5d4171d1c7 Don't require a user mapping for FDWs to work.
Commit fbe5a3fb73 accidentally changed
this behavior; put things back the way they were, and add some
regression tests.

Report by Andres Freund; patch by Ashutosh Bapat, with a bit of
kibitzing by me.
2016-03-28 21:50:28 -04:00
Robert Haas 868628e4fd On all Windows platforms, not just Cygwin, use _timezone and _tzname.
Up until now, we've been using timezone and tzname, but Visual Studio
2015 (for which we wish to add support) no longer declares those
symbols.  All versions since Visual Studio 2003 apparently support the
underscore-equipped names, and we don't support anything older than
Visual Studio 2005, so this should work OK everywhere.  But let's see
what the buildfarm thinks.

Michael Paquier, reviewed by Petr Jelinek
2016-03-28 20:59:25 -04:00
Robert Haas bd0f206f55 Fix typo in comment.
Thomas Munro
2016-03-28 20:55:15 -04:00
Robert Haas 86c43f4e22 pgbench: Support double constants and functions.
The new functions are pi(), random(), random_exponential(),
random_gaussian(), and sqrt().  I was worried that this would be
slower than before, but, if anything, it actually turns out to be
slightly faster, because we now express the built-in pgbench scripts
using fewer lines; each \setrandom can be merged into a subsequent
\set.

Fabien Coelho
2016-03-28 20:45:57 -04:00
Alvaro Herrera 9bd61311bd PostgresNode: initialize $timed_out if passed
Corrects an oversight in 2c83f435a3 where the $timed_out reference var
isn't initialized; using it would require the caller to initialize it
beforehand, which is cumbersome.

Author: Craig Ringer
2016-03-28 19:17:06 -03:00
Tom Lane 1f4e9da624 Sync tzload() and tzparse() APIs with IANA release tzcode2016c.
This brings us a bit closer to matching upstream, but since it affects
files outside src/timezone/, we might choose not to back-patch it.
Hence keep it separate from the main update patch.
2016-03-28 17:19:29 -04:00
Tom Lane f5f15ea6aa Fix MSVC build for changes in zic.
zic now only needs zic.c, but I didn't realize knowledge about it was
hardwired into Mkvcbuild.pm.  Per buildfarm.
2016-03-28 16:02:07 -04:00
Tom Lane 1c1a7cbd6a Sync our copy of the timezone library with IANA release tzcode2016c.
We hadn't done this in about six years, which proves to have been a mistake
because there's been a lot of code churn upstream, making the merge rather
painful.  But putting it off any further isn't going to lessen the pain,
and there are at least two incompatible changes that we need to absorb
before someone starts complaining that --with-system-tzdata doesn't work
at all on their platform, or we get blindsided by a tzdata release that
our out-of-date zic can't compile.  Last week's "time zone abbreviation
differs from POSIX standard" mess was a wake-up call in that regard.

This is a sufficiently large patch that I'm afraid to back-patch it
immediately, though the foregoing considerations imply that we probably
should do so eventually.  For the moment, just put it in HEAD so that
it can get some testing.  Maybe we can wait till the end of the 9.6
beta cycle before deeming it okay.
2016-03-28 15:10:17 -04:00
Tom Lane e5a4dea80f Document errhidecontext() where it ought to be documented.
Seems to have been missed when this function was added.  Noted while
looking at David Steele's proposal to add another similar function.
2016-03-28 14:18:14 -04:00
Alvaro Herrera 4b746f0d07 Update expected file from quoting change
I neglected to update this in 59a2111b23.

Per buildfarm
2016-03-28 14:40:32 -03:00
Alvaro Herrera cad3edef4f pg_rewind: Improve internationalization
This is mostly cosmetic since two of the three changes are debug
messages, and the third one is just a progress indicator.

Author: Michaël Paquier
2016-03-28 14:33:00 -03:00
Alvaro Herrera 37732a2555 Fix minor leak in pg_dump for ACCESS METHOD.
Bug reported by Coverity.

Author: Michaël Paquier
2016-03-28 14:27:41 -03:00
Alvaro Herrera 59a2111b23 Improve internationalization of messages involving type names
Change the slightly different variations of the message
  function FOO must return type BAR
to a single wording, removing the variability in type name so that they
all create a single translation entry; since the type name is not to be
translated, there's no point in it being part of the message anyway.

Also, change them all to use the same quoting convention, namely that
the function name is not to be quoted but the type name is.  (I'm not
quite sure why this is so, but it's the clear majority.)

Some similar messages such as "encoding conversion function FOO must ..."
are also changed.
2016-03-28 14:24:37 -03:00
Teodor Sigaev 559e7a0a6d psql tab-complete for CREATE/DROP ACCESS METHOD
Alexander Korotkov
2016-03-28 19:32:13 +03:00
Teodor Sigaev dabd255d58 Fix comment in pg_dump.
It was missed in 473b932870,
CREATE ACCESS METHOD

Alexander Korotkov
2016-03-28 19:17:28 +03:00
Stephen Frost 86ebf30fd6 Reset plan->row_security_env and planUserId
In the plancache, we check if the environment we planned the query under
has changed in a way which requires us to re-plan, such as when the user
for whom the plan was prepared changes and RLS is being used (and,
therefore, there may be different policies to apply).

Unfortunately, while those values were set and checked, they were not
being reset when the query was re-planned and therefore, in cases where
we change role, re-plan, and then change role again, we weren't
re-planning again.  This leads to potentially incorrect policies being
applied in cases where role-specific policies are used and a given query
is planned under one role and then executed under other roles, which
could happen under security definer functions or when a common user and
query is planned initially and then re-used across multiple SET ROLEs.

Further, extensions which made use of CopyCachedPlan() may suffer from
similar issues as the RLS-related fields were not properly copied as
part of the plan and therefore RevalidateCachedQuery() would copy in the
current settings without invalidating the query.

Fix by using the same approach used for 'search_path', where we set the
correct values in CompleteCachedPlan(), check them early on in
RevalidateCachedQuery() and then properly reset them if re-planning.
Also, copy through the values during CopyCachedPlan().

Pointed out by Ashutosh Bapat.  Reviewed by Michael Paquier.

Back-patch to 9.5 where RLS was introduced.

Security: CVE-2016-2193
2016-03-28 09:03:20 -04:00
Tom Lane d12e5bb79b Code and docs review for commit 3187d6de0e.
Fix up check for high-bit-set characters, which provoked "comparison is
always true due to limited range of data type" warnings on some compilers,
and was unlike the way we do it elsewhere anyway.  Fix omission of "$"
from the set of valid identifier continuation characters.  Get rid of
sanitize_text(), which was utterly inconsistent with any other error report
anywhere in the system, and wasn't even well designed on its own terms
(double-quoting the result string without escaping contained double quotes
doesn't seem very well thought out).  Fix up error messages, which didn't
follow the message style guidelines very well, and were overly specific in
situations where the actual mistake might not be what they said.  Improve
documentation.

(I started out just intending to fix the compiler warning, but the more
I looked at the patch the less I liked it.)
2016-03-28 01:00:30 -04:00
Tom Lane d65b665d52 Guard against zero vardata.rel->tuples in estimate_hash_bucketsize().
If the referenced rel was proven empty, we'd compute 0/0 here, which
results in the function returning NaN.  That's a bit more serious
than the other zero-divide case.  Still, it only seems to be possible
in HEAD, so no back-patch.

Per report from Piotr Stefaniak.  I looked through the rest of selfuncs.c
and found no other likely trouble spots.
2016-03-27 18:21:03 -04:00
Tom Lane fa09f89351 Clamp adjusted ndistinct to positive integer in estimate_hash_bucketsize().
This avoids a possible divide-by-zero in the following calculation,
and rounding the number to an integer seems like saner behavior anyway.
Assuming IEEE math, the division would yield +Infinity which would get
replaced by 1.0 at the bottom of the function, so nothing really
interesting would ensue; but avoiding divide-by-zero seems like a
good idea on general principles.

Per report from Piotr Stefaniak.  No back-patch since this seems
mostly cosmetic.
2016-03-27 18:07:16 -04:00
Andres Freund 408f043853 pg_rewind: fsync target data directory.
Previously pg_rewind did not fsync any files. That's problematic, given
that the target directory is modified. If the database was started
afterwards, 2ce439f33 luckily already caused the data directory to be
synced to disk at postmaster startup, reducing the scope of the problem.

To fix, use initdb -S, at the end of the pg_rewind run. It doesn't seem
worthwhile to duplicate the code into pg_rewind, and initdb -S is
already used that way by pg_upgrade.

Reported-By: Andres Freund
Author: Michael Paquier, somewhat edited by me
Discussion: 20160310034352.iuqgvpmg5qmnxtkz@alap3.anarazel.de
    CAB7nPqSytVG1o4S3S2pA1O=692ekurJ+fckW2PywEG3sNw54Ow@mail.gmail.com
Backpatch: 9.5, where pg_rewind was introduced
2016-03-27 23:46:25 +02:00
Andres Freund 9f7c527af3 Fix LWLockReportWaitEnd() parameter list to be (void).
Previously it was an "old style" function declaration.
2016-03-27 22:53:31 +02:00
Andres Freund a6c845946d pg_rewind: Close backup_label file descriptor.
This was a relatively harmless leak, as createBackupLabel() is only
called once per pg_rewind invocation.

Author: Michael Paquier
Reported-By: Michael Paquier
Discussion: CAB7nPqRnOw30gOXe2_SPLjh37bgm4V+txbYAPwoXb97nGQ297w@mail.gmail.com
Backpatch: 9.5, where pg_rewind was introduced
2016-03-27 22:48:31 +02:00
Andres Freund 1a7a43672b Don't use !! but != 0/NULL to force boolean evaluation.
I introduced several uses of !! to force bit arithmetic to be boolean,
but per discussion the project prefers != 0/NULL.

Discussion: CA+TgmoZP5KakLGP6B4vUjgMBUW0woq_dJYi0paOz-My0Hwt_vQ@mail.gmail.com
2016-03-27 18:10:19 +02:00
Andres Freund af4472bcb8 Change various Gin*Is* macros to return 0/1.
Returning the direct result of bit arithmetic, in a macro intended to be
used in a boolean manner, can be problematic if the return value is
stored in a variable of type 'bool'. If bool is implemented using C99's
_Bool, that can lead to comparison failures if the variable is then
compared again with the expression (see ginStepRight() for an example
that fails), as _Bool forces the result to be 0/1. That happens in some
configurations of newer MSVC compilers.  It's also problematic when
storing the result of such an expression in a narrower type.

Several gin macros have been declared in that style since gin's initial
commit in 8a3631f8d8.

There are a lot more macros like this, but this is the only one causing
regression test failures; and I don't want to commit and backpatch a
larger patch with lots of conflicts just before the next set of minor
releases.

Discussion: 20150811154237.GD17575@awork2.anarazel.de
Backpatch: All supported branches
2016-03-27 17:46:48 +02:00
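The comparison failure described above is easy to reproduce with C99's _Bool. The following is a minimal sketch under assumed names: the flag value and both macros are hypothetical and merely mimic the style of the Gin*Is* macros.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_FLAG_RIGHTMOST 0x0004      /* hypothetical flag bit */

/* Old style: yields the raw masked bits (0 or 4 here). */
#define PAGE_IS_RIGHTMOST_RAW(flags) ((flags) & PAGE_FLAG_RIGHTMOST)

/* New style: always yields 0 or 1. */
#define PAGE_IS_RIGHTMOST(flags) (((flags) & PAGE_FLAG_RIGHTMOST) != 0)

/* Does a saved result still compare equal to a fresh raw evaluation? */
bool
saved_matches_raw(uint16_t flags)
{
    bool        saved = PAGE_IS_RIGHTMOST_RAW(flags);   /* _Bool turns 4 into 1 */

    return saved == PAGE_IS_RIGHTMOST_RAW(flags);       /* compares 1 with 4 */
}

/* Same check using the normalized macro. */
bool
saved_matches_normalized(uint16_t flags)
{
    bool        saved = PAGE_IS_RIGHTMOST(flags);

    return saved == PAGE_IS_RIGHTMOST(flags);           /* compares 1 with 1 */
}
```

When the flag is set, saved_matches_raw() reports a mismatch even though nothing changed, while the normalized macro compares equal.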
Tom Lane 221619ad69 Modernize zic's test for valid timezone abbreviations.
We really need to sync all of our IANA-derived timezone code with upstream,
but that's going to be a large patch and I certainly don't care to shove
such a thing into stable branches immediately before a release.  As a
stopgap, copy just the tzcode2016c logic that checks validity of timezone
abbreviations.  This prevents getting multiple "time zone abbreviation
differs from POSIX standard" bleats with tzdata 2014b and later.
2016-03-26 15:58:44 -04:00
Tom Lane 76281aa964 Avoid a couple of zero-divide scenarios in the planner.
cost_subplan() supposed that the given subplan must have plan_rows > 0,
which as far as I can tell was true until recent refactoring of the
code in createplan.c; but now that code allows the Result for a provably
empty subquery to have plan_rows = 0.  Rather than undo that change,
put in a clamp to prevent zero divide.

get_cheapest_fractional_path() likewise supposed that best_path->rows > 0.
This assumption has been wrong for longer.  It's actually harmless given
IEEE float math, because a positive value divided by zero gives +Infinity
and compare_fractional_path_costs() will do the right thing with that.
Still, best not to assume that.

final_cost_nestloop() also seems to have some risks in this area, so
borrow the clamping logic already present in the mergejoin cost functions.

Lastly, remove unnecessary clamp_row_est() in planner.c's calls to
get_number_of_groups().  The only thing that function does with path_rows
is pass it to estimate_num_groups() which already has an internal clamp,
so we don't need the extra call; and if we did, the callers are arguably
the wrong place for it anyway.

First two items reported by Piotr Stefaniak, the others are products
of my nosing around for similar problems.  No back-patch since there's
no evidence that problems arise in the back branches.
2016-03-26 12:03:12 -04:00
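The difference between the "harmless" and the "more serious" zero-divide cases can be shown with plain doubles. This is a sketch assuming IEEE 754 arithmetic; clamp_rows() is a simplified, hypothetical stand-in and not the planner's own clamp function.

```c
/* Hypothetical clamp: treat any estimate below one row as one row. */
double
clamp_rows(double rows)
{
    return (rows < 1.0) ? 1.0 : rows;
}

/*
 * Unclamped per-row cost.  Under IEEE math, a positive cost divided by
 * zero rows gives +Infinity, and 0/0 gives NaN.
 */
double
cost_per_row_unclamped(double total_cost, double rows)
{
    return total_cost / rows;
}

/* Clamped per-row cost: the divisor is never below 1.0. */
double
cost_per_row_clamped(double total_cost, double rows)
{
    return total_cost / clamp_rows(rows);
}
```

+Infinity still orders sensibly against finite costs, but NaN compares false against everything, which is why the 0/0 case is the more troublesome one.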
Tom Lane 676265eb7b Update time zone data files to tzdata release 2016c.
DST law changes in Azerbaijan, Chile, Haiti, Palestine, and Russia (Altai,
Astrakhan, Kirov, Sakhalin, Ulyanovsk regions).  Historical corrections
for Lithuania, Moldova, Russia (Kaliningrad, Samara, Volgograd).

As of 2015b, the keepers of the IANA timezone database started to use
numeric time zone abbreviations (e.g., "+04") instead of inventing
abbreviations not found in the wild like "ASTT".  This causes our rather
old copy of zic to whine "warning: time zone abbreviation differs from
POSIX standard" several times during "make install".  This warning is
harmless according to the IANA folk, and I don't see any problems with
these abbreviations in some simple tests; but it seems like now would be
a good time to update our copy of the tzcode stuff.  I'll look into that
soon.
2016-03-25 19:03:08 -04:00
Tom Lane 9f73a2f6d1 Fix PL/Tcl for vpath builds.
Commit cd37bb7859 works for in-tree builds, but not so much for
VPATH.  Per buildfarm.
2016-03-25 17:13:03 -04:00
Tom Lane cd37bb7859 Improve PL/Tcl errorCode facility by providing decoded name for SQLSTATE.
We don't really want to encourage people to write numeric SQLSTATEs in
programs; that's unreadable and error-prone.  Copy plpgsql's infrastructure
for converting between SQLSTATEs and exception names shown in Appendix A,
and modify examples in tests and documentation to do it that way.
2016-03-25 16:54:52 -04:00
Tom Lane fb8d2a7f57 In PL/Tcl, make database errors return additional info in the errorCode.
Tcl has a convention for returning additional info about an error in a
global variable named errorCode.  Up to now PL/Tcl has ignored that,
but this patch causes database errors caught by PL/Tcl to fill in
errorCode with useful information from the ErrorData struct.

Jim Nasby, reviewed by Pavel Stehule and myself
2016-03-25 15:52:53 -04:00
Tom Lane c94959d411 Fix DROP OPERATOR to reset oprcom/oprnegate links to the dropped operator.
This avoids leaving dangling links in pg_operator; which while fairly
harmless are also unsightly.

While we're at it, simplify OperatorUpd, which went through
heap_modify_tuple for no very good reason considering it had already made
a tuple copy it could just scribble on.

Roma Sokolov, reviewed by Tomas Vondra, additional hacking by Robert Haas
and myself.
2016-03-25 12:33:16 -04:00
Tom Lane d543170f2f Don't split up SRFs when choosing to postpone SELECT output expressions.
In commit 9118d03a8c we taught the planner to postpone evaluation of
set-returning functions in a SELECT's targetlist until after any sort done
to satisfy ORDER BY.  However, if we postpone some SRFs this way while
others do not get postponed (because they're sort or group key columns)
we will break the traditional behavior by which all SRFs in the tlist run
in-step during ExecTargetList(), so that you get the least common multiple
of their periods not the product.  Fix make_sort_input_target() so it will
not split up SRF evaluation in such cases.

There is still a hazard of similar odd behavior if there's a SRF in a
grouping column and another one that isn't, but that was true before
and we're just trying to preserve bug-compatibility with the traditional
behavior.  This whole area is overdue to be rethought and reimplemented,
but we'll try to avoid changing behavior until then.

Per report from Regina Obe.
2016-03-25 11:19:51 -04:00
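The "least common multiple of their periods not the product" rule above comes down to simple arithmetic. The sketch below only computes row counts for two set-returning functions with assumed periods p and q (both must be greater than zero); the function names are hypothetical and no executor code is modeled.

```c
/* Greatest common divisor by Euclid's algorithm. */
unsigned long
gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0)
    {
        unsigned long t = a % b;

        a = b;
        b = t;
    }
    return a;
}

/* Rows produced when both SRFs advance in step: lcm(p, q). */
unsigned long
rows_in_step(unsigned long p, unsigned long q)
{
    return p / gcd_ul(p, q) * q;
}

/* Rows produced if one SRF is run to completion per row of the other. */
unsigned long
rows_split(unsigned long p, unsigned long q)
{
    return p * q;
}
```

For periods 2 and 4 the in-step behavior yields 4 rows, whereas splitting the evaluation would yield 8, so postponing only one of the two functions would visibly change query results.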
Tom Lane 7caaeaf360 Link libpq after libpgfeutils to satisfy Windows linker.
Some of the non-MSVC Windows buildfarm members seem to need this to avoid
getting "undefined symbol" errors on libpgfeutils' references to libpq.
I could understand that if libpq were a static library, but surely it is
not?  Oh well, at least the extra reference is no more harmful than it is
for libpgcommon or libpgport.
2016-03-24 20:45:31 -04:00
Tom Lane c1156411ad Move psql's psqlscan.l into src/fe_utils.
This completes (at least for now) the project of getting rid of ad-hoc
linkages among the src/bin/ subdirectories.  Everything they share is now
in src/fe_utils/ and is included from a static library at link time.

A side benefit is that we can restore the FLEX_NO_BACKUP check for
psqlscanslash.l.  We might need to think of another way to do that check
if we ever need to build two lexers with that property in the same source
directory, but there's no foreseeable reason to need that.
2016-03-24 20:28:47 -04:00
Tom Lane d65bea26a8 Move psql's print.c and mbprint.c into src/fe_utils.
Just turning the crank ...
2016-03-24 18:27:28 -04:00
Tom Lane a376960c8f Suppress compiler warning for get_am_type_string().
Compilers that don't know that elog(ERROR) doesn't return complained
that this function might fail to return a value.  Per buildfarm.

While at it, const-ify the function's declaration, since the intent
is evidently to always return a constant string.
2016-03-24 17:22:24 -04:00
Tom Lane 0ecd3fedfc Add missed inclusion requirement in Mkvcbuild.pm.
Per buildfarm.
2016-03-24 17:12:40 -04:00
Tom Lane 588d963b00 Create src/fe_utils/, and move stuff into there from pg_dump's dumputils.
Per discussion, we want to create a static library and put the stuff into
it that until now has been shared across src/bin/ directories by ad-hoc
methods like symlinking a source file.  This commit creates the library and
populates it with a couple of files that contain the widely-useful portions
of pg_dump's dumputils.c file.  dumputils.c survives, because it has some
stuff that didn't seem appropriate for fe_utils, but it's significantly
smaller and is no longer referenced from any other directory.

Follow-on patches will move more stuff into fe_utils.

The Mkvcbuild.pm hacking here is just a best guess; we'll see how the
buildfarm likes it.
2016-03-24 15:55:57 -04:00
Robert Haas 59a02815e2 Use correct GetDatum function.
Oops.
2016-03-24 08:57:48 -04:00
Tom Lane c2d1eea9e7 Avoid PGDLLIMPORT for simple local references in frontend programs.
I was wondering if this would be an issue, and buildfarm member frogmouth
says it is.
2016-03-23 23:26:44 -04:00
Alvaro Herrera 473b932870 Support CREATE ACCESS METHOD
This enables external code to create access methods.  This is useful so
that extensions can add their own access methods which can be formally
tracked for dependencies, so that DROP operates correctly.  Also, having
explicit support makes pg_dump work correctly.

Currently only index AMs are supported, but we expect different types to
be added in the future.

Authors: Alexander Korotkov, Petr Jelínek
Reviewed-By: Teodor Sigaev, Petr Jelínek, Jim Nasby
Commitfest-URL: https://commitfest.postgresql.org/9/353/
Discussion: https://www.postgresql.org/message-id/CAPpHfdsXwZmojm6Dx+TJnpYk27kT4o7Ri6X_4OSWcByu1Rm+VA@mail.gmail.com
2016-03-23 23:01:35 -03:00
Tom Lane 2c6af4f442 Move keywords.c/kwlookup.c into src/common/.
Now that we have src/common/ for code shared between frontend and backend,
we can get rid of (most of) the klugy ways that the keyword table and
keyword lookup code were formerly shared between different uses.
This is a first step towards a more general plan of getting rid of
special-purpose kluges for sharing code in src/bin/.

I chose to merge kwlookup.c back into keywords.c, as it once was, and
always has been so far as keywords.h is concerned.  We could have
kept them separate, but there is no place that uses ScanKeywordLookup
without also wanting access to the backend's keyword list, so there
seems little point.

ecpg is still a bit weird, but at least now the trickiness is documented.

I think that the MSVC build script should require no adjustments beyond
what's done here ... but we'll soon find out.
2016-03-23 20:22:08 -04:00
Robert Haas 3df9c374e2 Disable abbreviated keys for string-sorting in non-C locales.
Unfortunately, every version of glibc thus far tested has bugs whereby
strcoll() ordering does not match strxfrm() ordering as required by
the standard.  This can result in, for example, corrupted indexes.
Disabling abbreviated keys in these cases slows down non-C-collation
string sorting considerably, but there seems to be no practical
alternative.  Users who are confident that their libc implementations
are solid in this regard can re-enable the optimization by compiling
with TRUST_STRXFRM.

Users who have built indexes using PostgreSQL 9.5 or PostgreSQL 9.5.1
should REINDEX if there is a possibility that they may have been
affected by this problem.

Report by Marc-Olaf Jaschke.  Investigation mostly by Tom Lane, with
help from Peter Geoghegan, Noah Misch, Stephen Frost, and me.  Patch
by me, reviewed by Peter Geoghegan and Tom Lane.
2016-03-23 16:03:13 -04:00
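The property that the commit above says glibc violates can be stated as a small check: for any two strings, strcoll() on the originals and strcmp() on their strxfrm() transformations must agree in sign. The helper below is a hypothetical sketch for experimenting with that property; it is not the test used during the investigation.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0, or +1. */
int
sign_of(int x)
{
    return (x > 0) - (x < 0);
}

/*
 * Returns true if strcoll(a, b) and strcmp() on the strxfrm() blobs agree
 * in sign under the current locale.  Returns false on disagreement or if
 * memory allocation fails.
 */
bool
strxfrm_agrees_with_strcoll(const char *a, const char *b)
{
    /* strxfrm() with a zero-length buffer reports the space required. */
    size_t      alen = strxfrm(NULL, a, 0) + 1;
    size_t      blen = strxfrm(NULL, b, 0) + 1;
    char       *xa = malloc(alen);
    char       *xb = malloc(blen);
    bool        agree = false;

    if (xa != NULL && xb != NULL)
    {
        strxfrm(xa, a, alen);
        strxfrm(xb, b, blen);
        agree = sign_of(strcoll(a, b)) == sign_of(strcmp(xa, xb));
    }

    free(xa);
    free(xb);
    return agree;
}
```

In the default "C" locale the two orderings agree trivially; a program would need to call setlocale() with a non-C collation before this check says anything about the libc in question.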
Robert Haas 44ca4022f3 Partition the freelist for shared dynahash tables.
Without this, contention on the freelist can become a pretty serious
problem on large servers.

Aleksander Alekseev, reviewed by Anastasia Lubennikova, Dilip Kumar,
and me.
2016-03-23 11:00:54 -04:00
Tom Lane ea4b8bd618 Code review for error reports in jsonb_set().
User-facing (even tested by regression tests) error conditions were thrown
with elog(), hence had wrong SQLSTATE and were untranslatable.  And the
error message texts weren't up to project style, either.
2016-03-23 11:00:39 -04:00
Tom Lane 384dfbde19 Fix unsafe use of strtol() on a non-null-terminated Text datum.
jsonb_set() could produce wrong answers or incorrect error reports, or in
the worst case even crash, when trying to convert a path-array element into
an integer for use as an array subscript.  Per report from Vitaly Burovoy.
Back-patch to 9.5 where the faulty code was introduced (in commit
c6947010ce).

Michael Paquier
2016-03-23 10:43:13 -04:00
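The hazard fixed above is general: strtol() keeps reading until it sees a byte it cannot parse, so it must only be given NUL-terminated input. A common remedy is to copy the counted bytes into a terminated buffer first. The helper below is a hypothetical sketch of that pattern, not the code that was committed.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a decimal integer from a buffer that carries a length but no
 * terminating NUL.  Returns true and stores the value on success.
 */
bool
parse_int_counted(const char *data, size_t len, int *result)
{
    char        buf[32];
    char       *end;
    long        val;

    /* Reject empty input and anything too long to be a valid int. */
    if (len == 0 || len >= sizeof(buf))
        return false;

    /* Make a private NUL-terminated copy of exactly len bytes. */
    memcpy(buf, data, len);
    buf[len] = '\0';

    errno = 0;
    val = strtol(buf, &end, 10);

    /* Require that something was parsed and nothing was left over. */
    if (errno != 0 || end == buf || *end != '\0')
        return false;
    if (val < INT_MIN || val > INT_MAX)
        return false;

    *result = (int) val;
    return true;
}
```

Called with the bytes "1234" and a length of 2, this returns 12, whereas handing the same pointer straight to strtol() would read past the intended end and return 1234.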
Simon Riggs 8320c625d9 Change comment to describe correct lock level used 2016-03-23 11:32:34 +00:00
Tom Lane 71404af2a2 Fix EvalPlanQual bug when query contains both locked and not-locked rels.
In commit afb9249d06, we (probably I) made ExecLockRows assign
null test tuples to all relations of the query while setting up to do an
EvalPlanQual recheck for a newly-updated locked row.  This was sheerest
brain fade: we should only set test tuples for relations that are lockable
by the LockRows node, and in particular empty test tuples are only sensible
for inheritance child relations that weren't the source of the current
tuple from their inheritance tree.  Setting a null test tuple for an
unrelated table causes it to return NULLs when it should not, as exhibited
in bug #14034 from Bronislav Houdek.  To add insult to injury, doing it the
wrong way required two loops where one would suffice; so the corrected code
is even a bit shorter and faster.

Add a regression test case based on his example, and back-patch to 9.5
where the bug was introduced.
2016-03-22 17:56:20 -04:00
Tom Lane b283096534 Allow the delay in psql's \watch command to be a fractional second.
Instead of just "2" seconds, allow e.g. "2.5" seconds.  Per request
from Alvaro Herrera.  No docs change since the docs didn't say you
couldn't do this already.
2016-03-21 18:34:18 -04:00
Tom Lane dea2b5960a Improve header output from psql's \watch command.
Include the \pset title string if there is one, and shorten the prefab
part of the header to be "timestamp (every Ns)".  Per suggestion by
David Johnston.

Michael Paquier and Tom Lane
2016-03-21 18:18:13 -04:00
Robert Haas ae507d9222 Make max_parallel_degree PGC_USERSET.
It was intended to be this way all along, just like other planner
GUCs such as work_mem.  But I goofed.
2016-03-21 10:54:36 -04:00
Robert Haas e06a38965b Support parallel aggregation.
Parallel workers can now partially aggregate the data and pass the
transition values back to the leader, which can combine the partial
results to produce the final answer.

David Rowley, based on earlier work by Haribabu Kommi.  Reviewed by
Álvaro Herrera, Tomas Vondra, Amit Kapila, James Sewell, and me.
2016-03-21 09:30:18 -04:00
Andres Freund 7fa0064092 Properly declare FeBeWaitSet.
Surprising that this worked on a number of systems. Reported by
buildfarm member longfin.
2016-03-21 12:58:18 +01:00
Andres Freund 98a64d0bd7 Introduce WaitEventSet API.
Commit ac1d794 ("Make idle backends exit if the postmaster dies.")
introduced a regression on, at least, large Linux systems. Constantly
adding the same postmaster_alive_fds to the OS's internal data structures
for implementing poll/select can cause significant contention, leading
to a performance regression of nearly 3x in one example.

This can be avoided by using e.g. Linux's epoll, which avoids having to
add/remove file descriptors to the wait data structures at a high rate.
Unfortunately the current latch interface makes it hard to allocate any
persistent per-backend resources.

Replace, with a backward compatibility layer, WaitLatchOrSocket with a
new WaitEventSet API. Users can allocate such a Set across multiple
calls, and add more than one file descriptor to wait on. The latter has
been added because there are upcoming Postgres features where that will be
helpful.

In addition to the previously existing poll(2), select(2), and
WaitForMultipleObjects() implementations, also provide an epoll_wait(2)
based implementation to address the aforementioned performance
problem. Epoll is only available on Linux, but that is the most likely
OS for machines large enough (four sockets) to reproduce the problem.

To actually address the aforementioned regression, create and use a
long-lived WaitEventSet for FE/BE communication.  There are additional
places that would benefit from a long-lived set, but that's a task for
another day.

Thanks to Amit Kapila, who helped make the windows code I blindly wrote
actually work.

Reported-By: Dmitry Vasilyev
Discussion: CAB-SwXZh44_2ybvS5Z67p_CDz=XFn4hNAD=CnMEF+QqkXwFrGg@mail.gmail.com
    20160114143931.GG10941@awork2.anarazel.de
2016-03-21 12:22:54 +01:00
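The advantage of a long-lived wait set that the WaitEventSet commit above describes can be illustrated with raw epoll calls. This is a Linux-only sketch under stated assumptions: it uses a pipe in place of a real socket, count_wakeups() is a hypothetical name, and nothing here reflects the actual WaitEventSet implementation.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Register the read end of a pipe with epoll once, then wait on it
 * repeatedly.  Returns the number of rounds in which the descriptor was
 * reported readable, or -1 if setup fails.
 */
int
count_wakeups(int rounds)
{
    int         pipefd[2];
    int         epfd;
    int         woken = 0;
    struct epoll_event ev = {0};
    struct epoll_event out;

    if (pipe(pipefd) != 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0)
    {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    /* One-time registration; the kernel keeps the interest list. */
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
    {
        close(epfd);
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    for (int i = 0; i < rounds; i++)
    {
        char        c = 'x';

        if (write(pipefd[1], &c, 1) != 1)
            break;

        /* No descriptor set is rebuilt here, unlike with poll()/select(). */
        if (epoll_wait(epfd, &out, 1, 1000) == 1 && out.data.fd == pipefd[0])
            woken++;

        /* Drain the byte so the next round starts with an empty pipe. */
        if (read(pipefd[0], &c, 1) != 1)
            break;
    }

    close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return woken;
}
```

The registration cost is paid once outside the loop; each wait then only asks the kernel which of the already-registered descriptors are ready.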
Andres Freund 72e2d21c12 Combine win32 and unix latch implementations.
Previously latches for Windows and Unix had been implemented in
different files. A later patch introduces an expanded wait
infrastructure; keeping the implementations separate would introduce too
much duplication.

This basically just moves the functions, without too much change. The
reason to keep this separate is that it allows blame to continue working
a little less badly; and to make review a tiny bit easier.

Discussion: 20160114143931.GG10941@awork2.anarazel.de
2016-03-21 11:03:26 +01:00
Andres Freund 326d73c86f Second attempt at fixing MSVC build for 68ab8e8ba4.
After the previous fix in 6f1f34c9, MSVC ended up looking for psqlscan.c
in the wrong directory.

David's fix just forces the path to be adjusted. That's not a
particularly pretty fix, but it hopefully will make the buildfarm green
again.

Author: David Rowley
Discussion: CAKJS1f_9CCi_t+LEgV5GWoCj3wjavcMoDc5qfcf_A0UwpQoPoA@mail.gmail.com
2016-03-21 10:49:45 +01:00
Tom Lane b6afae71aa Use %option bison-bridge in psql/pgbench lexers.
The point of this change is to use %pure-parser in pgbench's exprparse.y.
The immediate reason is that it turns out very ancient versions of bison
have a bug with the combination of a reentrant lexer and non-reentrant
parser.  We could consider dropping support for such ancient bisons; but
considering that we might well need exprparse.y to be reentrant some day,
it seems better to make it so right now than to move the portability
goalposts.  (AFAICT there's no particular performance consequence to this
change, either, so there's no good reason not to do it.)

Now, %pure-parser assumes that the called lexer is built with %option
bison-bridge.  Because we're assuming bitwise compatibility of yyscan_t
(yyguts_t) data structures among all the psql/pgbench lexers, that
requirement propagates back to psql's lexers as well.  But it's just a
few lines of change on that side too; and if psqlscan.l is to set the
baseline for a possibly-large family of lexers, it should err on the
side of including not omitting useful features.
2016-03-20 21:59:03 -04:00
Tom Lane 6f1f34c92b Best-guess attempt at fixing MSVC build for 68ab8e8ba4.
pgbench now needs to use src/bin/psql/psqlscan.l, but it's not very clear
how to fit that into the MSVC build system.  If this doesn't work I'm going
to need some help from somebody who actually understands those scripts ...
2016-03-20 17:51:54 -04:00
Tom Lane 68ab8e8ba4 SQL commands in pgbench scripts are now ended by semicolons, not newlines.
To allow multiline SQL commands in scripts, adopt the same rules psql uses
to decide what is the end of a SQL command, to wit, an unquoted semicolon
not encased in parentheses.  Do this by importing the same flex lexer that
psql uses, since coping with stuff like dollar-quoted literals is hard to
get right without going the full nine yards.

This makes use of the infrastructure added in commit 0ea9efbe9e to
support independently-written flex lexers scanning the same PsqlScanState
input-buffer data structure.  Since that infrastructure isn't very
friendly to ad-hoc parsing code such as strtok(), improve exprscan.l
so that it can parse either whitespace-separated words or expression
tokens, on demand, and rewrite pgbench.c's backslash-command parsing
code to always use the lexer to fetch tokens.

It's still the case that pgbench backslash commands extend to the end
of the line, no more and no less.  That could be changed in a fairly
localized way now, and there was some interest in doing so, but it
seems like material for a separate patch.

In passing, make some marginal cleanups in syntax error reporting,
const-ify a few data structures that could use it, and run some of
this code through pgindent.

I can't tell whether the MSVC build scripts need to be taught explicitly
about the changes here or not, but the buildfarm will soon tell us.

Kyotaro Horiguchi and Tom Lane
2016-03-20 12:58:51 -04:00
Andrew Dunstan 5d03201056 Remove dependency on psed for MSVC builds.
Modern Perl has removed psed from its core distribution, so it might not
be readily available on some build platforms. We therefore replace its
use with a Perl script generated by s2p, which is equivalent to the sed
script. The latter is retained for non-MSVC builds to avoid creating a
new hard dependency on Perl for non-Windows tarball builds.

Backpatch to all live branches.

Michael Paquier and me.
2016-03-19 18:36:35 -04:00
Tom Lane d5351fcb03 Fix phony .PHONY.
A couple makefiles had misspelled the magic .PHONY target as PHONY.
2016-03-19 17:19:37 -04:00
Tom Lane 429ee5a822 Make pgbench's expression lexer reentrant.
This is a necessary preliminary step for making it play with psqlscan.l
given the way I set up the lexer input-buffer sharing mechanism in commit
0ea9efbe9e.

I've not tried to make it *actually* reentrant; there are still some static
variables lying about.  But flex thinks it's reentrant, and that's what
counts.

In support of that, fix exprparse.y to pass through the yyscan_t from the
caller.  Also do some minor code beautification, like not casting away
const.
2016-03-19 16:35:41 -04:00
Alvaro Herrera 1038bc91ca pgbench: Silence new compiler warnings
The original coding in 7bafffea64 and previous wasn't all that great
anyway.

Reported by Jeff Janes and Tom Lane
2016-03-19 16:16:39 -03:00
Tom Lane 78e7c44399 Typo fix. 2016-03-19 14:36:52 -04:00
Tom Lane 21c8ee7946 Sync backend/parser/scan.l with bin/psql/psqlscan.l.
Make some minor formatting adjustments to make it easier to diff these
files and see that they indeed implement the same flex rules (at least
to the extent that we want them to be the same).

(Someday it'd be nice to make ecpg's pgc.l more easily diff'able too,
but today is not that day.)

Also run relevant parts of these files and psqlscanslash.l through
pgindent.

No actual behavioral changes here, just obsessive neatnik-ism.
2016-03-19 14:36:22 -04:00
Tom Lane 72b1e3a21f Build backend/parser/scan.l and interfaces/ecpg/preproc/pgc.l standalone.
Now that we know about the %top{} trick, we can revert to building flex
lexers as separate .o files.  This is worth doing for a couple of reasons
besides sheer cleanliness.  We can narrow the scope of the -Wno-error flag
that's forced on scan.c.  Also, since these grammar and lexer files are
so large, splitting them into separate build targets should have some
advantages in build speed, particularly in parallel or ccache'd builds.

We have quite a few other .l files that could be changed likewise, but the
above arguments don't apply to them, so the benefit of fixing them seems
pretty minimal.  Leave the rest for some other day.
2016-03-19 12:07:24 -04:00
Alvaro Herrera 7bafffea64 pgbench: Allow changing weights for scripts
Previously, all scripts had the same probability of being chosen when
multiple of them were specified via -b, -f, -N, -S.  With this commit,
-b and -f now search for an "@" in the script name and use the integer
found after it as the drawing probability for that script.

(One disadvantage is that if you have scripts whose names contain @, you
are now forced to specify "@1" at the end; otherwise the name's @ is
confused with a weight separator.  We don't expect many pgbench scripts
with @ in their names in the wild, so this shouldn't be too serious a
problem.)

While at it, rework the interface between addScript, process_file,
process_builtin, and findBuiltin.  It had gotten a bit out of hand with
recent commits.

Author: Fabien Coelho
Reviewed-By: Andres Freund, Robert Haas, Álvaro Herrera, Michaël Paquier
Discussion: http://www.postgresql.org/message-id/alpine.DEB.2.10.1603160721240.1666@sto
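
The name@weight convention described above could be parsed roughly as follows. This is a standalone sketch, not pgbench's code: the function name parse_script_weight is hypothetical, and splitting at the last "@", defaulting the weight to 1, and rejecting negative weights are assumptions made for the example.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split "name@weight" at the last '@' (assumed rule).  Copies the name part
 * into name_out and stores the weight in *weight; without an '@' the weight
 * defaults to 1.  Returns 0 on success, -1 if the name does not fit or the
 * text after '@' is not a non-negative integer.
 */
int
parse_script_weight(const char *option, char *name_out, size_t name_len,
                    long *weight)
{
    const char *sep = strrchr(option, '@');
    size_t      len = sep ? (size_t) (sep - option) : strlen(option);
    char       *end;

    if (len >= name_len)
        return -1;
    memcpy(name_out, option, len);
    name_out[len] = '\0';

    if (sep == NULL)
    {
        *weight = 1;
        return 0;
    }

    errno = 0;
    *weight = strtol(sep + 1, &end, 10);
    if (errno != 0 || end == sep + 1 || *end != '\0' || *weight < 0)
        return -1;
    return 0;
}
```

Under these assumptions "a@b.sql" is rejected because "b.sql" is not an integer, while "a@b.sql@1" yields the name "a@b.sql" with weight 1, which is the workaround the parenthetical describes.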
2016-03-19 12:32:42 -03:00
Tom Lane b46d9beb65 With ancient gcc, skip pg_attribute_printf() on function pointer.
Buildfarm results show that the ability to attach pg_attribute_printf
decoration to a function pointer appeared somewhere between gcc 2.95.3
and gcc 4.0.1.  Guess that it was there in 4.0.
2016-03-19 10:59:20 -04:00
Peter Eisentraut 9a83564c58 Allow SSL server key file to have group read access if owned by root
We used to require the server key file to have permissions 0600 or less
for best security.  But some systems (such as Debian) have certificate
and key files managed by the operating system that can be shared with
other services.  In those cases, the "postgres" user is made a member of
a special group that has access to those files, and the server key file
has permissions 0640.  To accommodate that kind of setup, also allow the
key file to have permissions 0640 but only if owned by root.

From: Christoph Berg <myon@debian.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
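
The permission rule can be written down as a small POSIX C predicate. This is only a sketch of the rule as stated in this message: the function name key_file_mode_ok and its parameters are hypothetical, and rejecting files owned by anyone other than the server user or root is an assumption of the example.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Returns true if a key file with the given mode and owner is acceptable.
 * server_uid is the user the server runs as.
 */
bool
key_file_mode_ok(mode_t mode, uid_t owner, uid_t server_uid)
{
    /* Owned by the server user: no group or world access (0600 or less). */
    if (owner == server_uid)
        return (mode & (S_IRWXG | S_IRWXO)) == 0;

    /* Owned by root: group may read (0640 or less), nothing for others. */
    if (owner == 0)
        return (mode & (S_IWGRP | S_IXGRP | S_IRWXO)) == 0;

    /* Assumption of this sketch: any other owner is rejected. */
    return false;
}
```

Checking forbidden bits with a mask, rather than comparing against 0600 or 0640 exactly, is what makes stricter modes such as 0400 pass as well.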
2016-03-19 11:03:22 +01:00
Andres Freund 6eb2be15b5 Fix stupid omission in c4901a1e.
Reported-By: Jeff Janes
Discussion: CAMkU=1zGxREwoyaCrp_CHadEB+dPgpVyKBysCJ+6xP9gCOvAuw@mail.gmail.com
2016-03-18 22:37:59 -07:00
Tom Lane 07aed46a6b Fix missed update in _readForeignScan().
Blatant fail in 0bf3ae88af.
Caught by buildfarm member mandrill.
2016-03-19 01:20:34 -04:00
Tom Lane ff0a7e6167 Use yylex_init not yylex_init_extra().
Older versions of flex don't have the latter.  Per buildfarm.
2016-03-19 01:02:18 -04:00
Tom Lane a3e39f8363 Suppress FLEX_NO_BACKUP check for psqlscanslash.l.
The existing infrastructure for FLEX_NO_BACKUP doesn't work reliably
when two lexers are built in parallel in the same directory.  We can
probably fix that, but as a short-term workaround, just don't make
the check for psqlscanslash.l.

Per buildfarm.
2016-03-19 00:43:46 -04:00
Tom Lane 0ea9efbe9e Split psql's lexer into two separate .l files for SQL and backslash cases.
This gets us to a point where psqlscan.l can be used by other frontend
programs for the same purpose psql uses it for, ie to detect when it's
collected a complete SQL command from input that is divided across
line boundaries.  Moreover, other programs can supply their own lexers
for backslash commands of their own choosing.  A follow-on patch will
use this in pgbench.

The end result here is roughly the same as in Kyotaro Horiguchi's
0001-Make-SQL-parser-part-of-psqlscan-independent-from-ps.patch, although
the details of the method for switching between lexers are quite different.
Basically, in this patch we share the entire PsqlScanState, YY_BUFFER_STATE
stack, *and* yyscan_t between different lexers.  The only thing we need
to do to switch to a different lexer is to make sure the start_state is
valid for the new lexer.  This works because flex doesn't keep any other
persistent state that depends on the specific lexing tables generated for
a particular .l file.  (We are assuming that both lexers are built with
the same flex version, or at least versions that are compatible with
respect to the contents of yyscan_t; but that doesn't seem likely to
be a big problem in practice, considering how slowly flex changes.)

Aside from being more efficient than Horiguchi-san's original solution,
this avoids possible corner-case changes in semantics: the original code
was capable of popping the input buffer stack while still staying in
backslash-related parsing states.  I'm not sure that that equates to any
useful user-visible behaviors, but I'm not sure it doesn't either, so
I'm loath to assume that we only need to consider the topmost buffer when
parsing a backslash command.

I've attempted to update the MSVC build scripts for the added .l file,
but will rely on the buildfarm to see if I missed anything.

Kyotaro Horiguchi and Tom Lane
2016-03-19 00:24:55 -04:00
Tom Lane 27199058d9 Convert psql's flex lexer to be re-entrant, and make it compile standalone.
Change psqlscan.l to specify '%option reentrant', adjust internal APIs
to match, and get rid of its internal static variables.  While this is
good cleanup in an abstract sense, the reason to do it right now is that
it seems the only practical way to support use of separate flex lexers
with common PsqlScanState infrastructure.  If we build two non-reentrant
lexers then we are going to have problems with dangling buffer pointers
in whichever lexer isn't active when we transition from one buffer to
another, as well as curious side-effects if we try to share any code
between the files.  (Horiguchi-san had a different solution to that in his
pending patch, but I find it ugly and probably broken for corner cases.)

Depending on which version of flex you're using, this may result in getting
a "warning: unused variable 'yyg'" warning from psqlscan, similar to the
one you'd have seen for a long time in backend/parser/scan.l.  I put a
local -Wno-error into CFLAGS for the file, for the convenience of those
who compile with -Werror.

Also, stop compiling psqlscan as part of mainloop.c, and make it a
standalone build target instead.  This is a lot cleaner than before, though
it doesn't really change much in practice as of this commit.  (I'm not sure
whether the MSVC build scripts will need some help with this part, but the
buildfarm will soon tell us.)
2016-03-18 21:22:02 -04:00
Peter Eisentraut b555ed8102 Merge wal_level "archive" and "hot_standby" into new name "replica"
The distinction between "archive" and "hot_standby" existed only because
at the time "hot_standby" was added, there was some uncertainty about
stability.  This is now a long time ago.  We would like to move forward
with simplifying the replication configuration, but this distinction is
in the way, because a primary server cannot tell (without asking a
standby or predicting the future) which one of these would be the
appropriate level.

Pick a new name for the combined setting to make it clearer that it
covers all (non-logical) backup and replication uses.  The old values
are still accepted but are converted internally.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: David Steele <david@pgmasters.net>
2016-03-18 23:56:03 +01:00
Tom Lane 4e1d2a1708 Decouple psqlscan.l from surrounding program.
Remove assorted external references from psqlscan.l in preparation for
making it usable by other frontend programs.  This mostly involves
getting rid of direct calls to psql_error() and GetVariable() in favor
of introducing a callback-functions struct to encapsulate variable
fetching and error printing.  In addition, pass the current encoding
and standard-strings status as additional parameters to psql_scan_setup
instead of looking directly at "pset" or calling additional functions.

I did not bother to change some references to psql_error that are in
functions that will soon migrate to a psql-specific backslash-command
lexer.  Other than that, this version of psqlscan.l is capable of
compiling standalone.  It still depends on assorted src/common functions
as well as some encoding-related libpq functions, but we expect that
all programs using it will be happy with those dependencies.

Kyotaro Horiguchi, somewhat editorialized on by me
2016-03-18 15:05:59 -04:00
Robert Haas 08a6d36dcb Use INT64_FORMAT instead of %ld for int64.
Commit 0011c0091e introduced this mistake.

Patch by me.  Reported by Andres Freund, who also reviewed the
patch.
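
The portability problem can be shown without PostgreSQL headers. This sketch uses the standard C99 PRId64 macro from <inttypes.h> as a stand-in for INT64_FORMAT; that substitution and the helper name format_int64 are assumptions for illustration. "%ld" is only correct where long is 64 bits wide, which does not hold on, for example, 64-bit Windows.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Format a 64-bit value into buf.  Returns what snprintf returns: the
 * number of characters the full output needs, excluding the trailing NUL.
 */
int
format_int64(char *buf, size_t buflen, int64_t value)
{
    /* PRId64 expands to the correct conversion for int64_t on this platform. */
    return snprintf(buf, buflen, "%" PRId64, value);
}
```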
2016-03-18 14:54:09 -04:00
Andres Freund c4901a1e03 Only clear latch self-pipe/event if there is a pending notification.
This avoids a good number of, individually quite fast, system calls in
scenarios with many quick queries. Besides the aesthetic benefit of
seeing fewer superfluous system calls with strace, it also improves
performance by ~2% measured by pgbench -M prepared -c 96 -j 8 -S (scale
100).

Without having benchmarked it, this patch also adjusts the Windows code,
as that makes it easier to unify the unix/windows codepaths in a later
patch. There's little reason to diverge in behaviour between the
platforms.

Discussion: CA+TgmoYc1Zm+Szoc_Qbzi92z2c1vRHZmjhfPn5uC=w8bXv6Avg@mail.gmail.com
Reviewed-By: Robert Haas
2016-03-18 11:47:05 -07:00
Andres Freund c17966201c Make it easier to choose the used waiting primitive in unix_latch.c.
This allows for easier testing of the different primitives; in
preparation for adding a new primitive.

Discussion: 20160114143931.GG10941@awork2.anarazel.de
Reviewed-By: Robert Haas
2016-03-18 11:46:54 -07:00
Andres Freund 6bc4d95fcc Error out if waiting on socket readiness without a specified socket.
Previously we just ignored such an attempt, but that seems to serve no
purpose but making things harder to debug.

Discussion: 20160114143931.GG10941@awork2.anarazel.de
    20151230173734.hx7jj2fnwyljfqek@alap3.anarazel.de
Reviewed-By: Robert Haas
2016-03-18 11:46:45 -07:00
Andres Freund fad0f9d8c9 Remove unused, and dangerous, TestLatch() macro.
The macro has not seen any in-tree use since latches were introduced
in 2746e5f, in 2010.
2016-03-18 11:46:42 -07:00
Robert Haas 0bf3ae88af Directly modify foreign tables.
postgres_fdw can now send an UPDATE or DELETE statement directly to
the foreign server in simple cases, rather than sending a SELECT FOR
UPDATE statement and then updating or deleting rows one-by-one.

Etsuro Fujita, reviewed by Rushabh Lathia, Shigeru Hanada, Kyotaro
Horiguchi, Albe Laurenz, Thom Brown, and me.
2016-03-18 13:55:52 -04:00
Tom Lane 3422feccca Clean up some misplaced #includes.
Random .h files have no business including postgres-fe.h (or postgres.h).
If that wasn't the first #include done by the calling .c file, it's the
.c file that's broken.  Noted while prepping Kyotaro Horiguchi's psql
lexer refactoring patch.
2016-03-18 13:43:17 -04:00
Teodor Sigaev 3187d6de0e Introduce parse_ident()
SQL-layer function to split qualified identifier into array parts.

Author: Pavel Stehule with minor editorization by me and Jim Nasby
2016-03-18 18:16:14 +03:00
Robert Haas 992b5ba30d Push scan/join target list beneath Gather when possible.
This means that, for example, "SELECT expensive_func(a) FROM bigtab
WHERE something" can compute expensive_func(a) in the workers rather
than the leader if it happens to be parallel-safe, which figures to be
a big win in some practical cases.

Currently, we can only do this if the entire target list is
parallel-safe.  If we worked harder, we might be able to evaluate
parallel-safe targets in the worker and any parallel-restricted
targets in the leader, but that would be more complicated, and there
aren't that many parallel-restricted functions that people are likely
to use in queries anyway.  I think.  So just do the simple thing for
the moment.

Robert Haas, Amit Kapila, and Tom Lane
2016-03-18 09:50:05 -04:00
Robert Haas 2d8a1e22b1 Various minor corrections of and improvements to comments.
Aleksander Alekseev
2016-03-18 09:38:59 -04:00
Tom Lane bd0ab28912 Remove useless double calls of make_parsestate().
Aleksander Alekseev
2016-03-17 16:46:35 -04:00
Robert Haas c27033ff7c Update tuplesort.c comments for memory management improvements.
I'm committing these changes separately so that it's clear what is
Peter's original work versus what I changed.  This is a followup to
commit 0011c0091e, and these changes are all by me.
2016-03-17 16:11:14 -04:00
Robert Haas 0011c0091e Improve memory management for external sorts.
Introduce a new memory context which stores tuple data, and reset it
at the end of each merge pass; this helps avoid memory fragmentation
and, consequently, overallocation.  Also, for the final merge pass,
eliminate memory context chunk header overhead entirely by allocating
all of the memory used for buffering tuples during the merge in a
single chunk.  Since this modestly increases the number of tuples we
can store, grow the memtuples array a bit so that we're less likely to
run short of slots there.

Peter Geoghegan.  Review and testing of patches in this series by
Jeff Janes, Greg Stark, Mithun Cy, and me.
2016-03-17 16:10:41 -04:00
Tom Lane 55c3a04d60 Fix assorted breakage in to_char()'s OF format option.
In HEAD, fix incorrect field width for hours part of OF when tm_gmtoff is
negative.  This was introduced by commit 2d87eedc1d as a result of
falsely applying a pattern that's correct when + signs are omitted, which
is not the case for OF.

In 9.4, fix missing abs() call that allowed a sign to be attached to the
minutes part of OF.  This was fixed in 9.5 by 9b43d73b3f, but for
inscrutable reasons not back-patched.

In all three versions, ensure that the sign of tm_gmtoff is correctly
reported even when the GMT offset is less than 1 hour.

Add regression tests, which evidently we desperately need here.

Thomas Munro and Tom Lane, per report from David Fetter
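
The sign and width issues above come down to printing the sign separately from the absolute value of the offset. The following standalone sketch shows that pattern. It is not the formatting.c code: the function name format_utc_offset and the exact "+HH" / "+HH:MM" layout are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Format a GMT offset given in seconds as sign, two-digit hours, and
 * ":MM" only when the minutes part is nonzero.  The sign is taken from
 * the whole offset, so offsets of less than one hour keep it.
 */
int
format_utc_offset(char *buf, size_t buflen, long gmtoff)
{
    long        secs = labs(gmtoff);
    long        hours = secs / 3600;
    long        minutes = (secs % 3600) / 60;
    char        sign = (gmtoff >= 0) ? '+' : '-';

    /* The field width applies to the digits only, never to the sign. */
    if (minutes != 0)
        return snprintf(buf, buflen, "%c%02ld:%02ld", sign, hours, minutes);
    return snprintf(buf, buflen, "%c%02ld", sign, hours);
}
```

Deriving the sign from the hours value instead would lose it for an offset such as -1800 seconds, because the hours part is then zero.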
2016-03-17 15:50:33 -04:00
Teodor Sigaev f4ceed6ceb Improve support of Hunspell
- allow non-ASCII characters to be used as affix flags. Non-numeric affix
  flags are now stored as strings instead of the numeric value of the character.
- allow 0 to be used as an affix flag in numeric-encoded affixes

That adds support for the Arabic, Hungarian, Turkish and
Brazilian Portuguese languages.

Author: Artur Zakirov with heavy editorization by me
2016-03-17 17:23:38 +03:00
Robert Haas 0218e8b3fa Fix typos.
Jim Nasby
2016-03-17 07:26:20 -04:00
Peter Eisentraut fc201dfd95 Add syslog_split_messages parameter
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2016-03-16 23:21:44 -04:00
Peter Eisentraut f4c454e9ba Add syslog_sequence_numbers parameter
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2016-03-16 23:21:44 -04:00
Tom Lane 47211af17a Fix "pgbench -C -M prepared".
This didn't work because when we dropped and re-established a database
connection, we did not bother to reset session-specific state such as
the statements-are-prepared flags.

The st->prepared[] array certainly needs to be flushed, and I cleared a
couple of other fields as well that couldn't possibly retain meaningful
state for a new connection.

In passing, fix some bogus comments and strange field order choices.

Per report from Robins Tharakan.
2016-03-16 23:18:07 -04:00
Tom Lane 5db5146431 Fix j2day() to behave sanely for negative Julian dates.
Somebody had apparently once figured that casting to unsigned int would
produce the right output for negative inputs, but that would only be
true if 2^32 were a multiple of 7, which of course it ain't.  We need
to use a signed division and then correct the sign of the remainder.

AFAICT, the only case where this would arise currently is when doing
ISO-week calculations for dates in 4714BC, where we'd compute a
negative Julian date representing 4714-01-04BC and then do some
arithmetic with it.  Since we don't even really document support for
such dates, this is not of much consequence.  But we may as well
get it right.

Per report from Vitaly Burovoy.
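
The difference between the two approaches can be seen in a standalone C sketch. The function names j2day_unsigned and j2day_signed are made up for this example, and the unsigned variant is only a reconstruction of the approach described above, not a copy of the old code. It assumes a 32-bit unsigned int and C99 division that truncates toward zero.

```c
/*
 * Reconstruction of the old idea: do the arithmetic in unsigned int.
 * A negative input wraps modulo 2^32, and since 2^32 is not a multiple
 * of 7 the remainder comes out wrong.
 */
int
j2day_unsigned(int date)
{
    unsigned int day = (unsigned int) date;

    day += 1;
    day %= 7;
    return (int) day;
}

/*
 * Signed arithmetic: in C99 the remainder takes the sign of the dividend,
 * so a negative result is folded back into the range 0..6.
 */
int
j2day_signed(int date)
{
    date += 1;
    date %= 7;
    if (date < 0)
        date += 7;
    return date;
}
```

For non-negative inputs the two functions agree. For -2 the signed version returns 6, one day before the result for -1 (which is 0), while the unsigned version returns 3.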
2016-03-16 20:57:45 -04:00
Tom Lane a70e13a39e Be more careful about out-of-range dates and timestamps.
Tighten the semantics of boundary-case timestamptz so that we allow
timestamps >= '4714-11-24 00:00+00 BC' and < 'ENDYEAR-01-01 00:00+00 AD'
exactly, no more and no less, but it is allowed to enter timestamps
within that range using non-GMT timezone offsets (which could make the
nominal date 4714-11-23 BC or ENDYEAR-01-01 AD).  This eliminates
dump/reload failure conditions for timestamps near the endpoints.
To do this, separate checking of the inputs for date2j() from the
final range check, and allow the Julian date code to handle a range
slightly wider than the nominal range of the datatypes.

Also add a bunch of checks to detect out-of-range dates and timestamps
that formerly could be returned by operations such as date-plus-integer.
All C-level functions that return date, timestamp, or timestamptz should
now be proof against returning a value that doesn't pass IS_VALID_DATE()
or IS_VALID_TIMESTAMP().

Vitaly Burovoy, reviewed by Anastasia Lubennikova, and substantially
whacked around by me
2016-03-16 19:09:28 -04:00
Robert Haas f2b74b01d4 Another comment update.
I thought this was in my last commit, but I goofed.
2016-03-16 14:28:25 -04:00
Robert Haas bc55cc0b6a Fix problems in commit c16dc1aca5.
Vinayak Pokale provided a patch for a copy-and-paste error in a
comment.  I noticed that I'd used the word "automatically" nearby where
I meant to talk about things being "atomic".  Rahila Syed spotted a
misplaced counter update.  Fix all that stuff.
2016-03-16 13:54:04 -04:00
Robert Haas c6dda1f48e Add idle_in_transaction_session_timeout.
Vik Fearing, reviewed by Stéphane Schildknecht and me, and revised
slightly by me.
2016-03-16 11:30:45 -04:00
Peter Eisentraut f9e5ed61ed UCS_to_EUC_JIS_2004.pl: Turn off "test" mode by default
It produces debugging output files that are of no further use, so we
don't need that by default.
2016-03-16 10:43:05 -04:00
Peter Eisentraut 9dbcb500ca Make spacing and punctuation consistent 2016-03-16 10:43:05 -04:00
Robert Haas 3aff33aa68 Fix typos.
Oskari Saarenmaa
2016-03-15 18:06:11 -04:00
Stephen Frost fd658dbb30 Avoid incorrectly indicating exclusion constraint wait
INSERT ... ON CONFLICT's precheck may have to wait on the outcome of
another insertion, which may or may not itself be a speculative
insertion.  This wait is not necessarily associated with an exclusion
constraint, but was always reported that way in log messages if the wait
happened to involve a tuple that had no speculative token.

Initially discovered through use of ON CONFLICT DO NOTHING, where
spurious references to exclusion constraints in log messages were more
likely.

Patch by Peter Geoghegan.
Reviewed by Julien Rouhaud.

Back-patch to 9.5 where INSERT ... ON CONFLICT was added.
2016-03-15 18:04:39 -04:00
Alvaro Herrera 5bcc413f80 Fix typos in comments 2016-03-15 17:57:17 -03:00
Robert Haas c16dc1aca5 Add simple VACUUM progress reporting.
There's a lot more that could be done here yet - in particular, this
reports only very coarse-grained information about the index vacuuming
phase - but even as it stands, the new pg_stat_progress_vacuum can
tell you quite a bit about what a long-running vacuum is actually
doing.

Amit Langote and Robert Haas, based on earlier work by Vinayak Pokale
and Rahila Syed.
2016-03-15 13:32:56 -04:00
Tom Lane 0e9b89986b Cope if platform declares mbstowcs_l(), but not locale_t, in <xlocale.h>.
Previously, we included <xlocale.h> only if necessary to get the definition
of type locale_t.  According to notes in PGAC_TYPE_LOCALE_T, this is
important because on some versions of glibc that file supplies an
incompatible declaration of locale_t.  (This info may be obsolete, because
on my RHEL6 box that seems to be the *only* definition of locale_t; but
there may still be glibc's in the wild for which it's a live concern.)

It turns out though that on FreeBSD and maybe other BSDen, you can get
locale_t from stdlib.h or locale.h but mbstowcs_l() and friends only from
<xlocale.h>.  This was leaving us compiling calls to mbstowcs_l() and
friends with no visible prototype, which causes a warning and could
possibly cause actual trouble, since it's not declared to return int.

Hence, adjust the configure checks so that we'll include <xlocale.h>
either if it's necessary to get type locale_t or if it's necessary to
get a declaration of mbstowcs_l().

Report and patch by Aleksander Alekseev, somewhat whacked around by me.
Back-patch to all supported branches, since we have been using
mbstowcs_l() since 9.1.
2016-03-15 13:19:57 -04:00
Tom Lane 101fd9349e Add a GetForeignUpperPaths callback function for FDWs.
This is basically like the just-added create_upper_paths_hook, but
control is funneled only to the FDW responsible for all the baserels
of the current query; so providing such a callback is much less likely
to add useless overhead than using the hook function is.

The documentation is a bit sketchy.  We'll likely want to improve it,
and/or adjust the call conventions, when we get some experience with
actually using this callback.  Hopefully somebody will find time to
experiment with it before 9.6 feature freeze.
2016-03-14 20:04:48 -04:00
Peter Eisentraut be6de4c121 Add missing include for self-containment 2016-03-14 19:56:33 -04:00
Robert Haas 270b7daf5c Fix EXPLAIN ANALYZE SELECT INTO not to choose a parallel plan.
We don't support any parallel write operations at present, so choosing
a parallel plan causes us to error out.  Also, add a new regression
test that uses EXPLAIN ANALYZE SELECT INTO; if we'd had this previously,
force_parallel_mode testing would have caught this issue.

Mithun Cy and Robert Haas
2016-03-14 19:48:46 -04:00
Tom Lane 5864d6a4b6 Provide a planner hook at a suitable place for creating upper-rel Paths.
In the initial revision of the upper-planner pathification work, the only
available way for an FDW or custom-scan provider to inject Paths
representing post-scan-join processing was to insert them during scan-level
GetForeignPaths or similar processing.  While that's not impossible, it'd
require quite a lot of duplicative processing to look forward and see if
the extension would be capable of implementing the whole query.  To improve
matters for custom-scan providers, provide a hook function at the point
where the core code is about to start filling in upperrel Paths.  At this
point Paths are available for the whole scan/join tree, which should reduce
the amount of redundant effort considerably.

(An alternative design that was suggested was to provide a separate hook
for each post-scan-join processing step, but that seems messy and not
clearly more useful.)

Following our time-honored tradition, there's no documentation for this
hook outside the source code.

As-is, this hook is only meant for custom scan providers, which we can't
assume very much about.  A follow-on patch will implement an FDW callback
to let FDWs do the same thing in a somewhat more structured fashion.
2016-03-14 19:23:29 -04:00
Tom Lane 28048cbaa2 Allow callers of create_foreignscan_path to specify nondefault PathTarget.
Although the default choice of rel->reltarget should typically be
sufficient for scan or join paths, it's not at all sufficient for the
purposes PathTargets were invented for; in particular not for
upper-relation Paths.  So break API compatibility by adding a PathTarget
argument to create_foreignscan_path().  To ease updating of existing
code, accept a NULL value of the argument as selecting rel->reltarget.
2016-03-14 17:31:28 -04:00
Tom Lane 307c78852f Rethink representation of PathTargets.
In commit 19a541143a I did not make PathTarget a subtype of Node,
and embedded a RelOptInfo's reltarget directly into it rather than having
a separately-allocated Node.  In hindsight that was misguided
micro-optimization, enabled by the fact that at that point we didn't have
any Paths with custom PathTargets.  Now that PathTarget processing has
been fleshed out some more, it's easier to see that it's better to have
PathTarget as an independent Node type, even if it does cost us one more
palloc to create a RelOptInfo.  So change it while we still can.

This commit just changes the representation, without doing anything more
interesting than that.
2016-03-14 16:59:59 -04:00
Tom Lane 07341a2980 Update PL/Perl's comment about hv_store().
Negative klen is documented since Perl 5.16, and 5.6 is no longer
supported so no need to comment about it.

Dagfinn Ilmari Mannsåker
2016-03-14 14:45:45 -04:00
Tom Lane f3f3aae4b7 Improve conversions from uint64 to Perl types.
Perl's integers are pointer-sized, so can hold more than INT_MAX on LP64
platforms, and come in both signed (IV) and unsigned (UV).  Floating
point values (NV) may also be larger than double.

Since Perl 5.19.4 array indices are SSize_t instead of I32, so allow up
to SSize_t_max on those versions.  The limit is not imposed just by
av_extend's argument type, but all the array handling code, so remove
the speculative comment.

Dagfinn Ilmari Mannsåker
2016-03-14 14:38:44 -04:00
Robert Haas 6be84eeb8d Update more comments for 96198d94cb.
Etsuro Fujita, reviewed (though not completely endorsed) by Ashutosh
Bapat, and slightly expanded by me.
2016-03-14 14:29:12 -04:00
Tom Lane 74a379b984 Use repalloc_huge() to enlarge a SPITupleTable's tuple pointer array.
Commit 23a27b039d widened the rows-stored counters to uint64, but
that's academic unless we allow the tuple pointer array to exceed 1GB.

(It might be a good idea to provide some other limit on how much storage
a SPITupleTable can eat.  On the other hand, there are plenty of other
ways to drive a backend into swap hell.)

Dagfinn Ilmari Mannsåker
2016-03-14 14:22:34 -04:00
Robert Haas 3adf9ced17 Improve check for overly-long extensible node name.
The old code is bad for two reasons.  First, it has an off-by-one
error.  Second, it won't help if you aren't running with assertions
enabled.  Per discussion, we want a check here in that case too.

Author: KaiGai Kohei, adjusted by me.
Reviewed-by: Petr Jelinek
Discussion: 56E0D547.1030101@2ndquadrant.com
2016-03-14 13:52:52 -04:00
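The two problems named in 3adf9ced17 above can be illustrated with a minimal sketch. The buffer size `NODE_NAME_LEN` and the function `node_name_fits` are hypothetical names chosen for illustration, not PostgreSQL's actual identifiers. The point is only that the limit has to leave room for the terminating NUL, and that the length test is an ordinary runtime check rather than an assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical buffer size for a node name, including the terminating NUL. */
#define NODE_NAME_LEN 64

/*
 * Return true if "name" fits into a NODE_NAME_LEN-byte buffer.
 *
 * A name of exactly NODE_NAME_LEN characters does not fit, since one byte
 * is reserved for the NUL; writing "<=" here would be the off-by-one.
 * The length test is an ordinary runtime check, so it is still performed
 * in builds compiled with assertions disabled.
 */
bool
node_name_fits(const char *name)
{
    assert(name != NULL);       /* caller bug, not a user-facing condition */
    return strlen(name) < NODE_NAME_LEN;
}
```

A caller would report an error when `node_name_fits()` returns false instead of relying on an assertion to catch the overlong name.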
Tom Lane 2da7549987 pg_stat_get_progress_info() should be marked STRICT.
I didn't bother with a catversion bump.

Report and patch by Thomas Munro
2016-03-14 12:51:55 -04:00
Tom Lane ab4ff2889d Fix memory leak in repeated GIN index searches.
Commit d88976cfa1 removed this code from ginFreeScanKeys():
-		if (entry->list)
-			pfree(entry->list);
evidently in the belief that that ItemPointer array is allocated in the
keyCtx and so would be reclaimed by the following MemoryContextReset.
Unfortunately, it isn't and it won't.  It'd likely be a good idea for
that to become so, but as a simple and back-patchable fix in the
meantime, restore this code to ginFreeScanKeys().

Also, add a similar pfree to where startScanEntry() is about to zero out
entry->list.  I am not sure if there are any code paths where this
change prevents a leak today, but it seems like cheap future-proofing.

In passing, make the initial allocation of so->entries[] use palloc
not palloc0.  The code doesn't depend on unused entries being zero;
if it did, the array-enlargement code in ginFillScanEntry() would be
wrong.  So using palloc0 initially can only serve to confuse readers
about what the invariant is.

Per report from Felipe de Jesús Molina Bravo, via Jaime Casanova in
<CAJGNTeMR1ndMU2Thpr8GPDUfiHTV7idELJRFusA5UXUGY1y-eA@mail.gmail.com>
2016-03-13 16:44:31 -04:00
Peter Eisentraut 96adb14d93 Fix whitespace and remove obsolete gitattributes entry 2016-03-13 16:03:13 -04:00
Magnus Hagander a1aa8b7ea0 Fix order of MemSet arguments
Noted by Tomas Vondra
2016-03-13 13:11:06 +01:00
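As a reminder of the argument order at issue in a1aa8b7ea0 above, here is a small sketch using the standard `memset()`, which takes its arguments in the same (destination, value, length) order as MemSet. The `Counters` struct and `reset_counters` are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct, used only to have something to clear. */
typedef struct Counters
{
    long        reads;
    long        writes;
} Counters;

/*
 * Zero a Counters struct.  The arguments are (destination, fill value,
 * length).  The swapped form memset(c, sizeof(Counters), 0) still compiles,
 * but it has a length of zero and therefore clears nothing.
 */
void
reset_counters(Counters *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof(Counters));
}
```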
Tom Lane 4b980167cb Report memory context stats upon out-of-memory in repalloc[_huge].
This longstanding functionality evidently got lost in commit
3d6d1b5855.  Noted while studying an OOM report from Jaime
Casanova.  Backpatch to 9.5 where the bug was introduced.
2016-03-13 00:21:07 -05:00
Tom Lane ab737f6ba9 Fix Windows portability issue in 23a27b039d.
_strtoui64() is available in MSVC builds, but apparently not with
other Windows toolchains.  Thanks to Petr Jelinek for the diagnosis.
2016-03-12 22:34:47 -05:00
Tom Lane fc7a9dfddb Get rid of scribbling on a const variable in psql's print.c.
Commit a2dabf0e1d had the bright idea that it could modify a "const"
global variable if it merely casted away const from a pointer.  This does
not work on platforms where the compiler puts "const" variables into
read-only storage.  Depressingly, we evidently have no such platforms in
our buildfarm ... an oversight I have now remedied.  (The one platform
that is known to catch this is recent OS X with -fno-common.)

Per report from Chris Ruprecht.  Back-patch to 9.5 where the bogus
code was introduced.
2016-03-12 18:16:24 -05:00
Tom Lane 23a27b039d Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout.  Some of these values were declared uint32 before, and
others "long".

I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.

The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's
GET DIAGNOSTICS ... ROW_COUNT command.  Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.

Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long".  It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.

Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-12 16:05:29 -05:00
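The "larger than 4G" limit mentioned in 23a27b039d above comes from 32-bit wraparound. The following sketch uses two hypothetical helper functions to contrast a uint32 counter with a uint64 one; it does not reproduce any executor code.

```c
#include <assert.h>
#include <stdint.h>

/* Add a batch of rows to a 32-bit counter; silently wraps modulo 2^32. */
uint32_t
count_rows_32(uint32_t processed, uint32_t batch)
{
    return processed + batch;
}

/* Same with a 64-bit counter; overflow is out of reach for realistic counts. */
uint64_t
count_rows_64(uint64_t processed, uint64_t batch)
{
    assert(batch <= UINT64_MAX - processed);
    return processed + batch;
}
```

Adding 500 million rows to a count of 4 billion gives 205032704 with the 32-bit version, but the expected 4500000000 with the 64-bit one.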
Andres Freund e01157500f Include portability/mem.h into fd.c for MAP_FAILED.
Buildfarm members gaur and pademelon are old enough not to know about
MAP_FAILED; which is used in 428b1d6. Include portability/mem.h to fix;
as already done in a bunch of other places.
2016-03-12 12:16:48 -08:00
Tom Lane 570be1f73f Re-export a few of createplan.c's make_xxx() functions.
CitusDB is using these and doesn't wish to redesign its code right now.
I am not on board with this being a good idea, or a good precedent,
but I lack the energy to fight about it.
2016-03-12 12:12:59 -05:00
Robert Haas 7087166a88 pg_upgrade: Convert old visibility map format to new format.
Commit a892234f83 added a second bit per
page to the visibility map, but pg_upgrade has been unaware of it up
until now.  Therefore, a pg_upgrade from an earlier major release of
PostgreSQL to any commit preceding this one and following the one
mentioned above would result in invalid visibility map contents on the
new cluster, very possibly leading to data corruption.  This plugs
that hole.

Masahiko Sawada, reviewed by Jeff Janes, Bruce Momjian, Simon Riggs,
Michael Paquier, Andres Freund, me, and others.
2016-03-11 12:34:20 -05:00
Tom Lane 9118d03a8c When appropriate, postpone SELECT output expressions till after ORDER BY.
It is frequently useful for volatile, set-returning, or expensive functions
in a SELECT's targetlist to be postponed till after ORDER BY and LIMIT are
done.  Otherwise, the functions might be executed for every row of the
table despite the presence of LIMIT, and/or be executed in an unexpected
order.  For example, in
	SELECT x, nextval('seq') FROM tab ORDER BY x LIMIT 10;
it's probably desirable that the nextval() values are ordered the same
as x, and that nextval() is not run more than 10 times.

In the past, Postgres was inconsistent in this area: you would get the
desirable behavior if the ordering were performed via an indexscan, but
not if it had to be done by an explicit sort step.  Getting the desired
behavior reliably required contortions like
	SELECT x, nextval('seq')
	  FROM (SELECT x FROM tab ORDER BY x) ss LIMIT 10;

This patch conditionally postpones evaluation of pure-output target
expressions (that is, those that are not used as DISTINCT, ORDER BY, or
GROUP BY columns) so that they effectively occur after sorting, even if an
explicit sort step is necessary.  Volatile expressions and set-returning
expressions are always postponed, so as to provide consistent semantics.
Expensive expressions (costing more than 10 times typical operator cost,
which by default would include any user-defined function) are postponed
if there is a LIMIT or if there are expressions that must be postponed.

We could be more aggressive and postpone any nontrivial expression, but
there are costs associated with doing so: it requires an extra Result plan
node which adds some overhead, and postponement changes the volume of data
going through the sort step, perhaps for the worse.  Since we tend not to
have very good estimates of the output width of nontrivial expressions,
it's hard to have much confidence in our ability to predict whether
postponement would increase or decrease the cost of the sort; therefore
this patch doesn't attempt to make decisions conditionally on that.
Between these factors and a general desire not to change query behavior
when there's not a demonstrable benefit, it seems best to be conservative
about applying postponement.  We might tweak the decision rules in the
future, though.

Konstantin Knizhnik, heavily rewritten by me
2016-03-11 12:27:50 -05:00
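The decision rules described in 9118d03a8c above can be sketched as a small function. The `OutputExpr` struct, its field names, and `choose_postponed` are assumptions made for this illustration; the planner's real data structures are different. Cost is expressed in multiples of the typical operator cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one SELECT output expression. */
typedef struct OutputExpr
{
    bool        used_for_sort_or_group; /* DISTINCT, ORDER BY or GROUP BY column */
    bool        is_volatile;
    bool        returns_set;
    double      cost;                   /* in units of typical operator cost */
} OutputExpr;

/*
 * Fill postpone[i] with true for each expression to evaluate after sorting.
 * Pure-output volatile or set-returning expressions are always postponed.
 * Pure-output expensive expressions (cost > 10) are postponed only if there
 * is a LIMIT or if some other expression has to be postponed anyway.
 */
void
choose_postponed(const OutputExpr *exprs, int n, bool has_limit, bool *postpone)
{
    bool        any_required = false;
    int         i;

    assert(exprs != NULL && postpone != NULL && n >= 0);

    for (i = 0; i < n; i++)
    {
        postpone[i] = !exprs[i].used_for_sort_or_group &&
            (exprs[i].is_volatile || exprs[i].returns_set);
        if (postpone[i])
            any_required = true;
    }

    for (i = 0; i < n; i++)
    {
        if (!postpone[i] && !exprs[i].used_for_sort_or_group &&
            exprs[i].cost > 10.0 && (has_limit || any_required))
            postpone[i] = true;
    }
}
```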
Teodor Sigaev b1fdc727c3 Fix Windows build broken in 6943a946c7
Also it fixes dynamic array allocation disallowed by ANSI-C.

Author: Stas Kelvich
2016-03-11 20:10:20 +03:00
Teodor Sigaev 8829af47ef Fix merge affixes for numeric ones
Some dictionaries have duplicated base words with different affix sets; we
just merge those sets into one set. Previously, merging sets of affixes
was a plain concatenation of strings, which is wrong for the numeric
representation of affixes because that representation uses a comma to
separate affixes.

Author: Artur Zakirov
2016-03-11 19:47:50 +03:00
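To see why plain concatenation breaks numeric affixes, as described in 8829af47ef above, consider the sketch below. `merge_affixes` is a hypothetical helper, not the function in the dictionary code, and it does not remove duplicate flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Merge two affix flag strings into a newly malloc'd string, which the
 * caller must free.  Character flags can simply be concatenated.  Numeric
 * flags are comma-separated, so gluing "1,2" and "34" together without a
 * comma would yield "1,234" and change the last flag.
 */
char *
merge_affixes(const char *a, const char *b, bool numeric)
{
    size_t      len;
    char       *result;

    assert(a != NULL && b != NULL);

    len = strlen(a) + strlen(b) + 2;    /* room for a comma and the NUL */
    result = malloc(len);
    if (result == NULL)
        return NULL;

    if (numeric && a[0] != '\0' && b[0] != '\0')
        snprintf(result, len, "%s,%s", a, b);
    else
        snprintf(result, len, "%s%s", a, b);
    return result;
}
```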
Teodor Sigaev a9eb6c83ef Bump catalog version missed in 6943a946c7 2016-03-11 19:31:04 +03:00
Teodor Sigaev 6943a946c7 Tsvector editing functions
Adds several tsvector editing functions: convert tsvector to/from text array,
set weight for given lexemes, delete lexeme(s), unnest, filter lexemes
with given weights

Author: Stas Kelvich with some editorization by me
Reviewers: Tomas Vondra, Teodor Sigaev
2016-03-11 19:22:36 +03:00
Tom Lane 49635d7b3e Minor additional refactoring of planner.c's PathTarget handling.
Teach make_group_input_target() and make_window_input_target() to work
entirely with the PathTarget representation of tlists, rather than
constructing a tlist and immediately deconstructing it into PathTarget
format.  In itself this only saves a few palloc's; the bigger picture is
that it opens the door for sharing cost_qual_eval work across all of
planner.c's constructions of PathTargets.  I'll come back to that later.

In support of this, flesh out tlist.c's infrastructure for PathTargets
a bit more.
2016-03-11 10:24:55 -05:00
Robert Haas 69ab7b9d6c psql: Don't automatically use expanded format when there's 1 column.
Andreas Karlsson and Robert Haas
2016-03-11 08:04:01 -05:00
Robert Haas 481c76abf4 Fix a typo, and remove unnecessary pgstat_report_wait_end().
Per Amit Kapila.
2016-03-11 07:34:00 -05:00
Magnus Hagander 38c83c9b75 Refactor receivelog.c parameters
Much cruft had accumulated over time with a large number of parameters
passed down between functions very deep. With this refactoring, instead
introduce a StreamCtl structure that holds the parameters, and pass around
a pointer to this structure instead. This makes it much easier to add or
remove fields that are needed deeper down in the implementation without
having to modify every function header in the file.

Patch by me after much nagging from Andres
Reviewed by Craig Ringer and Daniel Gustafsson
2016-03-11 11:15:12 +01:00
Simon Riggs 73e7e49da3 Allow emit_log_hook to see original message text
emit_log_hook could only see the translated text, making it harder to identify
which message was being sent. Pass original text to allow the exact message to
be identified, whichever language is used for logging.

Discussion: 20160216.184755.59721141.horiguchi.kyotaro@lab.ntt.co.jp
Author: Kyotaro Horiguchi
2016-03-11 09:53:06 +00:00
Robert Haas a414d96ad2 Simplify GetLockNameFromTagType.
The old code is wrong, because it returns a pointer to an automatic
variable.  And it's also more clever than we really need to be
considering that the case it's worrying about should never happen.
2016-03-10 21:37:22 -05:00
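The bug class fixed by a414d96ad2 above, returning a pointer to an automatic variable, is avoided by returning a pointer into storage that outlives the call. The sketch below uses a hypothetical `LockKind` enum and name table, not the real lock tag definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock kinds; the real tag types are not reproduced here. */
typedef enum LockKind
{
    LOCK_KIND_RELATION,
    LOCK_KIND_PAGE,
    LOCK_KIND_TUPLE
} LockKind;

/* Static storage: remains valid after the function returns. */
static const char *const lock_kind_names[] = {"relation", "page", "tuple"};

#define N_LOCK_KINDS (sizeof(lock_kind_names) / sizeof(lock_kind_names[0]))

/*
 * Return the name of a lock kind.  Formatting the name into a local char
 * array and returning that array would hand back a dangling pointer; the
 * out-of-range case it would cover should never happen, so it is asserted.
 */
const char *
lock_kind_name(LockKind kind)
{
    assert((size_t) kind < N_LOCK_KINDS);
    return lock_kind_names[kind];
}
```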
Andres Freund c94f0c29ce Blindly try to fix dtrace enabled builds, broken in 9cd00c45.
Reported-By: Peter Eisentraut
Discussion: 56E2239E.1050607@gmx.net
2016-03-10 17:51:03 -08:00
Andres Freund 9cd00c457e Checkpoint sorting and balancing.
Up to now checkpoints were written in the order they're in the
BufferDescriptors. That's nearly random in a lot of cases, which
performs badly on rotating media, but even on SSDs it causes slowdowns.

To avoid that, sort checkpoints before writing them out. We currently
sort by tablespace, relfilenode, fork and block number.

One of the major reasons that wasn't done previously was fear of
imbalance between tablespaces. To address that, balance writes between
tablespaces.

The other prime concern was that the relatively large allocation to sort
the buffers in might fail, preventing checkpoints from happening. Thus
pre-allocate the required memory in shared memory, at server startup.

This particularly makes it more efficient to have checkpoint flushing
enabled, because that'll often result in a lot of writes that can be
coalesced into one flush.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
2016-03-10 17:05:09 -08:00
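The sort order named in 9cd00c457e above (tablespace, relfilenode, fork, block number) can be expressed as a `qsort()` comparator. This is a sketch under the assumption of a simplified `BufferToWrite` struct; it covers neither the balancing between tablespaces nor the shared-memory preallocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified description of one dirty buffer. */
typedef struct BufferToWrite
{
    uint32_t    tablespace;
    uint32_t    relfilenode;
    int         fork;
    uint32_t    block;
} BufferToWrite;

static int
cmp_u32(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}

/* Order by tablespace, then relfilenode, then fork, then block number. */
int
buffer_write_cmp(const void *pa, const void *pb)
{
    const BufferToWrite *a = pa;
    const BufferToWrite *b = pb;
    int         c;

    if ((c = cmp_u32(a->tablespace, b->tablespace)) != 0)
        return c;
    if ((c = cmp_u32(a->relfilenode, b->relfilenode)) != 0)
        return c;
    if (a->fork != b->fork)
        return (a->fork > b->fork) - (a->fork < b->fork);
    return cmp_u32(a->block, b->block);
}

/* Sort the to-be-written buffers so that neighboring blocks end up adjacent. */
void
sort_buffers(BufferToWrite *bufs, size_t n)
{
    assert(bufs != NULL);
    qsort(bufs, n, sizeof(BufferToWrite), buffer_write_cmp);
}
```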
Andres Freund 428b1d6b29 Allow to trigger kernel writeback after a configurable number of writes.
Currently writes to the main data files of postgres all go through the
OS page cache. This means that some operating systems can end up
collecting a large number of dirty buffers in their respective page
caches.  When these dirty buffers are flushed to storage rapidly, be it
because of fsync(), timeouts, or dirty ratios, latency for other reads
and writes can increase massively.  This is the primary reason for
regular massive stalls observed in real world scenarios and artificial
benchmarks; on rotating disks stalls on the order of hundreds of seconds
have been observed.

On Linux it is possible to control this by reducing the global dirty
limits significantly, reducing the above problem. But global
configuration is rather problematic because it'll affect other
applications; also PostgreSQL itself doesn't always want this
behavior, e.g. for temporary files it's undesirable.

Several operating systems allow some control over the kernel page
cache. Linux has sync_file_range(2), several posix systems have msync(2)
and posix_fadvise(2). sync_file_range(2) is preferable because it
requires no special setup, whereas msync() requires the to-be-flushed
range to be mmap'ed. For the purpose of flushing dirty data
posix_fadvise(2) is the worst alternative, as flushing dirty data is
just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages
from the page cache.  Thus the feature is enabled by default only on
Linux, but can be enabled on all systems that have any of the above
APIs.

While desirable and likely possible, this patch does not contain an
implementation for Windows.

With the infrastructure added, writes made via checkpointer, bgwriter
and normal user backends can be flushed after a configurable number of
writes. Each of these sources of writes is controlled by a separate GUC,
checkpointer_flush_after, bgwriter_flush_after and backend_flush_after
respectively; they're separate because the number of flushes that are
good are separate, and because the performance considerations of
controlled flushing for each of these are different.

A later patch will add checkpoint sorting - after that flushes from the
checkpoint will almost always be desirable. Bgwriter flushes are most of
the time going to be random, which are slow on lots of storage hardware.
Flushing in backends works well if the storage and bgwriter can keep up,
but if not it can have negative consequences.  This patch is likely to
have negative performance consequences without checkpoint sorting, but
unfortunately so has sorting without flush control.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
2016-03-10 17:04:34 -08:00
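The "flush after a configurable number of writes" idea from 428b1d6b29 above boils down to a counter per source of writes. The sketch below only counts the flush requests instead of calling sync_file_range(2) or a similar API. `WritebackState` and both functions are hypothetical names, and treating a setting of 0 as "disabled" is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-source state (checkpointer, bgwriter or backend). */
typedef struct WritebackState
{
    int         flush_after;    /* writes per flush request; 0 disables */
    int         pending;        /* writes since the last flush request */
    long        flushes;        /* flush requests issued so far */
} WritebackState;

void
writeback_init(WritebackState *st, int flush_after)
{
    assert(st != NULL && flush_after >= 0);
    st->flush_after = flush_after;
    st->pending = 0;
    st->flushes = 0;
}

/* Record one write; request a flush once flush_after writes have piled up. */
void
writeback_note_write(WritebackState *st)
{
    assert(st != NULL);
    if (st->flush_after == 0)
        return;
    if (++st->pending >= st->flush_after)
    {
        /* A real implementation would ask the kernel to start writeback here. */
        st->flushes++;
        st->pending = 0;
    }
}
```

Keeping one such state per source mirrors the separate settings for checkpointer, bgwriter and backends, each of which can use its own threshold.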
Tom Lane c82c92b111 Give pull_var_clause() reject/recurse/return behavior for WindowFuncs too.
All along, this function should have treated WindowFuncs in a manner
similar to Aggrefs, ie with an option whether or not to recurse into them.
By not considering the case, it was always recursing, which is OK for most
callers (although I suspect that the case in prepare_sort_from_pathkeys
might represent a bug).  But now we need return-without-recursing behavior
as well.  There are also more than a few callers that should never see a
WindowFunc, and now we'll get some error checking on that.
2016-03-10 16:23:52 -05:00
Robert Haas fd31cd2651 Don't vacuum all-frozen pages.
Commit a892234f83 gave us enough
infrastructure to avoid vacuuming pages where every tuple on the
page is already frozen.  So, replace the notion of a scan_all or
whole-table vacuum with the less onerous notion of an "aggressive"
vacuum, which will scan pages that are all-visible, but still skip those
that are all-frozen.

This should greatly reduce the cost of anti-wraparound vacuuming
on large clusters where the majority of data is never touched
between one cycle and the next, because we'll no longer have to
read all of those pages only to find out that we don't need to
do anything with them.

Patch by me, reviewed by Masahiko Sawada.
2016-03-10 16:14:42 -05:00
Tom Lane 364a9f47ab Refactor pull_var_clause's API to make it less tedious to extend.
In commit 1d97c19a0f and later c1d9579dd8, we extended
pull_var_clause's API by adding enum-type arguments.  That's sort of a pain
to maintain, though, because it means every time we add a new behavior we
must touch every last one of the call sites, even if there's a reasonable
default behavior that most of them could use.  Let's switch over to using a
bitmask of flags, instead; that seems more maintainable and might save a
nanosecond or two as well.  This commit changes no behavior in itself,
though I'm going to follow it up with one that does add a new behavior.

In passing, remove flatten_tlist(), which has not been used since 9.1
and would otherwise need the same API changes.

Removing these enums means that optimizer/tlist.h no longer needs to
depend on optimizer/var.h.  Changing that caused a number of C files to
need addition of #include "optimizer/var.h" (probably we can thank old
runs of pgrminclude for that); but on balance it seems like a good change
anyway.
2016-03-10 15:53:07 -05:00
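The maintainability argument in 364a9f47ab above is easier to see in code: with a bitmask, a call site that wants the default simply leaves a bit unset, so adding a new flag does not touch existing callers. The flag names and `aggregate_action` below are hypothetical, and choosing "reject" as the default for unset bits is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical option bits; callers OR together the ones they want. */
#define PULL_INCLUDE_AGGS       0x01    /* return aggregates as-is */
#define PULL_RECURSE_AGGS       0x02    /* descend into aggregate arguments */
#define PULL_INCLUDE_WINFUNCS   0x04    /* return window functions as-is */
#define PULL_RECURSE_WINFUNCS   0x08    /* descend into window function args */

typedef enum NodeAction
{
    ACTION_REJECT,
    ACTION_INCLUDE,
    ACTION_RECURSE
} NodeAction;

/* Decide what to do with an aggregate, given the caller's flags. */
NodeAction
aggregate_action(int flags)
{
    /* The two aggregate options are mutually exclusive. */
    assert((flags & (PULL_INCLUDE_AGGS | PULL_RECURSE_AGGS)) !=
           (PULL_INCLUDE_AGGS | PULL_RECURSE_AGGS));

    if (flags & PULL_INCLUDE_AGGS)
        return ACTION_INCLUDE;
    if (flags & PULL_RECURSE_AGGS)
        return ACTION_RECURSE;
    return ACTION_REJECT;       /* default when neither bit is set */
}
```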
Simon Riggs 37c54863cf Rework wait for AccessExclusiveLocks on Hot Standby
Earlier version committed in 9.0 caused spurious waits in some cases.
The new infrastructure for lock waits added in 9.3 is used to correct and improve this.

Jeff Janes based upon a proposal by Simon Riggs, who also reviewed
Additional review comments from Amit Kapila
2016-03-10 19:26:24 +00:00
Robert Haas 53be0b1add Provide much better wait information in pg_stat_activity.
When a process is waiting for a heavyweight lock, we will now indicate
the type of heavyweight lock for which it is waiting.  Also, you can
now see when a process is waiting for a lightweight lock - in which
case we will indicate the individual lock name or the tranche, as
appropriate - or for a buffer pin.

Amit Kapila, Ildus Kurbangaliev, reviewed by me.  Lots of helpful
discussion and suggestions by many others, including Alexander
Korotkov, Vladimir Borodin, and many others.
2016-03-10 12:44:09 -05:00
Magnus Hagander 9d90388247 Avoid crash on old Windows with AVX2-capable CPU for VS2013 builds
The Visual Studio 2013 CRT generates invalid code when it makes a 64-bit
build that is later used on a CPU that supports AVX2 instructions using a
version of Windows before 7SP1/2008R2SP1.

Detect this combination, and in those cases turn off the generation of
FMA3, per recommendation from the Visual Studio team.

The bug is actually in the CRT shipping with Visual Studio 2013, but
Microsoft have stated they're only fixing it in newer major versions.
The fix is therefore conditioned specifically on being built with this
version of Visual Studio, and not previous or later versions.

Author: Christian Ullrich
2016-03-10 14:10:18 +01:00
Simon Riggs e0694cf9c7 Reduce size of two phase file header
Previously 2PC header was fixed at 200 bytes, which in most cases wasted
WAL space for a workload using 2PC heavily.

Pavan Deolasee, reviewed by Petr Jelinek
2016-03-10 12:51:46 +00:00
Simon Riggs fcb4bfddb6 Reduce lock level for altering fillfactor
Fabrízio de Royes Mello and Simon Riggs
2016-03-10 12:07:33 +00:00
Robert Haas 090b287fc5 Code review for b6fb6471f6.
Reports by Tomas Vondra, Vinayak Pokale, and Aleksander Alekseev.
Patch by Amit Langote.
2016-03-10 06:07:57 -05:00
Tom Lane cc402116ca Remove a couple of useless pstrdup() calls.
There's no point in pstrdup'ing the result of TextDatumGetCString,
since that's necessarily already a freshly-palloc'd C string.

These particular calls are unlikely to be of any consequence
performance-wise, but still they're a bad precedent that can confuse
future patch authors.

Noted by Chapman Flack.
2016-03-09 23:29:05 -05:00
Andres Freund 1d4a0ab19a Avoid unlikely data-loss scenarios due to rename() without fsync.
Renaming a file using rename(2) is not guaranteed to be durable in face
of crashes. Use the previously added durable_rename()/durable_link_or_rename()
in various places where we previously just renamed files.

Most of the changed call sites are arguably not critical, but it seems
better to err on the side of too much durability.  The most prominent
known case where the previously missing fsyncs could cause data loss is
crashes at the end of a checkpoint. After the actual checkpoint has been
performed, old WAL files are recycled. When they're filled, their
contents are fdatasynced, but we did not fsync the containing
directory. An OS/hardware crash in an unfortunate moment could then end
up leaving that file with its old name, but new content; WAL replay
would thus not replay it.

Reported-By: Tomas Vondra
Author: Michael Paquier, Tomas Vondra, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
2016-03-09 18:53:53 -08:00
Andres Freund 606e0f9841 Introduce durable_rename() and durable_link_or_rename().
Renaming a file using rename(2) is not guaranteed to be durable in face
of crashes; especially on filesystems like xfs and ext4 when mounted
with data=writeback. To be certain that a rename() atomically replaces
the previous file contents in the face of crashes and different
filesystems, one has to fsync the old filename, rename the file, fsync
the new filename, fsync the containing directory.  This sequence is not
generally adhered to currently; which exposes us to data loss risks. To
avoid having to repeat this arduous sequence, introduce
durable_rename(), which wraps all that.

Also add durable_link_or_rename(). Several places use link() (with a
fallback to rename()) to rename a file, trying to avoid replacing the
target file out of paranoia. Some of those rename sequences need to be
durable as well. There seems little reason to extend several copies of the
same logic, so centralize the link() callers.

This commit does not yet make use of the new functions; they're used in
a followup commit.

Author: Michael Paquier, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
2016-03-09 18:53:53 -08:00
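The sequence spelled out in 606e0f9841 above (fsync the old file, rename, fsync the new file, fsync the containing directory) can be sketched with plain POSIX calls. This is not PostgreSQL's durable_rename(): the function name `durable_rename_sketch`, the explicit directory argument, and the bare -1 error returns are assumptions made for brevity. It also assumes a platform where fsync() works on a descriptor opened read-only and on a directory.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open a file or directory read-only, fsync it, and close it again. */
static int
fsync_path(const char *path)
{
    int         fd = open(path, O_RDONLY);
    int         rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/*
 * Rename oldpath to newpath so that the result survives a crash:
 * flush the old file's contents, rename, flush the file under its new
 * name, then flush the directory entry.  Returns 0 on success, -1 on error.
 */
int
durable_rename_sketch(const char *oldpath, const char *newpath, const char *dir)
{
    assert(oldpath != NULL && newpath != NULL && dir != NULL);

    if (fsync_path(oldpath) != 0)
        return -1;
    if (rename(oldpath, newpath) != 0)
        return -1;
    if (fsync_path(newpath) != 0)
        return -1;
    return fsync_path(dir);
}
```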
Alvaro Herrera 28f6df3c36 PostgresNode: add backup_fs_hot and backup_fs_cold
These simple methods rely on RecursiveCopy to create a filesystem-level
backup of a server.  They aren't currently used anywhere yet, but will be
useful for future tests.

Author: Craig Ringer
Reviewed-By: Michael Paquier, Salvador Fandino, Álvaro Herrera
Commitfest-URL: https://commitfest.postgresql.org/9/569/
2016-03-09 19:54:03 -03:00
Alvaro Herrera a31aaec406 Add filter capability to RecursiveCopy::copypath
This allows skipping copying certain files and subdirectories in tests.
This is useful in some circumstances such as copying a data directory;
future tests want this feature.

Also POD-ify the module.

Authors: Craig Ringer, Pallavi Sontakke
Reviewed-By: Álvaro Herrera
2016-03-09 18:00:31 -03:00
Tom Lane a298a1e06f Fix incorrect handling of NULL index entries in indexed ROW() comparisons.
An index search using a row comparison such as ROW(a, b) > ROW('x', 'y')
would stop upon reaching a NULL entry in the "b" column, ignoring the
fact that there might be non-NULL "b" values associated with later values
of "a".  This happens because _bt_mark_scankey_required() marks the
subsidiary scankey for "b" as required, which is just wrong: it's for
a column after the one with the first inequality key (namely "a"), and
thus can't be considered a required match.

This bit of brain fade dates back to the very beginnings of our support
for indexed ROW() comparisons, in 2006.  Kind of astonishing that no one
came across it before Glen Takahashi, in bug #14010.

Back-patch to all supported versions.

Note: the given test case doesn't actually fail in unpatched 9.1, evidently
because the fix for bug #6278 (i.e., stopping at nulls in either scan
direction) is required to make it fail.  I'm sure I could devise a case
that fails in 9.1 as well, perhaps with something involving making a cursor
back up; but it doesn't seem worth the trouble.
2016-03-09 14:51:22 -05:00
Robert Haas be060cbcd4 Re-pgindent vacuumlazy.c. 2016-03-09 13:51:11 -05:00
Robert Haas accf7616ff pgbench: When -T is used, don't wait for transactions beyond end of run.
At low rates, this can lead to pgbench taking significantly longer to
terminate than the user might expect.  Repair.

Fabien Coelho, reviewed by Aleksander Alekseev, Álvaro Herrera, and me.
2016-03-09 13:11:05 -05:00
Robert Haas b6fb6471f6 Add a generic command progress reporting facility.
Using this facility, any utility command can report the target relation
upon which it is operating, if there is one, and up to 10 64-bit
counters; the intent of this is that users should be able to figure out
what a utility command is doing without having to resort to ugly hacks
like attaching strace to a backend.

As a demonstration, this adds very crude reporting to lazy vacuum; we
just report the target relation and nothing else.  A forthcoming patch
will make VACUUM report a bunch of additional data that will make this
much more interesting.  But this gets the basic framework in place.

Vinayak Pokale, Rahila Syed, Amit Langote, Robert Haas, reviewed by
Kyotaro Horiguchi, Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao,
and Masanori Oyama.
2016-03-09 12:08:58 -05:00
Tom Lane 8776c15c85 Fix incorrect tlist generation in create_gather_plan().
This function is written as though Gather doesn't project; but it does.
Even if it did not project, though, we must use build_path_tlist to ensure
that the output columns receive correct sortgroupref labeling.

Per report from Amit Kapila.
2016-03-09 10:56:46 -05:00
Tom Lane d31f20e2b5 Fix copy-and-pasteo in comment.
Wensheng Zhang
2016-03-09 10:29:14 -05:00
Tom Lane 51c0f63e4d Improve handling of pathtargets in planner.c.
Refactor so that the internal APIs in planner.c deal in PathTargets not
targetlists, and establish a more regular structure for deriving the
targets needed for successive steps.

There is more that could be done here; calculating the eval costs of each
successive target independently is both inefficient and wrong in detail,
since we won't actually recompute values available from the input node's
tlist.  But it's no worse than what happened before the pathification
rewrite.  In any case this seems like a good starting point for considering
how to handle Konstantin Knizhnik's function-evaluation-postponement patch.
2016-03-09 01:12:16 -05:00
Andres Freund 2f1f443930 Add valgrind suppressions for python code.
Python's allocator does some low-level tricks for efficiency;
unfortunately they trigger valgrind errors. Those tricks can be disabled
making instrumentation easier; but few people testing postgres will have
such a build of python. So add broad suppressions of the resulting
errors.

See also https://svn.python.org/projects/python/trunk/Misc/README.valgrind

This possibly will suppress valid errors, but without it it's basically
impossible to use valgrind with plpython code.

Author: Andres Freund
Backpatch: 9.4, where we started to maintain valgrind suppressions
2016-03-08 19:40:58 -08:00
Andres Freund 5e43bee830 Add valgrind suppressions for bootstrap related code.
Author: Andres Freund
Backpatch: 9.4, where we started to maintain valgrind suppressions
2016-03-08 19:40:58 -08:00
Tom Lane 9e8b99420f Improve handling of group-column indexes in GroupingSetsPath.
Instead of having planner.c compute a groupColIdx array and store it in
GroupingSetsPaths, make create_groupingsets_plan() find the grouping
columns by searching in the child plan node's tlist.  Although that's
probably a bit slower for create_groupingsets_plan(), it's more like
the way every other plan node type does this, and it provides positive
confirmation that we know which child output columns we're supposed to be
grouping on.  (Indeed, looking at this now, I'm not at all sure that it
wasn't broken before, because create_groupingsets_plan() isn't demanding
an exact tlist match from its child node.)  Also, this allows substantial
simplification in planner.c, because it no longer needs to compute the
groupColIdx array at all; no other cases were using it.

I'd intended to put off this refactoring until later (like 9.7), but
in view of the likely bug fix and the need to rationalize planner.c's
tlist handling so we can do something sane with Konstantin Knizhnik's
function-evaluation-postponement patch, I think it can't wait.
2016-03-08 22:32:11 -05:00
Peter Eisentraut a40814d7aa Handle invalid libpq sockets in more places
Also, make error messages consistent.

From: Michael Paquier <michael.paquier@gmail.com>
2016-03-08 21:10:33 -05:00
Peter Eisentraut a2fd62dd53 Suppress GCC 6 warning about self-comparison
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
2016-03-08 19:41:51 -05:00
Peter Eisentraut 92d4294d4b psql: Fix some strange code in SQL help creation
Struct QL_HELP used to be defined as static in the sql_help.h header
file, which is included in sql_help.c and help.c, thus creating two
separate instances of the struct.  This causes a warning from GCC 6,
because the struct is not used in sql_help.c.

Instead, declare the struct as extern in the header file and define it
in sql_help.c.  This also allows making a bunch of functions static
because they are no longer needed outside of sql_help.c.

Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
2016-03-08 19:41:51 -05:00
Peter Eisentraut 0d0644dce8 ecpg: Fix typo
GCC 6 points out the redundant conditions, which were apparently typos.

Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
2016-03-08 19:41:51 -05:00
Tom Lane 61fd218930 Fix minor thinko in pathification code.
I passed the wrong "root" struct to create_pathtarget in build_minmax_path.
Since the subroot is a clone of the outer root, this would not cause any
serious problems, but it would waste some cycles because
set_pathtarget_cost_width would not have access to Var width estimates
set up while running query_planner on the subroot.
2016-03-08 16:50:44 -05:00
Andres Freund e66197fa2e plperl: Correctly handle empty arrays in plperl_ref_from_pg_array.
plperl_ref_from_pg_array() didn't consider the case that postgres arrays
can have 0 dimensions (when they're empty) and accessed the first
dimension without a check. Fix that by special casing the empty array
case.

Author: Alex Hunsaker
Reported-By: Andres Freund / valgrind / buildfarm animal skink
Discussion: 20160308063240.usnzg6bsbjrne667@alap3.anarazel.de
Backpatch: 9.1-
2016-03-08 13:42:57 -08:00
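The special case added in e66197fa2e above amounts to checking the dimension count before touching the first dimension. The sketch below uses a hypothetical `ArrayShape` struct instead of the real array header.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIMS 6

/* Hypothetical array descriptor: an empty array has ndims == 0. */
typedef struct ArrayShape
{
    int         ndims;
    int         dims[MAX_DIMS];
} ArrayShape;

/*
 * Return the total number of items.  With ndims == 0 none of the dims[]
 * entries is meaningful, so the empty case is handled before dims[0] is
 * ever read.
 */
long
array_item_count(const ArrayShape *shape)
{
    long        count = 1;
    int         i;

    assert(shape != NULL);
    assert(shape->ndims >= 0 && shape->ndims <= MAX_DIMS);

    if (shape->ndims == 0)
        return 0;

    for (i = 0; i < shape->ndims; i++)
        count *= shape->dims[i];
    return count;
}
```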
Tom Lane 8c314b9853 Finish refactoring make_foo() functions in createplan.c.
This patch removes some redundant cost calculations that I left for later
cleanup in commit 3fc6e2d7f5.  There's now a uniform policy that the
make_foo() convenience functions don't do any cost calculations.  Most of
their callers copy costs from the source Path node, and for those that
don't, the calculation in the make_foo() function wasn't necessarily right
anyhow.  (make_result() was particularly a mess, as it was serving multiple
callers using cost calcs designed for only the first one or two that had
ever existed.)  Aside from saving a few cycles, this ensures that what
EXPLAIN prints matches the costs we used for planning purposes.  It does
not change any planner decisions, since the decisions are already made.
2016-03-08 16:28:34 -05:00
Robert Haas 7400559a3f Comment update for fdw_recheck_quals.
Commit 5fc4c26db5 could've done a better
job updating these comments.

Etsuro Fujita
2016-03-08 14:40:55 -05:00
Robert Haas 734f86d50d Add new flags argument for xl_heap_visible to heap2_desc.
Masahiko Sawada
2016-03-08 13:28:22 -05:00
Robert Haas dcfecaae9e Fix parallel query on standby servers.
Without this fix, it inevitably bombs out with "ERROR:  failed to
initialize transaction_read_only to 0".  Repair.

Ashutosh Sharma; comments adjusted by me.
2016-03-08 10:27:03 -05:00
Robert Haas 070140ee48 Add some functions to fd.c for the convenience of extensions.
For example, if you want to perform an ioctl() on a file descriptor
opened through the fd.c routines, there's no way to do that without
being able to get at the underlying fd.

KaiGai Kohei
2016-03-08 10:09:50 -05:00
Robert Haas 77a1d1e798 Department of second thoughts: remove PD_ALL_FROZEN.
Commit a892234f83 added a second bit per
page to the visibility map, which still seems like a good idea, but it
also added a second page-level bit alongside PD_ALL_VISIBLE to track
whether the visibility map bit was set.  That no longer seems like a
clever plan, because we don't really need that bit for anything.  We
always clear both bits when the page is modified anyway.

Patch by me, reviewed by Kyotaro Horiguchi and Masahiko Sawada.
2016-03-08 08:46:48 -05:00
Robert Haas 6f56b41ac0 pg_upgrade: Remove converter plugin facility.
We've not found a use for this so far, and the current need, which
is to convert the visibility map to a new format, does not suit the
existing design anyway.  So just rip it out.

Author: Masahiko Sawada, slightly revised by me.
Discussion: 20160215211313.GB31273@momjian.us
2016-03-08 08:13:02 -05:00
Tom Lane cf8e7b16a5 Spell "parallel" correctly.
Per David Rowley.
2016-03-07 21:48:17 -05:00
Peter Eisentraut 1c2db8c305 Fix uninstall target in tsearch Makefile
Artur Zakirov
2016-03-07 20:36:59 -05:00
Joe Conway 7b077af500 Make get_controlfile() error logging consistent with src/common
As originally committed, get_controlfile() used a non-standard approach
to error logging. Make it consistent with the majority of error logging
done in src/common.

Applies to master only.
2016-03-07 15:14:20 -08:00
Andres Freund b63bea5fd3 Further improvements to c8f621c43.
Coverity and inspection for the issue addressed in fd45d16f found some
questionable code.

Specifically coverity noticed that the wrong length was added in
ReorderBufferSerializeChange() - without immediate negative consequences
as the variable isn't used afterwards.  During code-review and testing I
noticed that a bit of space was wasted when allocating tuple bufs in
several places.  Thirdly, the debug memset()s in
ReorderBufferGetTupleBuf() reduce the error checking valgrind can do.

Backpatch: 9.4, like c8f621c43.
2016-03-07 14:24:03 -08:00
Tom Lane 3fc6e2d7f5 Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is.  This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps.  Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step.  We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.

In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan.  It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation.  (A couple of regression test outputs change in consequence of
that.  However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)

There is a great deal left to do here.  This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations.  (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.)  I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.

Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 15:58:22 -05:00
Tom Lane b642e50aea Fix backwards test for Windows service-ness in pg_ctl.
A thinko in a96761391 caused pg_ctl to get it exactly backwards when
deciding whether to report problems to the Windows eventlog or to stderr.
Per bug #14001 from Manuel Mathar, who also identified the fix.
Like the previous patch, back-patch to all supported branches.
2016-03-07 10:40:44 -05:00
Tom Lane 94f1adccd3 Re-fix broken definition for function name in pgbench's exprscan.l.
Wups, my first try wasn't quite right either.  Too focused on fixing
the existing bug, not enough on not introducing new ones.
2016-03-06 21:45:34 -05:00
Tom Lane 3899caf772 Fix broken definition for function name in pgbench's exprscan.l.
As written, this would accept e.g. 123e9 as a function name.  Aside
from being mildly astonishing, that would come back to haunt us if
we ever try to add float constants to the expression syntax.  Insist
that function names start with letters (or at least non-digits).

In passing reset yyline as well as yycol when starting a new expression.
This variable is useless since it's used nowhere, but if we're going
to have it we should have it act sanely.
2016-03-06 21:04:25 -05:00
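The rule described in 3899caf772 (function names must start with a letter, or at least a non-digit) can be sketched in Python. The regular expression below is a hypothetical stand-in chosen for illustration; it is not the flex pattern actually used in exprscan.l.

```python
import re

# Hypothetical pattern, for illustration only: a name starts with a letter
# or underscore and continues with letters, digits, or underscores.
FUNCTION_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_function_name(token: str) -> bool:
    """Return True if the whole token is a valid name under the sketch rule."""
    return FUNCTION_NAME.fullmatch(token) is not None
```

Under this rule a token such as 123e9 is rejected as a function name, which keeps it available for a possible float-constant syntax later.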
Andres Freund fd45d16f62 Fix wrong allocation size in c8f621c43.
In c8f621c43 I forgot to account for MAXALIGN when allocating a new
tuplebuf in ReorderBufferGetTupleBuf(). That happens to currently not
cause active problems on a number of platforms because the affected
pointer is already aligned, but others, like ppc and hppa, trigger this
in the regression test, due to a debug memset clearing memory.

Fix that.

Backpatch: 9.4, like the previous commit.
2016-03-06 16:27:20 -08:00
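The MAXALIGN rounding that fd45d16f62 refers to can be illustrated with a small Python sketch. It assumes an alignment of 8 bytes (a typical value, not taken from the commit), and the helper names are hypothetical; it shows the general rounding rule, not the exact allocation code in ReorderBufferGetTupleBuf().

```python
MAXIMUM_ALIGNOF = 8  # assumption: typical maximum alignment on 64-bit platforms


def maxalign(length: int) -> int:
    """Round a non-negative length up to the next multiple of MAXIMUM_ALIGNOF."""
    return (length + MAXIMUM_ALIGNOF - 1) & ~(MAXIMUM_ALIGNOF - 1)


def alloc_size(header_size: int, payload_len: int) -> int:
    """Bytes needed for a header followed by a payload that starts at an
    aligned offset. Forgetting maxalign() here under-allocates whenever
    header_size is not already a multiple of MAXIMUM_ALIGNOF."""
    return maxalign(header_size) + payload_len
```

With a header size that is already a multiple of the alignment the rounding adds nothing, which is consistent with the commit's remark that the omission caused no active problems on a number of platforms.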
Tom Lane b3e05097e5 Fix not-terribly-safe coding in NIImportOOAffixes() and NIImportAffixes().
There were two places in spell.c that supposed that they could search
for a location in a string produced by lowerstr() and then transpose
the offset into the original string.  But this fails completely if
lowerstr() transforms any characters into characters of different byte
length, as can happen in Turkish UTF8 for instance.

We'd added some comments about this coding in commit 51e78ab4ff,
but failed to realize that it was not merely confusing but wrong.

Coverity complained about this code years ago, but in such an opaque
fashion that nobody understood what it was on about.  I'm not entirely
sure that this issue *is* what it's on about, actually, but perhaps
this patch will shut it up -- and in any case the problem is clear.

Back-patch to all supported branches.
2016-03-06 19:20:55 -05:00
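The failure mode described in b3e05097e5 can be reproduced in miniature with Python. This is an analogous sketch, not the spell.c code: it relies on Python's str.lower(), which maps U+0130 (capital I with dot above, 2 bytes in UTF-8) to "i" plus a combining dot (3 bytes), so byte offsets found in the lowercased string no longer line up with the original. The function name is hypothetical.

```python
def tail_from_lowered_offset(original: str, needle: str) -> bytes:
    """Unsafe pattern: find needle in the lowercased bytes, then reuse that
    byte offset to slice the ORIGINAL bytes."""
    original_bytes = original.encode("utf-8")
    lowered_bytes = original.lower().encode("utf-8")
    offset = lowered_bytes.find(needle.encode("utf-8"))
    return original_bytes[offset:]


# The lowercased form is one byte longer, so the transposed offset lands
# one byte too far into the original string.
sample = "\u0130 FLAG"
mismatched_tail = tail_from_lowered_offset(sample, "flag")
```

Here mismatched_tail begins in the middle of the word rather than at its first letter, which is the kind of misplaced offset the commit removes.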
Tom Lane cb0ca0c995 Fix unportable usage of <ctype.h> functions.
isdigit(), isspace(), etc are likely to give surprising results if passed a
signed char.  We should always cast the argument to unsigned char to avoid
that.  Error in commit d78a7d9c7f, found by buildfarm member gaur.
2016-03-06 18:23:53 -05:00
Andres Freund c8f621c43a logical decoding: Fix handling of large old tuples with replica identity full.
When decoding the old version of an UPDATE or DELETE change, and if that
tuple was bigger than MaxHeapTupleSize, we either Assert'ed out, or
failed in more subtle ways in non-assert builds.  Normally individual
tuples aren't bigger than MaxHeapTupleSize, with big datums toasted.
But that's not the case for the old version of a tuple for logical
decoding; the replica identity is logged as one piece. With the default
replica identity, btree limits that to small tuples, but that's not the
case for FULL.

Change the tuple buffer infrastructure to separately allocate over-large
tuples, instead of always going through the slab cache.

This unfortunately requires changing the ReorderBufferTupleBuf
definition, we need to store the allocated size someplace. To avoid
requiring output plugins to recompile, don't store HeapTupleHeaderData
directly after HeapTupleData, but point to it via t_data; that leaves
room for the allocated size.  As there's no reason for an output plugin
to look at ReorderBufferTupleBuf->t_data.header, remove the field. It
was just a minor convenience having it directly accessible.

Reported-By: Adam Dratwiński
Discussion: CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com
2016-03-05 18:02:20 -08:00
Andres Freund 0bda14d54c logical decoding: old/newtuple in spooled UPDATE changes was switched around.
Somehow I managed to flip the order of restoring old & new tuples when
de-spooling a change in a large transaction from disk. This happens to
only take effect when a change is spooled to disk which has old/new
versions of the tuple. That only is the case for UPDATEs where the
primary key changed or where replica identity is changed to FULL.

The tests didn't catch this because either spooled updates, or updates
that changed primary keys, were tested; not both at the same time.

Found while adding tests for the following commit.

Backpatch: 9.4, where logical decoding was added
2016-03-05 18:02:20 -08:00
Andres Freund d9e903f3cb logical decoding: Tell reorderbuffer about all xids.
Logical decoding's reorderbuffer keeps transactions in an LSN ordered
list for efficiency. To make that efficiently possible, upper-level
xids are forced to be logged before nested subtransaction xids.  That
only works though if these records are all looked at: Unfortunately we
didn't do so for e.g. row level locks, which are otherwise uninteresting
for logical decoding.

This could lead to errors like:
"ERROR: subxact logged without previous toplevel record".

It's not sufficient to just look at row locking records, the xid could
appear first due to a lot of other types of records (which will trigger
the transaction to be marked logged with MarkCurrentTransactionIdLoggedIfAny).
So invent infrastructure to tell reorderbuffer about xids seen, when
they'd otherwise not pass through reorderbuffer.c.

Reported-By: Jarred Ward
Bug: #13844
Discussion: 20160105033249.1087.66040@wrigleys.postgresql.org
Backpatch: 9.4, where logical decoding was added
2016-03-05 18:02:20 -08:00
Joe Conway dc7d70ea05 Expose control file data via SQL accessible functions.
Add four new SQL accessible functions: pg_control_system(),
pg_control_checkpoint(), pg_control_recovery(), and pg_control_init()
which expose a subset of the control file data.

Along the way move the code to read and validate the control file to
src/common, where it can be shared by the new backend functions
and the original pg_controldata frontend program.

Patch by me, significant input, testing, and review by Michael Paquier.
2016-03-05 11:10:19 -08:00
Fujii Masao d34794f7d5 Ignore recovery_min_apply_delay until recovery has reached consistent state
Previously recovery_min_apply_delay was applied even before recovery
had reached consistency. This could cause us to wait a long time
unexpectedly for read-only connections to be allowed. It's problematic
because the standby was useless during that wait time.

This patch changes recovery_min_apply_delay so that it's applied once
the database has reached the consistent state. That is, even if the delay
is set, the standby tries to replay WAL records as fast as possible until
it has reached consistency.

Author: Michael Paquier
Reviewed-By: Julien Rouhaud
Reported-By: Greg Clough
Backpatch: 9.4, where recovery_min_apply_delay was added
Bug: #13770
Discussion: http://www.postgresql.org/message-id/20151111155006.2644.84564@wrigleys.postgresql.org
2016-03-06 02:29:04 +09:00
Tom Lane 60690a6fe8 Make stats regression test robust in the face of parallel query.
Historically, the wait_for_stats() function in this test has simply checked
for a report of an indexscan on tenk2, corresponding to the last command
issued before we expect stats updates to appear.  However, with parallel
query that indexscan could be done by a parallel worker that will emit
its stats counters to the collector before the session's main backend does
(a full second before, in fact, thanks to the "pg_sleep(1.0)" added by
commit 957d08c81f).  That leaves a sizable window in which an
autovacuum-triggered write of the stats files would present a state in
which the indexscan on tenk2 appears to have been done, but none of the
write updates performed by the test have been.  This is evidently the
explanation for intermittent failures seen by me and on buildfarm member
mandrill.

To fix, we should check separately for both the tenk2 seqscan and indexscan
counts, since those might be reported by different processes that could be
delayed arbitrarily on an overloaded test machine.  And we need to check
for at least one update-related count.  If we ever allow parallel workers
to do writes, this will get even more complicated ... but in view of all
the other hard problems that will entail, I don't feel a need to solve this
one today.

Per research by Rahila Syed and myself; part of this patch is Rahila's.
2016-03-04 16:20:49 -05:00
Robert Haas 708020eb7b Fix typo in comment.
Thomas Munro
2016-03-04 15:46:30 -05:00
Robert Haas 6fcde8a5c8 Minor improvements to transaction manager README.
A simple SELECT is handled by PortalRunSelect, not ProcessQuery.  Also,
the previous indentation was unclear: change it so that a deeper level
of indentation indicates that the outer function calls the inner one.

Stas Kelvich
2016-03-04 14:12:28 -05:00
Robert Haas 17b124d303 Fix SerializeSnapshot not to overrun the allocated space.
Rushabh Lathia
2016-03-04 13:48:36 -05:00
Teodor Sigaev 0e7557dc8d Fix Windows build broken by d78a7d9c7f 2016-03-04 21:36:49 +03:00
Robert Haas df4685fb0c Minor optimizations based on ParallelContext having nworkers_launched.
Originally, we didn't have nworkers_launched, so code that used parallel
contexts had to be prepared for the possibility that not all of the
workers requested actually got launched.  But now we can count on knowing
the number of workers that were successfully launched, which can shave
off a few cycles and simplify some code slightly.

Amit Kapila, reviewed by Haribabu Kommi, per a suggestion from Peter
Geoghegan.
2016-03-04 12:59:10 -05:00
Robert Haas 546cd0d766 Fix InitializeSessionUserId not to dereference NULL rolename pointer.
Dmitriy Sarafannikov, reviewed by Michael Paquier and Haribabu Kommi,
with a minor fix by me.
2016-03-04 12:28:09 -05:00
Teodor Sigaev d78a7d9c7f Improve support of Hunspell in ispell dictionary.
Now it's possible to load recent versions of Hunspell dictionaries for several languages.
To handle these dictionaries Hunspell patch adds support for:
* FLAG long - sets the double extended ASCII character flag type
* FLAG num - sets the decimal number flag type (from 1 to 65535)
* AF parameter - alias for a set of flags

Also it moves test dictionaries into separate directory.

Author: Artur Zakirov with editorization by me
2016-03-04 20:08:47 +03:00
Robert Haas 9445db925e Fix query-based tab completion for multibyte characters.
The existing code confuses the byte length of the string (which is
relevant when passing it to pg_strncasecmp) with the character length
of the string (which is relevant when it is used with the SQL substring
function).  Separate those two concepts.

Report and patch by Kyotaro Horiguchi, reviewed by Thomas Munro and
reviewed and further revised by me.
2016-03-04 11:53:20 -05:00
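The two lengths that 9445db925e separates can be shown with a short Python sketch. It assumes UTF-8 as the encoding, and the function name is hypothetical.

```python
def byte_and_char_length(text: str):
    """Return (byte length in UTF-8, character length) for text."""
    return len(text.encode("utf-8")), len(text)


# ASCII text: both lengths agree, so mixing them up goes unnoticed.
ascii_lengths = byte_and_char_length("tab")
# Multibyte text: the lengths differ, so a byte-oriented comparison and a
# character-oriented substring call need different values.
multibyte_lengths = byte_and_char_length("日本語")
```

The byte length is the one a bounded byte comparison such as pg_strncasecmp needs, while the character length is the one the SQL substring function expects.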
Alvaro Herrera 52fe6f4e02 Add 'tap_tests' flag in config_default.pl
This makes the flag more visible for testers using the default file as a
template, increasing the likelihood that the test suite will be run.
Also have the flag be displayed in the fake "configure" output, if set.

This patch is two new lines only, but perltidy decides to shift things
around which makes it appear a bit bigger.

Author: Michaël Paquier
Reviewed-by: Craig Ringer
Discussion: https://www.postgresql.org/message-id/CAB7nPqRet6UAP2APhZAZw%3DVhJ6w-Q-gGLdZkrOqFgd2vc9-ZDw%40mail.gmail.com
2016-03-04 13:04:53 -03:00
Peter Eisentraut 1fa2a6b1d4 Add prerequisite for KOI8-U.TXT
This was missed when the encoding was added.
2016-03-03 20:44:47 -05:00
Peter Eisentraut b497abc602 Make some adjustments in variable assignments
These variables aren't really used for anything interesting, but it
seems the existing grouping was somewhat nonsensical.
2016-03-03 20:44:47 -05:00
Peter Eisentraut 7a4a813c99 Add missing rules related to EUC_JIS_2004 and SHIFT_JIS_2004 encodings
This was apparently forgotten in commit
75c6519ff6.
2016-03-03 20:44:47 -05:00
Alvaro Herrera d561f1caec pgbench: accept unambiguous builtin prefixes for -b
This makes it easier to use "-b se" instead of typing the full "-b
select-only".

Author: Fabien Coelho
Reviewed-by: Michaël Paquier
2016-03-03 19:37:13 -03:00
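Unambiguous prefix matching of the kind d561f1caec describes can be sketched in Python as follows. The list holds pgbench's builtin script names; the function name and error messages are hypothetical, and this is not the C code in pgbench.

```python
BUILTIN_SCRIPTS = ["tpcb-like", "simple-update", "select-only"]


def resolve_builtin(prefix: str, names=BUILTIN_SCRIPTS) -> str:
    """Return the single builtin name starting with prefix.

    Raises ValueError if no name matches or if more than one does.
    """
    matches = [name for name in names if name.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"no builtin script matches {prefix!r}")
    raise ValueError(f"ambiguous builtin prefix {prefix!r}: {matches}")
```

With these names, "se" resolves to select-only, while "s" alone is rejected because it also matches simple-update.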
Alvaro Herrera 2c83f435a3 Rework PostgresNode's psql method
This makes the psql() method much more capable: it captures both stdout
and stderr; it now returns the psql exit code rather than stdout; a
timeout can now be specified, as can ON_ERROR_STOP behavior; it gained a
new "on_error_die" (defaulting to off) parameter to raise an exception
if there's any problem.  Finally, additional parameters to psql can be
passed if there's need for further tweaking.

For convenience, a new safe_psql() method retains much of the old
behavior of psql(), except that it uses on_error_die on, so that
problems like syntax errors in SQL commands can be detected more easily.

Many existing TAP test files now use safe_psql, which is what is really
wanted.  A couple of ->psql() calls are now added in the commit_ts
tests, which verify that the right thing is happening on certain errors.
Some ->command_fails() calls in recovery tests that were verifying that
psql failed also became ->psql() calls now.

Author: Craig Ringer. Some tweaks by Álvaro Herrera
Reviewed-By: Michaël Paquier
2016-03-03 17:58:30 -03:00
Alvaro Herrera 7d9a4301c0 perltidy PostgresNode and SimpleTee
Also, mention in README that Perl files should be perltidy'ed.  This
isn't really the best place (since we have Perl files elsewhere in the
tree) and this is already in pgindent's README, but this subdir is
likely to get hacked a whole lot more than the other Perl files, so it
seems okay to spend two lines on this.

Author: Craig Ringer
2016-03-03 13:21:35 -03:00
Alvaro Herrera 5bec1ad464 Fix mistakes in recovery tests
One test was relying on method remove_tree that isn't implemented in the
oldest Perl we support; fix it by using the older rmtree instead.

Another test had a typo in a SQL command, which isn't noticed because
the PostgresNode->psql() method doesn't check that queries return
correctly.  That's undesirable and will also be fixed later on, but for
now let's make the test actually work.

Author: Craig Ringer
2016-03-03 12:51:47 -03:00
Simon Riggs c7111d11b1 Revert buggy optimization of index scans
606c0123d6 attempted to reduce cost of index scans using > and <
strategies, though got that completely wrong in a few complex cases.

Revert whole patch until we find a safe optimization.
2016-03-03 09:53:43 +00:00
Magnus Hagander 6c90996a4c Add prefix to pl/pgsql global variables and functions
Rename pl/pgsql global variables to always have a plpgsql_ prefix,
so they don't conflict with other shared libraries loaded.
2016-03-03 10:45:59 +01:00
Andres Freund 7c17aac69d logical decoding: fix decoding of a commit's commit time.
When adding replication origins in 5aa235042, I somehow managed to set
the timestamp of decoded transactions to InvalidXLogRecPtr when decoding
one made without a replication origin. Fix that, and the wrong type of
the new commit_time variable.

This didn't trigger a regression test failure because we explicitly
don't show commit timestamps in the regression tests, as they obviously
are variable. Add a test that checks that a decoded commit's timestamp
is within minutes of NOW() from before the commit.

Reported-By: Weiping Qu
Diagnosed-By: Artur Zakirov
Discussion: 56D4197E.9050706@informatik.uni-kl.de,
    56D42918.1010108@postgrespro.ru
Backpatch: 9.5, where 5aa235042 originates.
2016-03-02 23:42:21 -08:00
Tom Lane a9d199f6d3 Fix json_to_record() bug with nested objects.
A thinko concerning nesting depth caused json_to_record() to produce bogus
output if a field of its input object contained a sub-object with a field
name matching one of the requested output column names.  Per bug #13996
from Johann Visagie.

I added a regression test case based on his example, plus parallel tests
for json_to_recordset, jsonb_to_record, jsonb_to_recordset.  The latter
three do not exhibit the same bug (which suggests that we may be missing
some opportunities to share code...) but testing seems like a good idea
in any case.

Back-patch to 9.4 where these functions were introduced.
2016-03-02 23:31:39 -05:00
Tom Lane eb43e851d6 Create stub functions to support pg_upgrade of old contrib/tsearch2.
Commits 9ff60273e3 and dbe2328959 adjusted the declarations
of some core functions referenced by contrib/tsearch2's install script,
forgetting that in a pg_upgrade situation, we'll be trying to restore
operator class definitions that reference the old signatures.  We've
hit this problem before; solve it in the same way as before, namely by
installing stub functions that have the expected signature and just
invoke the correct function.  Per report from Jeff Janes.

(Someday we ought to stop supporting contrib/tsearch2, but I'm not
sure today is that day.)
2016-03-02 17:37:54 -05:00
Alvaro Herrera cc6077d4d5 Prefix temp data dirs with the node name
This makes it easier to relate the temporary data dirs to each node in
a test script.

Author: Kyotaro Horiguchi
Reviewed-By: Craig Ringer, Alvaro Herrera
2016-03-02 18:22:45 -03:00
Tom Lane c8c7c93de8 Fix PL/Tcl's encoding conversion logic.
PL/Tcl appears to contain logic to convert strings between the database
encoding and UTF8, which is the only encoding modern Tcl will deal with.
However, that code has been disabled since commit 034895125d, which
made it "#if defined(UNICODE_CONVERSION)" and neglected to provide any way
for that symbol to become defined.  That might have been all right back
in 2001, but these days we take a dim view of allowing strings with
incorrect encoding into the database.

Remove the conditional compilation, fix warnings about signed/unsigned char
conversions, clean up assorted places that didn't bother with conversions.
(Notably, there were lots of assumptions that database table and field
names didn't need conversion...)

Add a regression test based on plpython_unicode.  It's not terribly
thorough, but better than no test at all.
2016-03-02 13:30:14 -05:00
Tom Lane e2609323eb Make PL/Tcl require Tcl 8.4 or later.
As of commit 2878220682, PL/Tcl will not
compile against pre-8.0 Tcl, whereas it used to work (more or less anyway)
with quite prehistoric versions.  As long as we're moving these goalposts,
let's reinstall them at someplace that has some thought behind it.  This
commit sets the minimum allowed Tcl version at 8.4, and rips out some bits
of compatibility cruft that are in consequence no longer needed.  Reasons
for requiring 8.4 include:

* 8.4 was released in 2002; there seems little reason to believe that
anyone would want to use older versions with Postgres 9.6+.

* We have no buildfarm members testing anything older than 8.4, and
thus no way to know if it's broken.

* We need at least 8.1 to allow enforcement of database encoding
security (8.1 standardized Tcl on using UTF8 internally, before that
it was pretty unpredictable).

* Some versions between 8.1 and 8.4 allowed the backend to become
multithreaded, which is disastrous.  We need at least 8.4 to be able
to disable the Tcl notifier subsystem to prevent that.

A small side benefit is that we can make the code more readable by
doing s/CONST84/const/g.
2016-03-02 12:24:30 -05:00
Tom Lane 2878220682 Convert PL/Tcl to use Tcl's "object" interfaces.
The original implementation of Tcl was all strings, but they improved
performance significantly by introducing typed "objects" (integers,
lists, code, etc).  It's past time we made use of that; that happened
in Tcl 8.0 which was released in 1997.

This patch also modernizes some of the error-reporting code, which may
cause small changes in the spelling of complaints about bad calls to
PL/Tcl-provided commands.

Jim Nasby and Karl Lehenbauer, reviewed by Victor Wagner
2016-03-02 12:07:31 -05:00
Tom Lane 3b8d721553 Fix TAP tests for older Perls.
Commit 7132810c (Retain tempdirs for failed tests) used Test::More's
is_passing method, but that was added in Test::More 0.89_01 which is
sometime later than Perl 5.10.1.  Popular platforms such as RHEL6 don't
have that, never mind some of our older dinosaurs.  Do it the hard way.

Michael Paquier, based on research by Craig Ringer
2016-03-02 01:06:31 -05:00
Robert Haas a892234f83 Change the format of the VM fork to add a second bit per page.
The new bit indicates whether every tuple on the page is already frozen.
It is cleared only when the all-visible bit is cleared, and it can be
set only when we vacuum a page and find that every tuple on that page is
both visible to every transaction and in no need of any future
vacuuming.

A future commit will use this new bit to optimize away full-table scans
that would otherwise be triggered by XID wraparound considerations.  A
page which is merely all-visible must still be scanned in that case, but
a page which is all-frozen need not be.  This commit does not attempt
that optimization, although that optimization is the goal here.  It
seems better to get the basic infrastructure in place first.

Per discussion, it's very desirable for pg_upgrade to automatically
migrate existing VM forks from the old format to the new format.  That,
too, will be handled in a follow-on patch.

Masahiko Sawada, reviewed by Kyotaro Horiguchi, Fujii Masao, Amit
Kapila, Simon Riggs, Andres Freund, and others, and substantially
revised by me.
2016-03-01 21:49:41 -05:00
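The two-bits-per-page layout introduced by a892234f83 can be pictured with a small Python model. The class, its method names, and the packing order are assumptions made for illustration; this is a toy bitmap, not the on-disk visibility map format or its C API.

```python
ALL_VISIBLE = 0x01  # every tuple on the page is visible to all transactions
ALL_FROZEN = 0x02   # every tuple on the page is already frozen
BITS_PER_HEAPBLOCK = 2
HEAPBLOCKS_PER_BYTE = 8 // BITS_PER_HEAPBLOCK


class VisibilityMapSketch:
    """Toy bitmap holding two status bits per heap page."""

    def __init__(self, nblocks: int):
        nbytes = (nblocks + HEAPBLOCKS_PER_BYTE - 1) // HEAPBLOCKS_PER_BYTE
        self.bits = bytearray(nbytes)

    @staticmethod
    def _locate(block: int):
        # Byte index and bit shift of the two-bit slot for this block.
        index = block // HEAPBLOCKS_PER_BYTE
        shift = (block % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK
        return index, shift

    def get(self, block: int) -> int:
        index, shift = self._locate(block)
        return (self.bits[index] >> shift) & 0x03

    def set(self, block: int, flags: int) -> None:
        index, shift = self._locate(block)
        self.bits[index] |= (flags & 0x03) << shift

    def clear(self, block: int) -> None:
        # Modifying a page clears both bits together.
        index, shift = self._locate(block)
        self.bits[index] &= ~(0x03 << shift) & 0xFF
```

In this model clear() always drops both bits at once, mirroring the statement that the all-frozen bit is cleared only when the all-visible bit is cleared.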
Tom Lane 68c521eb92 Improve coverage of pltcl regression tests.
Test composite-type arguments and the argisnull and spi_lastoid Tcl
commands.  This stuff was not covered before, but needs to be exercised
since the upcoming Tcl object-conversion patch changes these code paths
(and broke at least one of them).
2016-03-01 20:01:16 -05:00
Alvaro Herrera 9def031bd2 Add more tests for commit_timestamp feature
These tests verify that 1) WAL replay preserves the stored value,
2) a streaming standby server replays the value obtained from the
master, and 3) the behavior is sensible in the face of repeated
configuration changes.

One annoyance is that tmp_check/ subdir from the TAP tests is clobbered
when the pg_regress test runs in the same subdirectory.  This is
bothersome but not too terrible a problem, since the pg_regress test is
not run anyway if the TAP tests fail (unless "make -k" is used).

I had these tests around since commit 69e7235c93e2; add them now that we
have the recovery test framework in place.
2016-03-01 19:53:18 -03:00
Alvaro Herrera 88802e0680 TAP tests: retain temp dirs on test failure
This makes it easier to study the reason for the failure.

Author: Kyotaro Horiguchi
Reviewed-By: Craig Ringer
2016-03-01 19:50:13 -03:00
Robert Haas 212bba93ce Fix incorrect comment.
PQmblen and PQdsplen return information about characters, not words.

Kyotaro Horiguchi
2016-03-01 13:31:44 -05:00
Robert Haas aec64e8f45 Fix mistake in extensible node code.
I believe that I (rhaas) introduced this bug while editing the patch
that became bcac23de73.

Report and patch from KaiGai Kohei.
2016-03-01 13:17:09 -05:00
Robert Haas 7e137f846d Extend pgbench's expression syntax to support a few built-in functions.
Fabien Coelho, reviewed mostly by Michael Paquier and me, but also by
Heikki Linnakangas, BeomYong Lee, Kyotaro Horiguchi, Oleksander
Shulgin, and Álvaro Herrera.
2016-03-01 13:08:30 -05:00
Peter Eisentraut bd6cf3f237 Add Unicode map generation scripts as rule prerequisites
That way, the rules will trigger when the scripts change.
2016-02-29 21:19:28 -05:00
Peter Eisentraut cc074bf6c1 Fix comments
Some of these comments were copied and pasted without updating them,
some of them were duplicates.
2016-02-29 21:19:24 -05:00
Peter Eisentraut 9a3e06baa2 UCS_to_most.pl: Make executable, for consistency with other scripts 2016-02-29 21:19:17 -05:00
Tom Lane 3d523564c5 Suppress scary-looking log messages from async-notify isolation test.
I noticed that the async-notify test results in log messages like these:

LOG:  could not send data to client: Broken pipe
FATAL:  connection to client lost

This is because it unceremoniously disconnects a client session that is
about to have some NOTIFY messages delivered to it.  Such log messages
during a regression test might well cause people to go looking for a
problem that doesn't really exist (it did cause me to waste some time that
way).  We can shut it up by adding an UNLISTEN command to session teardown.

Patch HEAD only; this doesn't seem significant enough to back-patch.
2016-02-29 19:29:19 -05:00
Tom Lane 8d8ff5f7db Improve error message for rejecting RETURNING clauses with dropped columns.
This error message was written with only ON SELECT rules in mind, but since
then we also made RETURNING-clause targetlists go through the same logic.
This means that you got a rather off-topic error message if you tried to
add a rule with RETURNING to a table having dropped columns.  Ideally we'd
just support that, but some preliminary investigation says that it might be
a significant amount of work.  Seeing that Nicklas Avén's complaint is the
first one we've gotten about this in the ten years or so that the code's
been like that, I'm unwilling to put much time into it.  Instead, improve
the error report by issuing a different message for RETURNING cases, and
revise the associated comment based on this investigation.

Discussion: 1456176604.17219.9.camel@jordogskog.no
2016-02-29 19:11:38 -05:00
Alvaro Herrera 5847397dec Minor tweaks for new src/test/recovery
Author: Michael Paquier
2016-02-29 18:16:59 -03:00
Alvaro Herrera 10b4852215 Fix typos
Author: Amit Langote
2016-02-29 18:11:58 -03:00
Alvaro Herrera 54638f5708 Make new isolationtester test more stable
The original coding of the test was relying too much on the ordering in
which backends are awakened once an advisory lock which they wait for is
released.  Change the code so that each backend uses its own advisory
lock instead, so that the output becomes stable.  Also add a few seconds
of sleep between lock releases, so that the test isn't broken in
overloaded buildfarm animals, as suggested by Tom Lane.

Per buildfarm members spoonbill and guaibasaurus.

Discussion: https://www.postgresql.org/message-id/19294.1456551587%40sss.pgh.pa.us
2016-02-29 16:34:56 -03:00
Tom Lane c110678a47 Remove useless unary plus.
It's harmless, but might confuse readers.  Seems to have been introduced
in 6bc8ef0b7f.  Back-patch, just to avoid cosmetic cross-branch
differences.

Amit Langote
2016-02-29 10:48:40 -05:00
Tom Lane 05893712cc Fix build under OPTIMIZER_DEBUG.
In commit 19a541143a I replaced RelOptInfo.width with
RelOptInfo.reltarget.width, but I missed updating debug_print_rel()
for that because it's not compiled by default.
Reported by Salvador Fandino, patch by Michael Paquier.
2016-02-29 10:14:12 -05:00
Dean Rasheed 41fedc2462 Fix incorrect varlevelsup in security_barrier_replace_vars().
When converting an RTE with securityQuals into a security barrier
subquery RTE, ensure that the Vars in the new subquery's targetlist
all have varlevelsup = 0 so that they correctly refer to the
underlying base relation being wrapped.

The original code was creating new Vars by copying them from existing
Vars referencing the base relation found elsewhere in the query, but
failed to account for the fact that such Vars could come from sublink
subqueries, and hence have varlevelsup > 0. In practice it looks like
this could only happen with nested security barrier views, where the
outer view has a WHERE clause containing a correlated subquery, due to
the order in which the Vars are processed.

Bug: #13988
Reported-by: Adam Guthrie
Backpatch-to: 9.4, where updatable SB views were introduced
2016-02-29 12:28:06 +00:00
Tom Lane 907e4dd2b1 Avoid multiple free_struct_lconv() calls on same data.
A failure partway through PGLC_localeconv() led to a situation where
the next call would call free_struct_lconv() a second time, leading
to free() on already-freed strings, typically leading to a core dump.
Add a flag to remember whether we need to do that.

Per report from Thom Brown.  His example case only provokes the failure
as far back as 9.4, but nonetheless this code is obviously broken, so
back-patch to all supported branches.
2016-02-28 23:39:20 -05:00
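
The "flag to remember whether we need to do that" idea from commit 907e4dd2b1 can be sketched as below. This is a minimal illustration, not the actual PGLC_localeconv() code: the CachedConv struct, dup_string(), reload_cached_conv() and the free_calls counter are all hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the cached locale strings; not the real struct lconv. */
typedef struct CachedConv
{
	char	   *decimal_point;
	char	   *thousands_sep;
} CachedConv;

static CachedConv cached = {NULL, NULL};
static bool cached_needs_free = false;	/* the flag the commit message describes */
static int	free_calls = 0;		/* instrumentation for this sketch only */

/* Copy a string into malloc'd storage; returns NULL on out-of-memory. */
static char *
dup_string(const char *s)
{
	char	   *copy = malloc(strlen(s) + 1);

	if (copy != NULL)
		strcpy(copy, s);
	return copy;
}

/* Release the cached strings.  Must run at most once per successful fill. */
static void
free_cached_conv(CachedConv *c)
{
	assert(c != NULL);
	free(c->decimal_point);
	free(c->thousands_sep);
	c->decimal_point = NULL;
	c->thousands_sep = NULL;
	free_calls++;
}

/*
 * Refill the cache.  fail_midway simulates an error that happens after the
 * old data was released but before the new data is complete.
 */
static bool
reload_cached_conv(bool fail_midway)
{
	if (cached_needs_free)
	{
		free_cached_conv(&cached);
		cached_needs_free = false;	/* clear before anything below can fail */
	}

	if (fail_midway)
		return false;

	cached.decimal_point = dup_string(".");
	cached.thousands_sep = dup_string(",");
	if (cached.decimal_point == NULL || cached.thousands_sep == NULL)
	{
		/* Partial fill: clean up here and leave the flag unset. */
		free(cached.decimal_point);
		free(cached.thousands_sep);
		cached.decimal_point = NULL;
		cached.thousands_sep = NULL;
		return false;
	}

	cached_needs_free = true;	/* only now is there something to free */
	return true;
}
```

Because the flag is cleared as soon as the old strings are released, a call that follows a failed reload skips the free step instead of freeing the same pointers twice.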
Andrew Dunstan 26fdff1b8f Allow multiple --temp-config arguments to pg_regress
This means that if, for example, TEMP_CONFIG is set and a Makefile
explicitly sets a temp-config file, both will now be used.

Patch from John Gorman.
2016-02-28 09:38:43 -05:00
Andrew Dunstan 87cc6b57a9 Respect TEMP_CONFIG when pg_regress_check and friends are called
This reverts commit 9117985b6b in favor of
a more general solution.
2016-02-27 12:28:21 -05:00
Alvaro Herrera c9578135f7 Add isolationtester spec for old heapam.c bug
In 0e5680f473, I fixed a bug in heapam that caused spurious deadlocks
when multiple updates concurrently attempted to modify the old version
of an updated tuple whose new version was key-share locked.  I proposed
an isolationtester spec file that reproduced the bug, but back then
isolationtester wasn't mature enough to be able to run it.  Now that
38f8bdcac4 is in the tree, we can have this spec file too.

Discussion: https://www.postgresql.org/message-id/20141212205254.GC1768%40alvh.no-ip.org
2016-02-26 17:11:15 -03:00
Alvaro Herrera 74d58425c7 Apply last revision of recovery patch
I applied the previous-to-last revision of Michaël's submitted patch
instead of the last; these two tweaks pointed out by Craig were left out
of the previous commit by accident.
2016-02-26 16:22:53 -03:00
Alvaro Herrera 49148645f7 Add a test framework for recovery
This long-awaited framework is an expansion of the existing PostgresNode
stuff to support additional features for recovery testing; the recovery
tests included in this commit are a starting point that cover some of
the recovery features we have.  More scripts are expected to be added
later.

Author: Michaël Paquier, a bit of help from Amir Rohan
Reviewed by: Amir Rohan, Stas Kelvich, Kyotaro Horiguchi, Victor Wagner,
Craig Ringer, Álvaro Herrera
Discussion: http://www.postgresql.org/message-id/CAB7nPqTf7V6rswrFa=q_rrWeETUWagP=h8LX8XAov2Jcxw0DRg@mail.gmail.com
Discussion: http://www.postgresql.org/message-id/trinity-b4a8035d-59af-4c42-a37e-258f0f28e44a-1443795007012@3capp-mailcom-lxa08
2016-02-26 16:13:30 -03:00
Alvaro Herrera 89ac7004da Move some code from RewindTest into PostgresNode
Some code in the RewindTest test suite is more generally useful than
just for that suite, so put it where other test suites can reach it.

Some postgresql.conf parameters now get different values than before
when a cluster is initialized with 'allows_streaming'; most notably,
autovacuum is no longer turned off.

(Also, we no longer call pg_ctl promote with -w, but that flag doesn't
actually do anything in promote so there's no behavior change.)

Author: Michael Paquier
2016-02-26 13:24:22 -03:00
Robert Haas 7bea19d0a9 On second thought, disable parallelism for prepared statements.
CREATE TABLE .. AS EXECUTE can turn an apparently read-only query into
a write operation, which parallel query can't handle.  It's a bit of a
shame that requires us to avoid parallel query for queries prepared via
PREPARE in all cases, but for right now it does.
2016-02-26 16:33:37 +05:30
Robert Haas 35746bc348 Add new FDW API to test for parallel-safety.
This is basically a bug fix; the old code assumes that a ForeignScan
is always parallel-safe, but for postgres_fdw, for example, this is
definitely false.  It should be true for file_fdw, though, since a
worker can read a file from the filesystem just as well as any other
backend process.

Original patch by Thomas Munro.  Documentation, and changes to the
comments, by me.
2016-02-26 16:14:46 +05:30
Alvaro Herrera e64009303d Add POD docs to PostgresNode
Also, the dump_info method got split into another method that returns
the stuff as a string instead of just printing it to stdout.

Add a new README in src/test/perl too.

Author: Craig Ringer
Reviewed by: Michaël Paquier
2016-02-25 21:31:52 -03:00
Alvaro Herrera bda0b08198 Add README in src/test and src/test/modules
Author: Craig Ringer
Reviewed by: Michaël Paquier
2016-02-25 21:08:32 -03:00
Alvaro Herrera 343f709c06 Fix typos
Backpatch to: 9.4
2016-02-25 20:50:20 -03:00
Robert Haas 57a6a72b6b Enable parallelism for prepared statements and extended query protocol.
Parallel query can't handle running a query only partially rather than
to completion.  However, there seems to be no way to run a statement
prepared via SQL PREPARE other than to completion, so we can enable it
there without a problem.

The situation is more complicated for the extended query protocol.
libpq seems to provide no way to send an Execute message with a
non-zero rowcount, but some other client might.  If that happens, and
a parallel plan was chosen, we'll execute the parallel plan without
using any workers, which may be somewhat inefficient but should still
work.  Hopefully this won't be a problem; users can always set
max_parallel_degree=0 to avoid choosing parallel plans in the first
place.

Amit Kapila, reviewed by me.
2016-02-25 13:02:18 +05:30
Noah Misch 25924ac47a Clean the last few TAP suite tmp_check directories.
Back-patch to 9.5, where the suites were introduced.
2016-02-24 23:41:54 -05:00
Noah Misch 4163588783 MSVC: Clean tmp_check directory of pg_controldata test suite.
Back-patch to 9.4, where the suite was introduced.
2016-02-24 23:41:33 -05:00
Tom Lane 52f5d578d6 Create a function to reliably identify which sessions block which others.
This patch introduces "pg_blocking_pids(int) returns int[]", which returns
the PIDs of any sessions that are blocking the session with the given PID.
Historically people have obtained such information using a self-join on
the pg_locks view, but it's unreasonably tedious to do it that way with any
modicum of correctness, and the addition of parallel queries has pretty
much broken that approach altogether.  (Given some more columns in the view
than there are today, you could imagine handling parallel-query cases with
a 4-way join; but ugh.)

The new function has the following behaviors that are painful or impossible
to get right via pg_locks:

1. Correctly understands which lock modes block which other ones.

2. In soft-block situations (two processes both waiting for conflicting lock
modes), only the one that's in front in the wait queue is reported to
block the other.

3. In parallel-query cases, reports all sessions blocking any member of
the given PID's lock group, and reports a session by naming its leader
process's PID, which will be the pg_backend_pid() value visible to
clients.

The motivation for doing this right now is mostly to fix the isolation
tests.  Commit 38f8bdcac4 lobotomized
isolationtester's is-it-waiting query by removing its ability to recognize
nonconflicting lock modes, as a crude workaround for the inability to
handle soft-block situations properly.  But even without the lock mode
tests, the old query was excessively slow, particularly in
CLOBBER_CACHE_ALWAYS builds; some of our buildfarm animals fail the new
deadlock-hard test because the deadlock timeout elapses before they can
probe the waiting status of all eight sessions.  Replacing the pg_locks
self-join with use of pg_blocking_pids() is not only much more correct, but
a lot faster: I measure it at about 9X faster in a typical dev build with
Asserts, and 3X faster in CLOBBER_CACHE_ALWAYS builds.  That should provide
enough headroom for the slower CLOBBER_CACHE_ALWAYS animals to pass the
test, without having to lengthen deadlock_timeout yet more and thus slow
down the test for everyone else.
2016-02-22 14:31:43 -05:00
Tom Lane 73bf8715aa Remove redundant PGPROC.lockGroupLeaderIdentifier field.
We don't really need this field, because it's either zero or redundant with
PGPROC.pid.  The use of zero to mark "not a group leader" is not necessary
since we can just as well test whether lockGroupLeader is NULL.  This does
not save very much, either as to code or data, but the simplification seems
worthwhile anyway.
2016-02-22 11:20:35 -05:00
Andres Freund ea56b06cf7 Fix wrong keysize in PrivateRefCountHash creation.
In 4b4b680c3 I accidentally used sizeof(PrivateRefCountArray) instead of
sizeof(PrivateRefCountEntry) when creating the refcount overflow
hashtable. As the former is bigger than the latter, this luckily only
resulted in a slightly increased memory usage when many buffers are
pinned in a backend.

Reported-By: Takashi Horikawa
Discussion: 73FA3881462C614096F815F75628AFCD035A48C3@BPXM01GP.gisp.nec.co.jp
Backpatch: 9.5, where the new ref count infrastructure was introduced
2016-02-21 22:48:44 -08:00
Tom Lane c7a1c5a6b6 Cosmetic improvements in new config_info code.
Coverity griped about use of unchecked strcpy() into a local variable.
There's unlikely to be any actual bug there, since no caller would be
passing a path longer than MAXPGPATH, but nonetheless use of strlcpy()
seems preferable.

While at it, get rid of unmaintainable separation between list of
field names and list of field values in favor of initializing them
in parallel.  And we might as well declare get_configdata()'s path
argument as const char *, even though no current caller needs that.
2016-02-21 11:38:24 -05:00
Andrew Dunstan 94c745eb18 Fix two-argument jsonb_object when called with empty arrays
Some over-eager copy-and-pasting on my part resulted in a nonsense
result being returned in this case. I have adopted the same pattern for
handling this case as is used in the one argument form of the function,
i.e. we just skip over the code that adds values to the object.

Diagnosis and patch from Michael Paquier, although not quite his
solution.

Fixes bug #13936.

Backpatch to 9.5 where jsonb_object was introduced.
2016-02-21 10:30:49 -05:00
Robert Haas 88aca5662d Fix incorrect decision about which lock to take.
Spotted by Tom Lane.
2016-02-21 17:06:41 +05:30
Robert Haas d91a4a6c85 Cosmetic improvements to group locking.
Reflow text in lock manager README so that it fits within 80 columns.
Correct some mistakes.  Expand the README to explain not only why group
locking exists but also the data structures that support it.  Improve
comments related to group locking in several files.  Change the name of a
macro argument for improved clarity.

Most of these problems were reported by Tom Lane, but I found a few
of them myself.

Robert Haas and Tom Lane
2016-02-21 15:42:02 +05:30
Dean Rasheed 740d71842b Further fixing to make pg_size_bytes() portable.
Not all compilers support "long long" and the "LL" integer literal
suffix, so use a cast to int64 instead.
2016-02-20 15:49:26 +00:00
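
The portability point in the two pg_size_bytes() fixes (740d71842b and ad7cc1c554) can be illustrated with a small sketch. It assumes int64_t from <stdint.h> as a stand-in for PostgreSQL's int64 typedef, and terabyte_in_bytes() is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: int64_t stands in for PostgreSQL's own int64 typedef. */
typedef int64_t int64;

/*
 * One terabyte in bytes.  Casting the first operand makes the whole product
 * 64 bits wide, so no "L" or "LL" literal suffix is needed.  An "L" suffix
 * would still be only 32 bits on platforms where long is 32 bits.
 */
static int64
terabyte_in_bytes(void)
{
	int64		result = ((int64) 1024) * 1024 * 1024 * 1024;

	assert(result > 0);			/* would fail if the product had overflowed */
	return result;
}
```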
Dean Rasheed ad7cc1c554 Fix pg_size_bytes() to be more portable.
Commit 53874c5228 broke various 32-bit
buildfarm machines because it incorrectly used an 'L' suffix for what
needed to be a 64-bit literal. Thanks to Michael Paquier for helping
to diagnose this.
2016-02-20 11:03:04 +00:00
Dean Rasheed 53874c5228 Add pg_size_bytes() to parse human-readable size strings.
This will parse strings in the format produced by pg_size_pretty() and
return sizes in bytes. This allows queries to be written with clauses
like "pg_total_relation_size(oid) > pg_size_bytes('10 GB')".

Author: Pavel Stehule with various improvements by Vitaly Burovoy
Discussion: http://www.postgresql.org/message-id/CAFj8pRD-tGoDKnxdYgECzA4On01_uRqPrwF-8LdkSE-6bDHp0w@mail.gmail.com
Reviewed-by: Vitaly Burovoy, Oleksandr Shulgin, Kyotaro Horiguchi,
    Michael Paquier and Robert Haas
2016-02-20 09:57:27 +00:00
Noah Misch 5882ca6686 Call xlc __isync() after, not before, associated compare-and-swap.
Architecture reference material specifies this order, and s_lock.h
inline assembly agrees.  The former order failed to provide mutual
exclusion to lwlock.c and perhaps to other clients.  The two xlc
buildfarm members, hornet and mandrill, have failed sixteen times with
duplicate key errors involving pg_class_oid_index or pg_type_oid_index.
Back-patch to 9.5, where commit b64d92f1a5
introduced atomics.

Reviewed by Andres Freund and Tom Lane.
2016-02-19 22:47:50 -05:00
Simon Riggs 481725c0ba Correct StartupSUBTRANS for page wraparound
StartupSUBTRANS() incorrectly handled cases near the max pageid in the subtrans
data structure, which in some cases could lead to errors in startup for Hot
Standby.
This patch wraps the pageids correctly, avoiding any such errors.
Identified by exhaustive crash testing by Jeff Janes.

Jeff Janes
2016-02-19 08:31:12 +00:00
Peter Eisentraut a914a04142 pg_dump: Fix inconsistent sscanf() conversions
It was using %u to read a string that was earlier produced by snprintf with %d
into a signed integer variable.  This seems to work in practice but is
incorrect.

Found by cppcheck.
2016-02-18 20:12:38 -05:00
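
The mismatch fixed in a914a04142 comes down to using the same conversion on both sides of a text round trip. The following is a minimal sketch of the consistent form; roundtrip_signed() is a hypothetical helper and not pg_dump code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Write a signed int as text and read it back.  Both directions use %d,
 * which matches the type of the variable; reading into an int with %u is
 * what the commit message calls incorrect.
 */
static bool
roundtrip_signed(int value, int *parsed)
{
	char		buf[32];

	assert(parsed != NULL);
	snprintf(buf, sizeof(buf), "%d", value);
	return sscanf(buf, "%d", parsed) == 1;
}
```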
Tom Lane 19a541143a Add an explicit representation of the output targetlist to Paths.
Up to now, there's been an assumption that all Paths for a given relation
compute the same output column set (targetlist).  However, there are good
reasons to remove that assumption.  For example, an indexscan on an
expression index might be able to return the value of an expensive function
"for free".  While we have the ability to generate such a plan today in
simple cases, we don't have a way to model that it's cheaper than a plan
that computes the function from scratch, nor a way to create such a plan
in join cases (where the function computation would normally happen at
the topmost join node).  Also, we need this so that we can have Paths
representing post-scan/join steps, where the targetlist may well change
from one step to the next.  Therefore, invent a "struct PathTarget"
representing the columns we expect a plan step to emit.  It's convenient
to include the output tuple width and tlist evaluation cost in this struct,
and there will likely be additional fields in future.

While Path nodes that actually do have custom outputs will need their own
PathTargets, it will still be true that most Paths for a given relation
will compute the same tlist.  To reduce the overhead added by this patch,
keep a "default PathTarget" in RelOptInfo, and allow Paths that compute
that column set to just point to their parent RelOptInfo's reltarget.
(In the patch as committed, actually every Path is like that, since we
do not yet have any cases of custom PathTargets.)

I took this opportunity to provide some more-honest costing of
PlaceHolderVar evaluation.  Up to now, the assumption that "scan/join
reltargetlists have cost zero" was applied not only to Vars, where it's
reasonable, but also PlaceHolderVars where it isn't.  Now, we add the eval
cost of a PlaceHolderVar's expression to the first plan level where it can
be computed, by including it in the PathTarget cost field and adding that
to the cost estimates for Paths.  This isn't perfect yet but it's much
better than before, and there is a way forward to improve it more.  This
costing change affects the join order chosen for a couple of the regression
tests, changing expected row ordering.
2016-02-18 20:02:03 -05:00
Bruce Momjian 3386f34cdc pg_upgrade: suppress creation of delete script
Suppress creation of the pg_upgrade delete script when the new data
directory is inside the old data directory.

Reported-by: IRC

Backpatch-through: 9.3, where delete script tests were added
2016-02-18 18:32:27 -05:00
Peter Eisentraut 18777c38e9 Improve error message about active replication slot
The old phrasing was awkward if a replication slot is activated and
deactivated repeatedly.
2016-02-17 21:23:28 -05:00
Joe Conway fc8a81e3e7 Revert inadvertent change in pg_config behavior
In commit a5c43b88 the behavior of command line pg_config was
inadvertently changed to include the config name when specific
configs are requested, similar to when none are requested and
all are emitted. This breaks scripts that expect to use
pg_config for e.g. PGXS. Revert the behavior to the previous.
2016-02-17 10:00:34 -08:00
Joe Conway a5c43b8869 Add new system view, pg_config
Move and refactor the underlying code for the pg_config client
application to src/common in support of sharing it with a new
system information SRF called pg_config() which makes the same
information available via SQL. Additionally wrap the SRF with a
new system view, also called pg_config.

Patch by me with extensive input and review by Michael Paquier
and additional review by Alvaro Herrera.
2016-02-17 09:12:06 -08:00
Robert Haas f1f5ec1efa Reuse abbreviated keys in ordered [set] aggregates.
When processing ordered aggregates following a sort that could make use
of the abbreviated key optimization, only call the equality operator to
compare successive pairs of tuples when their abbreviated keys were not
equal.

Peter Geoghegan, reviewed by Andreas Karlsson and by me.
2016-02-17 15:40:00 +05:30
Tom Lane 66f503868b Make plpython cope with funny characters in function names.
A function name that's double-quoted in SQL can contain almost any
characters, but we were using that name directly as part of the name
generated for the Python-level function, and Python doesn't like
anything that isn't pretty much a standard identifier.  To fix,
replace anything that isn't an ASCII letter or digit with an underscore
in the generated name.  This doesn't create any risk of duplicate Python
function names because we were already appending the function OID to
the generated name to ensure uniqueness.  Per bug #13960 from Jim Nasby.

Patch by Jim Nasby, modified a bit by me.  Back-patch to all
supported branches.
2016-02-16 21:08:15 -05:00
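
The renaming rule described in 66f503868b (replace anything that is not an ASCII letter or digit with an underscore, and rely on the appended OID for uniqueness) can be sketched as follows. The helper name make_python_name() and the exact name prefix are assumptions made for the example, not taken from the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a Python-safe function name from an arbitrary SQL function name.
 * The OID suffix keeps names unique even when two SQL names collapse to the
 * same sanitized string.  The prefix used here is an assumption.
 */
static void
make_python_name(char *dst, size_t dstlen, const char *sql_name,
				 unsigned int oid)
{
	char	   *p;

	assert(dst != NULL && dstlen > 0 && sql_name != NULL);
	snprintf(dst, dstlen, "__plpython_procedure_%s_%u", sql_name, oid);

	for (p = dst; *p != '\0'; p++)
	{
		/* Explicit ranges avoid locale-dependent isalnum() behavior. */
		if (!((*p >= 'A' && *p <= 'Z') ||
			  (*p >= 'a' && *p <= 'z') ||
			  (*p >= '0' && *p <= '9')))
			*p = '_';
	}
}
```

Testing explicit ASCII ranges rather than calling isalnum() is a deliberate choice in this sketch: isalnum() may accept non-ASCII characters in some locales, which Python identifiers generated this way should not contain.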
Michael Meskes 868898739a Changed expected result to list IPv6 local interface too. 2016-02-16 14:34:10 +01:00
Michael Meskes fc1ae7d2eb Change ecpg lexer to accept comments with line breaks in CPP lines. 2016-02-16 14:24:54 +01:00
Joe Conway 851636bfda Move DATA entry to correct position
In commit 7b4bfc87 the DATA and DESCR entries for the new
row_security_active() function were inadvertently put after
the PROVOLATILE defines, rather than before as they should
have been placed. Move them up where they belong.

Backpatch to 9.5 where the new entries were introduced.
2016-02-15 16:38:47 -08:00
Andres Freund 7975c5e0a9 Allow the WAL writer to flush WAL at a reduced rate.
Commit 4de82f7d7 increased the WAL flush rate, mainly to increase the
likelihood that hint bits can be set quickly. More quickly set hint bits
can reduce contention around the clog et al.  But unfortunately the
increased flush rate can have a significant negative performance impact,
I have measured up to a factor of ~4.  The reason for this slowdown is
that if there are independent writes to the underlying devices, for
example because shared buffers is a lot smaller than the hot data set,
or because a checkpoint is ongoing, the fdatasync() calls force cache
flushes to be emitted to the storage.

The reduced rate is achieved by flushing WAL only if the last flush was
longer than wal_writer_delay ago, or if more than wal_writer_flush_after
(new GUC) unflushed blocks are pending. Based on some tests the default for
wal_writer_flush_after is 1MB, which seems to work well both on SSD and
rotational media.

To avoid negative performance impact due to 4de82f7d7 an earlier
commit (db76b1e) made SetHintBits() more likely to succeed; preventing
performance regressions in the pgbench tests I performed.

Discussion: 20160118163908.GW10941@awork2.anarazel.de
2016-02-16 00:56:34 +01:00
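
The flush rule in 7975c5e0a9 amounts to a two-condition test. The sketch below expresses it with plain integers; should_flush_wal() and its parameter names are hypothetical, and times are in milliseconds purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the WAL writer should flush now.
 *
 * delay_ms plays the role of wal_writer_delay, and flush_after_blocks the
 * role of wal_writer_flush_after.  Flush when either limit is exceeded.
 */
static bool
should_flush_wal(int64_t now_ms, int64_t last_flush_ms, int64_t delay_ms,
				 int64_t unflushed_blocks, int64_t flush_after_blocks)
{
	assert(delay_ms >= 0 && flush_after_blocks >= 0);

	if (now_ms - last_flush_ms > delay_ms)
		return true;			/* last flush was too long ago */
	if (unflushed_blocks > flush_after_blocks)
		return true;			/* too much unflushed WAL is pending */
	return false;				/* skip this round and save an fdatasync() */
}
```

With the usual 8kB WAL block size, a 1MB threshold corresponds to 128 blocks; that conversion is an assumption of this example rather than something stated in the commit message.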
Alvaro Herrera 5df44d14ba pgbench: avoid FD_ISSET on an invalid file descriptor
The original code wasn't careful to test the file descriptor returned by
PQsocket() for an invalid socket.  If an invalid socket did turn up,
that would amount to calling FD_ISSET with fd = -1, whereby undefined
behavior can be invoked.

To fix, test file descriptor for validity and stop further processing if
that fails.

Problem noticed by Coverity.

There is an existing FD_ISSET callsite that does check for invalid
sockets beforehand, but the error message reported by it was
strerror(errno); in testing the aforementioned change, that turns out to
result in "bad socket: Success" which isn't terribly helpful.  Instead
use PQerrorMessage() in both places, which is more likely to contain a
useful error message.

Backpatch-through: 9.1.
2016-02-15 20:33:43 -03:00
Tom Lane 8c95ae81fa Suppress compiler warnings about useless comparison of unsigned to zero.
Reportedly, some compilers warn about tests like "c < 0" if c is unsigned,
and hence complain about the character range checks I added in commit
3bb3f42f37.  This is a bit of a pain since
the regex library doesn't really want to assume that chr is unsigned.
However, since any such reconfiguration would involve manual edits of
regcustom.h anyway, we can put it on the shoulders of whoever wants to
do that to adjust this new range-checking macro correctly.

Per gripes from Coverity and Andres.
2016-02-15 17:12:16 -05:00
Andres Freund db76b1efbb Allow SetHintBits() to succeed if the buffer's LSN is new enough.
Previously we only allowed SetHintBits() to succeed if the commit LSN of
the last transaction touching the page has already been flushed to
disk. We can't generally change the LSN of the page, because we don't
necessarily have the required locks on the page. But the required LSN
interlock does not mean the commit record has to be flushed immediately,
it just requires that the commit record will be flushed before the page is
written out. Therefore if the buffer LSN is newer than the commit LSN,
the hint bit can be safely set.

In a number of scenarios (e.g. pgbench) this noticeably increases the
number of hint bits that are set. But more importantly it also keeps the
success rate up when flushing WAL less frequently. That was the original
reason for commit 4de82f7d7, which has negative performance consequences
in a number of scenarios. This will allow a followup commit to reduce
the flush rate.

Discussion: 20160118163908.GW10941@awork2.anarazel.de
2016-02-15 22:48:51 +01:00
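
The relaxed condition from db76b1efbb can be written as a small predicate. This is a sketch under assumptions: LSNs are modeled as plain 64-bit integers, hint_bit_is_safe() is a hypothetical name, and treating a buffer LSN equal to the commit LSN as safe is a choice made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a WAL position modeled as an unsigned 64-bit integer. */
typedef uint64_t XLogRecPtr;

/*
 * May a hint bit be set for a transaction whose commit record is at
 * commit_lsn?
 *
 * Old rule: only if the commit record is already flushed.
 * Added rule: also if the page's LSN is at least commit_lsn, because the
 * page cannot be written out before WAL up to its own LSN has been flushed,
 * which then covers the commit record as well.
 */
static bool
hint_bit_is_safe(XLogRecPtr commit_lsn, XLogRecPtr flushed_lsn,
				 XLogRecPtr buffer_lsn)
{
	if (commit_lsn <= flushed_lsn)
		return true;			/* commit record is already on disk */
	if (buffer_lsn >= commit_lsn)
		return true;			/* page write-out will force the flush */
	return false;
}
```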
Joe Conway cfafd8bead Correct Copyright year from 2015 to 2016
Looks like this patch went in after Copyright messages
were updated for 2016 and it missed the boat. Fixed.
2016-02-15 13:19:35 -08:00
Fujii Masao 31b6606c48 Make concurrent refresh check early that there is a unique index on matview.
In REFRESH MATERIALIZED VIEW command, CONCURRENTLY option is only
allowed if there is at least one unique index with no WHERE clause on
one or more columns of the matview. Previously, concurrent refresh
checked the existence of a unique index on the matview after filling
the data to new snapshot, i.e., after calling refresh_matview_datafill().
So, when there was no unique index, we might have to wait a long time
before we detected that and got the error. It was a waste of time.

To eliminate that wasted time, this commit changes concurrent refresh
so that it checks the existence of a unique index at the beginning of
the refresh operation, i.e., before starting any time-consuming jobs.
If CONCURRENTLY option is not allowed due to lack of a unique index,
concurrent refresh can immediately detect it and emit an error.

Author: Masahiko Sawada
Reviewed-by: Michael Paquier, Fujii Masao
2016-02-16 02:15:44 +09:00
Noah Misch 9449c4b1ec Replace broken link in comment. 2016-02-15 02:35:52 -05:00
Tom Lane 9b92e76f7b Make GetLockStatusData's header comment resemble reality.
The API spec for this function was changed completely (and for the better)
by commit 3cba8999b3, but it didn't bother
with anything as mundane as updating the comments.
2016-02-13 15:42:31 -05:00
Bruce Momjian 13a6fa3634 pg_upgrade: Add C comment about NextXID delimiter
We don't test the catversion for the NextXID delimiter change, we just
test the string contents;  explain why.

Reported-by: Michael Paquier
2016-02-12 17:53:36 -05:00
Joe Conway 59a884e985 Change delimiter used for display of NextXID
NextXID has been rendered in the form of a pg_lsn even though it
really is not. This can cause confusion, so change the format from
%u/%u to %u:%u, per discussion on hackers.

Complaint by me, patch by me and Bruce, reviewed by Michael Paquier
and Alvaro. Applied to HEAD only.

Author: Joe Conway, Bruce Momjian
Reviewed-by: Michael Paquier, Alvaro Herrera
Backpatch-through: master
2016-02-12 14:23:59 -08:00
Tom Lane e84e06d2b3 Increase deadlock_timeout some more in the deadlock-hard isolation test.
The previous value of 5s is inadequate for the buildfarm's
CLOBBER_CACHE_ALWAYS animals: they take long enough to do the is-it-waiting
queries that the timeout expires, allowing the database state to change,
before isolationtester is done looking.  Perhaps 10s will be enough.
(If it isn't, I'm inclined to reduce the number of sessions involved.)
2016-02-12 17:22:42 -05:00
Tom Lane dca369320f Revert "isolationtester: don't repeat the is-it-waiting query when retrying a step."
This mostly reverts commit 9c9782f066.
I left in the parts that rearranged removal of completed waiting steps;
but the idea of not rechecking a step's blocked-ness isn't working.
2016-02-12 17:12:23 -05:00
Tom Lane 3992188c2a Revert "Still further tweaking of deadlock isolation tests."
This reverts commit d03130d378.
That was dependent on an isolationtester.c change that now proves
to be broken; we will need to find another solution.
2016-02-12 17:02:59 -05:00
Alvaro Herrera 34f13cc484 pgbench: cleanup use of a "logfile" parameter
There is no reason to have the per-thread logfile file pointer as a
separate parameter in various functions: it's much simpler to put it in
the per-thread state struct instead, which is already being passed to
all functions that need the log file anyway.  Change the callsites in
which it was used as a boolean to test whether logging is active, so
that they use the use_log global variable instead.

No backpatch, even though this exists since commit a887c486d5 of March
2010, because this is just for cleanliness' sake and the surrounding
code has been modified a lot recently anyway.
2016-02-12 17:30:46 -03:00
Alvaro Herrera db94419ffd pgbench: fix segfault with empty sql file
Commit 1d0c3b3f8a introduced a bug that causes pgbench to crash if an
empty script file is specified.  Fix it by rejecting such files at
startup, which is the historical and intended behavior.

Reported-By: Jeff Janes
Discussion: https://www.postgresql.org/message-id/CAMkU=1zxKUbLPOt9hQWFp14pTc=V0cGo2GQBbn2GsK2Pu+8ZfA@mail.gmail.com
2016-02-12 17:14:45 -03:00
Tom Lane d03130d378 Still further tweaking of deadlock isolation tests.
It turns out that there is a second race condition in the new deadlock-hard
test: once the deadlock detector fires, it's uncertain whether step s7a8 or
step s8a1 will report first, because killing s8's transaction unblocks s7.
So far, s7 has only been seen to report first in CLOBBER_CACHE_ALWAYS
builds, but it's pretty reproducible there, and in theory it should
sometimes occur in normal builds too.  If s7 were a bit slower than usual,
that could also break the test, since the existing expected-file assumes
that we'll see s7a8 report the first time we check it after s8a1 completes.
To fix, add a post-lock delay to s7a8.
2016-02-12 14:19:57 -05:00
Tom Lane 9c9782f066 isolationtester: don't repeat the is-it-waiting query when retrying a step.
If we're retrying a step, then we already decided it was blocked on a lock,
and there's no need to recheck that.  The original coding of commit
38f8bdcac4 resulted in a large number of
is-it-waiting queries when dealing with multiple concurrently-blocked
sessions, which is fairly pointless and also results in test failures in
CLOBBER_CACHE_ALWAYS builds, where the is-it-waiting query is quite slow.

This definition also permits appending pg_sleep() calls to steps where it's
needed to control the order of finish of concurrent steps.  Before, that
did not work nicely because we'd decide that a step performing a sleep was
not blocked and hang up waiting for it to finish, rather than noticing the
completion of the concurrent step we're supposed to notice first.

In passing, revise handling of removal of completed waiting steps
to make it a bit less messy.
2016-02-12 14:10:36 -05:00
Tom Lane a361490806 Re-pgindent isolationtester.c.
Need to do some more hacking on this, and got annoyed that it's not
indent clean.
2016-02-12 13:36:13 -05:00
Peter Eisentraut 29b4b7bda6 Fix whitespace 2016-02-12 12:08:40 -05:00
Robert Haas bcac23de73 Introduce extensible node types.
An extensible node is always tagged T_Extensible, but the extnodename
field identifies it more specifically; it may also include arbitrary
private data.  Extensible nodes can be copied, tested for equality,
serialized, and deserialized, but the core system doesn't know
anything about them otherwise.  Some extensions may find it useful to
include these nodes in fdw_private or custom_private lists in lieu of
arm-wrestling their data into a format that the core code can
understand.

Along the way, so as not to burden the authors of such extensible
node types too much, expose the functions for writing serialized
tokens, and for serializing and deserializing bitmapsets.

KaiGai Kohei, per a design suggested by me.  Reviewed by Andres Freund
and by me, and further edited by me.
2016-02-12 09:38:11 -05:00
Robert Haas 63461a63f9 Make builtin lwlock tranche names consistent.
Previously, we had a mix of styles.

Amit Kapila
2016-02-12 08:07:11 -05:00
Tom Lane caefc11ef6 Further tweaking of deadlock isolation tests.
The new deadlock-soft-2 test has a timing dependency too: it supposes
that isolationtester will detect step s1b as waiting before the deadlock
detector runs and grants it the lock.  Adjust deadlock_timeout to ensure
that that's true even in CLOBBER_CACHE_ALWAYS builds, where the wait
detection query is quite slow.  Per buildfarm member jaguarundi.
2016-02-11 23:21:33 -05:00
Tom Lane f144f73242 Refactor check_functional_grouping() to use get_primary_key_attnos().
If we ever get around to allowing functional dependency to be proven
from other things besides simple primary keys, this code will need to
be rethought, but that was true anyway.  In the meantime, we might as
well not have two very-similar routines for scanning pg_constraint.

David Rowley, reviewed by Julien Rouhaud
2016-02-11 17:52:03 -05:00
Tom Lane d4c3a156cb Remove GROUP BY columns that are functionally dependent on other columns.
If a GROUP BY clause includes all columns of a non-deferred primary key,
as well as other columns of the same relation, those other columns are
redundant and can be dropped from the grouping; the pkey is enough to
ensure that each row of the table corresponds to a separate group.
Getting rid of the excess columns will reduce the cost of the sorting or
hashing needed to implement GROUP BY, and can indeed remove the need for
a sort step altogether.

This seems worth testing for since many query authors are not aware of
the GROUP-BY-primary-key exception to the rule about queries not being
allowed to reference non-grouped-by columns in their targetlists or
HAVING clauses.  Thus, redundant GROUP BY items are not uncommon.  Also,
we can make the test pretty cheap in most queries where it won't help
by not looking up a rel's primary key until we've found that at least
two of its columns are in GROUP BY.
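
The pruning rule described above can be sketched as follows. This is a minimal illustration in Python, not the planner code from this commit: the function name `remove_redundant_group_by`, the `(relation, column)` tuple representation, and the `lookup_primary_key` callback are all hypothetical, and the callback is assumed to return `None` when a relation has no usable (non-deferred) primary key.

```python
from collections import defaultdict


def remove_redundant_group_by(group_cols, lookup_primary_key):
    """Drop GROUP BY columns that are functionally dependent on a primary key.

    group_cols: list of (relation, column) pairs, in GROUP BY order.
    lookup_primary_key: hypothetical callback returning a frozenset of the
        relation's primary key columns, or None if it has no usable key.
    """
    columns_by_rel = defaultdict(list)
    for rel, col in group_cols:
        columns_by_rel[rel].append(col)

    redundant = set()
    for rel, cols in columns_by_rel.items():
        # Cheap pre-check: with fewer than two columns of this relation in
        # GROUP BY nothing can be removed, so skip the key lookup entirely.
        if len(cols) < 2:
            continue
        pkey = lookup_primary_key(rel)
        if not pkey:
            continue
        # Proper subset: every key column is grouped, plus at least one more.
        if pkey < set(cols):
            redundant.update((rel, c) for c in cols if c not in pkey)

    # Preserve the original ordering of the surviving columns.
    return [gc for gc in group_cols if gc not in redundant]


# Example catalog (made up for illustration).
PRIMARY_KEYS = {"orders": frozenset({"id"}), "items": frozenset({"a", "b"})}

pruned = remove_redundant_group_by(
    [("orders", "id"), ("orders", "customer"), ("items", "a")],
    PRIMARY_KEYS.get,
)
```

In this sketch `("orders", "customer")` is dropped because `orders.id` alone identifies each row, while `("items", "a")` is kept because only part of that relation's key is grouped.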

David Rowley, reviewed by Julien Rouhaud
2016-02-11 17:34:59 -05:00
Tom Lane 72eee410d4 Move pg_constraint.h function declarations to new file pg_constraint_fn.h.
A pending patch requires exporting a function returning Bitmapset from
catalog/pg_constraint.c.  As things stand, that would mean including
nodes/bitmapset.h in pg_constraint.h, which might be hazardous for the
client-side includability of that header.  It's not entirely clear whether
any client-side code needs to include pg_constraint.h, but it seems prudent
to assume that there is some such code somewhere.  Therefore, split off the
function declarations into a new file pg_constraint_fn.h, similarly to what
we've done for some other catalog header files.
2016-02-11 15:51:28 -05:00
Tom Lane 2564be360a Fix typo in comment. 2016-02-11 15:20:14 -05:00
Tom Lane d18643c4a6 Shift the responsibility for emitting "database system is shut down".
Historically this message has been emitted at the end of ShutdownXLOG().
That's not an insane place for it in a standalone backend, but in the
postmaster environment we've grown a fair amount of stuff that happens
later, including archiver/walsender shutdown, stats collector shutdown,
etc.  Recent buildfarm experimentation showed that on slower machines
there could be many seconds' delay between finishing ShutdownXLOG() and
actual postmaster exit.  That's fairly confusing, both for testing
purposes and for DBAs.  Hence, move the code that prints this message
into UnlinkLockFiles(), so that it comes out just after we remove the
postmaster's pidfile.  That is a more appropriate definition of "is shut
down" from the point of view of "pg_ctl stop", for example.  In general,
removing the pidfile should be the last externally-visible action of
either a postmaster or a standalone backend; compare commit
d73d14c271 for instance.  So this seems
like a reasonably future-proof approach.
2016-02-11 14:14:22 -05:00
Robert Haas c319991bca Use separate lwlock tranches for buffer, lock, and predicate lock managers.
This finishes the work - spread across many commits over the last
several months - of putting each type of lock other than the named
individual locks into a separate tranche.

Amit Kapila
2016-02-11 14:07:33 -05:00
Tom Lane b11d07b6a3 Make new deadlock isolation test more reproducible.
The original formulation of 4c9864b9b4
was extremely timing-sensitive, because it arranged for the deadlock
detector to be running (and possibly unblocking the current query)
at almost exactly the same time as isolationtester would be probing
to see if the query is blocked.  The committed expected-file assumed
that the deadlock detection would finish first, but we see the opposite
on both fast and slow buildfarm animals.  Adjust the deadlock timeout
settings to make it predictable that isolationtester *will* see the
query as waiting before deadlock detection unblocks it.

I used a 5s timeout for the same reasons mentioned in
a7921f71a3.
2016-02-11 11:59:11 -05:00
Tom Lane d9dc2b4149 Code review for isolationtester changes.
Fix a few oversights in 38f8bdcac4982215beb9f65a19debecaf22fd470:
don't leak memory in run_permutation(), remember when we've issued
a cancel rather than issuing another one every 10ms,
fix some typos in comments.
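
The "remember when we've issued a cancel" fix can be pictured with a small polling loop. This is only a sketch in Python under stated assumptions: `wait_for_step`, its parameters, and the poll-count thresholds are hypothetical and do not mirror the actual isolationtester code; only the 10ms polling interval comes from the message above.

```python
def wait_for_step(is_done, send_cancel, sleep, polls_before_cancel, max_polls):
    """Poll until a step finishes, cancelling it at most once.

    is_done, send_cancel and sleep are caller-supplied callables
    (hypothetical names); sleep is injected so the loop is easy to test.
    """
    cancel_sent = False  # the flag that prevents repeated cancel requests
    for poll in range(max_polls):
        if is_done():
            return True
        if poll >= polls_before_cancel and not cancel_sent:
            send_cancel()
            cancel_sent = True
        sleep(0.01)  # 10ms between polls
    return False
```

Without the `cancel_sent` flag, every poll past the threshold would send a fresh cancel request.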
2016-02-11 11:30:52 -05:00
Teodor Sigaev 07d25a964b Improve error reporting in format()
Clarify invalid format conversion type error message and add hint.

Author: Jim Nasby
2016-02-11 18:11:11 +03:00
Robert Haas a455878d99 Rename PGPROC fields related to group XID clearing again.
Commit 0e141c0fbb introduced a new
facility to reduce ProcArrayLock contention by clearing several XIDs
from the ProcArray under a single lock acquisition.  The names
initially chosen were deemed not to be very good choices, so commit
4aec49899e renamed them.  But now it
seems like we still didn't get it right.  A pending patch wants to
add similar infrastructure for batching CLOG updates, so the names
need to be clear enough to allow a new set of structure members with
a related purpose.

Amit Kapila
2016-02-11 08:55:24 -05:00
Robert Haas 4c9864b9b4 Add some isolation tests for deadlock detection and resolution.
Previously, we had no test coverage for the deadlock detector.
2016-02-11 08:38:09 -05:00
Robert Haas 38f8bdcac4 Modify the isolation tester so that multiple sessions can wait.
This allows testing of deadlock scenarios.  Scenarios that would
previously have been considered invalid are now simply taken as a
scenario in which more than one backend will wait.
2016-02-11 08:36:30 -05:00
Robert Haas c9882c60f4 Specify permutations for isolation tests with "invalid" permutations.
This is a necessary prerequisite for forthcoming changes to allow deadlock
scenarios to be tested by the isolation tester.  It is also a good idea on
general principle, since these scenarios add no useful test coverage not
provided by other scenarios, but do take time to execute.
2016-02-11 08:33:24 -05:00
Noah Misch 64d89a93c0 In pg_rewind test suite, triple promote timeout to 90s.
Thirty seconds was not consistently enough for promotion to complete on
buildfarm members sungazer and tern.  Experiments suggest 43s would have
been enough.  Back-patch to 9.5, where pg_rewind was introduced.
2016-02-10 20:34:57 -05:00
Noah Misch 2ffa869620 Accept pg_ctl timeout from the PGCTLTIMEOUT environment variable.
Many automated test suites call pg_ctl.  Buildfarm members axolotl,
hornet, mandrill, shearwater, sungazer and tern have failed when server
shutdown took longer than the pg_ctl default 60s timeout.  This addition
permits slow hosts to easily raise the timeout without us editing a
--timeout argument into every test suite pg_ctl call.  Back-patch to 9.1
(all supported versions) for the sake of automated testing.
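
As a rough sketch of how such a setting is typically resolved, the Python below lets an explicit --timeout value win over PGCTLTIMEOUT, which in turn wins over the 60s default. The function name `resolve_wait_seconds` is hypothetical, and ignoring an unparsable environment value is an assumption of this sketch rather than documented pg_ctl behavior.

```python
import os

DEFAULT_WAIT_SECONDS = 60  # the pg_ctl default mentioned above


def resolve_wait_seconds(cli_timeout=None, environ=None):
    """Pick the wait timeout: command line, then PGCTLTIMEOUT, then default."""
    env = os.environ if environ is None else environ
    if cli_timeout is not None:
        return cli_timeout
    raw = env.get("PGCTLTIMEOUT")
    if raw is not None:
        try:
            return int(raw)
        except ValueError:
            pass  # assumption: fall back to the default on a bad value
    return DEFAULT_WAIT_SECONDS
```

A slow test host could then export PGCTLTIMEOUT once instead of passing --timeout to every pg_ctl invocation.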

Reviewed by Tom Lane.
2016-02-10 20:34:02 -05:00
Tom Lane 51e78ab4ff Avoid use of sscanf() to parse ispell dictionary files.
It turns out that on FreeBSD-derived platforms (including OS X), the
*scanf() family of functions is pretty much brain-dead about multibyte
characters.  In particular it will apply isspace() to individual bytes
of input even when those bytes are part of a multibyte character, thus
allowing false recognition of a field-terminating space.
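
The failure mode can be imitated in a few lines of Python. This is a simulation, not the C library code: `first_field` is a hypothetical stand-in for a "%s"-style field scan, and the set of bytes treated as whitespace (including 0x85 and 0xA0) is an assumption chosen to mimic a byte-wise isspace() in the affected locales.

```python
# Assumption: a byte-wise isspace() that also accepts 0x85 and 0xA0.
BYTEWISE_SPACE = frozenset(b" \t\n\v\f\r\x85\xa0")
# Plain ASCII whitespace only, for comparison.
ASCII_SPACE = frozenset(b" \t\n\v\f\r")


def first_field(data: bytes, space_bytes) -> bytes:
    """Return the bytes up to the first byte classified as whitespace."""
    field = bytearray()
    for byte in data:
        if byte in space_bytes:
            break
        field.append(byte)
    return bytes(field)


# The first letter of this word is U+0445, encoded in UTF-8 as D1 85.
line = "хлеб flag".encode("utf-8")

broken = first_field(line, BYTEWISE_SPACE)  # stops inside the first character
intact = first_field(line, ASCII_SPACE)     # stops at the real space
```

Here `broken` ends after a single byte because the continuation byte 0x85 is mistaken for a separator, while `intact` holds the whole word.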

We appear to have little alternative other than instituting a coding
rule that *scanf() is not to be used if the input string might contain
multibyte characters.  (There was some discussion of relying on "%ls",
but that probably just moves the portability problem somewhere else,
and besides it doesn't fully prevent BSD *scanf() from using isspace().)

This patch is a down payment on that: it gets rid of use of sscanf()
to parse ispell dictionary files, which are certainly at great risk
of having a problem.  The code is cleaner this way anyway, though
a bit longer.

In passing, improve a few comments.

Report and patch by Artur Zakirov, reviewed and somewhat tweaked by me.
Back-patch to all supported branches.
2016-02-10 19:30:11 -05:00
Tom Lane c5e9b77127 Revert "Temporarily make pg_ctl and server shutdown a whole lot chattier."
This reverts commit 3971f64843 and a
couple of follow-on debugging commits; I think we've learned what we can
from them.
2016-02-10 16:01:04 -05:00
Robert Haas 79a7ff0fe5 Code cleanup in the wake of recent LWLock refactoring.
As of commit c1772ad922, there's no
longer any way of requesting additional LWLocks in the main tranche,
so we don't need NumLWLocks() or LWLockAssign() any more.  Also,
some of the allocation counters that we had previously aren't needed
any more either.

Amit Kapila
2016-02-10 09:58:09 -05:00
Tom Lane 41d505a7ff Add still more chattiness in server shutdown.
Further investigation says that there may be some slow operations after
we've finished ShutdownXLOG(), so add some more log messages to try to
isolate that.  This is all temporary code too.
2016-02-09 19:36:30 -05:00
Tom Lane 7351e18286 Add more chattiness in server shutdown.
Early returns from the buildfarm show that there's a bit of a gap in the
logging I added in 3971f64843b02e4a: the portion of CreateCheckPoint()
after CheckPointGuts() can take a fair amount of time.  Add a few more
log messages in that section of code.  This too shall be reverted later.
2016-02-09 11:21:46 -05:00
Tom Lane 3971f64843 Temporarily make pg_ctl and server shutdown a whole lot chattier.
This is a quick hack, due to be reverted when its purpose has been served,
to try to gather information about why some of the buildfarm critters
regularly fail with "postmaster does not shut down" complaints.  Maybe they
are just really overloaded, but maybe something else is going on.  Hence,
instrument pg_ctl to print the current time when it starts waiting for
postmaster shutdown and when it gives up, and add a lot of logging of the
current time in the server's checkpoint and shutdown code paths.

No attempt has been made to make this pretty.  I'm not even totally sure
if it will build on Windows, but we'll soon find out.
2016-02-08 18:43:11 -05:00
Tom Lane 0231f83856 Re-pgindent varlena.c.
Just to make sure previous commit worked ...
2016-02-08 15:17:40 -05:00