Commit Graph

41450 Commits

Author SHA1 Message Date
Andrew Dunstan
69ce3ead1e Add libxml2 include path for MSVC builds
On Unix this path is detected via the use of xml2-config, but that's not
available on Windows. This means that users building with libxml2 will
no longer need to move things around from the standard libxml2
installation for MSVC builds.

Backpatch to all live branches.
2017-05-12 10:22:20 -04:00
Tom Lane
4d4cdc5065 Doc fix: scale(numeric) returns integer, not numeric.
Thinko in commit abb173392, which introduced this function.

Report: https://postgr.es/m/20170511215234.1795.54347@wrigleys.postgresql.org
2017-05-11 18:09:28 -04:00
Tom Lane
128e133cc6 Increase MAX_SYSCACHE_CALLBACKS to provide more room for extensions.
Increase from the historical value of 32 to 64.  We are up to 31 callers
of CacheRegisterSyscacheCallback() in HEAD, so if they were all to be
exercised in one process that would leave only one slot for add-on modules.
It's probably not possible for that to happen, but still we clearly need
more daylight here.  (At some point it might be worth making the array
dynamically resizable; but since we've never heard a complaint of "out of
syscache_callback_list slots" happening in the field, I doubt it's worth
it yet.)

Back-patch as far as 9.4, which is where we increased the companion limit
MAX_RELCACHE_CALLBACKS (cf commit f01d1ae3a).  It's not as urgent in
released branches, which have only a couple dozen call sites in core, but
it still seems that somebody might hit the limit before these branches die.

Discussion: https://postgr.es/m/12184.1494450131@sss.pgh.pa.us
2017-05-11 14:51:33 -04:00
Peter Eisentraut
7f335e780c psql: Add missing translation markers 2017-05-10 10:15:36 -04:00
Alvaro Herrera
bfaba24829 Ignore PQcancel errors properly
Add a (void) cast to all PQcancel() calls that purposefully don't check
the return value, to keep compilers and static checkers happy.

Per Coverity.
2017-05-09 14:58:51 -03:00
Tom Lane
ca9cfed883 Stamp 9.6.3. 2017-05-08 17:15:12 -04:00
Tom Lane
935e77d527 Further patch rangetypes_selfuncs.c's statistics slot management.
Values in a STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM slot are float8,
not of the type of the column the statistics are for.

This bug is at least partly the fault of sloppy specification comments
for get_attstatsslot()/free_attstatsslot(): the type OID they want is that
of the stavalues entries, not of the underlying column.  (I double-checked
other callers and they seem to get this right.)  Adjust the comments to be
more correct.

Per buildfarm.

Security: CVE-2017-7484
2017-05-08 15:02:57 -04:00
Tom Lane
2d5e7b4a91 Last-minute updates for release notes.
Security: CVE-2017-7484, CVE-2017-7485, CVE-2017-7486
2017-05-08 12:57:27 -04:00
Tom Lane
cad1594322 Fix possibly-uninitialized variable.
Oversight in e2d4ef8de et al (my fault not Peter's).  Per buildfarm.

Security: CVE-2017-7484
2017-05-08 11:18:54 -04:00
Noah Misch
c928addfcc Match pg_user_mappings limits to information_schema.user_mapping_options.
Both views replace the umoptions field with NULL when the user does not
meet qualifications to see it.  They used different qualifications, and
pg_user_mappings documented qualifications did not match its implemented
qualifications.  Make its documentation and implementation match those
of user_mapping_options.  One might argue for stronger qualifications,
but these have long, documented tenure.  pg_user_mappings has always
exhibited this problem, so back-patch to 9.2 (all supported versions).

Michael Paquier and Feike Steenbergen.  Reviewed by Jeff Janes.
Reported by Andrew Wheelwright.

Security: CVE-2017-7486
2017-05-08 07:24:27 -07:00
Noah Misch
aafbd1df96 Restore PGREQUIRESSL recognition in libpq.
Commit 65c3bf19fd moved handling of the,
already then, deprecated requiressl parameter into conninfo_storeval().
The default PGREQUIRESSL environment variable was however lost in the
change resulting in a potentially silent accept of a non-SSL connection
even when set.  Its documentation remained.  Restore its implementation.
Also amend the documentation to mark PGREQUIRESSL as deprecated for
those not following the link to requiressl.  Back-patch to 9.3, where
commit 65c3bf1 first appeared.

Behavior has been more complex when the user provides both deprecated
and non-deprecated settings.  Before commit 65c3bf1, libpq operated
according to the first of these found:

  requiressl=1
  PGREQUIRESSL=1
  sslmode=*
  PGSSLMODE=*

(Note requiressl=0 didn't override sslmode=*; it would only suppress
PGREQUIRESSL=1 or a previous requiressl=1.  PGREQUIRESSL=0 had no effect
whatsoever.)  Starting with commit 65c3bf1, libpq ignored PGREQUIRESSL,
and order of precedence changed to this:

  last of requiressl=* or sslmode=*
  PGSSLMODE=*

Starting now, adopt the following order of precedence:

  last of requiressl=* or sslmode=*
  PGSSLMODE=*
  PGREQUIRESSL=1

This retains the 65c3bf1 behavior for connection strings that contain
both requiressl=* and sslmode=*.  It retains the 65c3bf1 change that
either connection string option overrides both environment variables.
For the first time, PGSSLMODE has precedence over PGREQUIRESSL; this
avoids reducing security of "PGREQUIRESSL=1 PGSSLMODE=verify-full"
configurations originating under v9.3 and later.

Daniel Gustafsson

Security: CVE-2017-7485
2017-05-08 07:24:27 -07:00
Peter Eisentraut
0294ac2a5d Translation updates
Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: dd6f9ed0d9d7b33d328761dcdd3a70f44aaff6ff
2017-05-08 10:10:54 -04:00
Peter Eisentraut
c33c423622 Add security checks to selectivity estimation functions
Some selectivity estimation functions run user-supplied operators over
data obtained from pg_statistic without security checks, which allows
those operators to leak pg_statistic data without having privileges on
the underlying tables.  Fix by checking that one of the following is
satisfied: (1) the user has table or column privileges on the table
underlying the pg_statistic data, or (2) the function implementing the
user-supplied operator is leak-proof.  If neither is satisfied, planning
will proceed as if there are no statistics available.

At least one of these is satisfied in most cases in practice.  The only
situations that are negatively impacted are user-defined or
not-leak-proof operators on a security-barrier view.

Reported-by: Robert Haas <robertmhaas@gmail.com>
Author: Peter Eisentraut <peter_e@gmx.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>

Security: CVE-2017-7484
2017-05-08 09:18:57 -04:00
Tom Lane
3178f467c8 Release notes for 9.6.3, 9.5.7, 9.4.12, 9.3.17, 9.2.21. 2017-05-07 16:56:02 -04:00
Tom Lane
fab2d0d7f4 Guard against null t->tm_zone in strftime.c.
The upstream IANA code does not guard against null TM_ZONE pointers in this
function, but in our code there is such a check in the other pre-existing
use of t->tm_zone.  We do have some places that set pg_tm.tm_zone to NULL.
I'm not entirely sure it's possible to reach strftime with such a value,
but I'm not sure it isn't either, so be safe.

Per Coverity complaint.
2017-05-07 12:33:18 -04:00
Tom Lane
f754728170 Install the "posixrules" timezone link in MSVC builds.
Somehow, we'd missed ever doing this.  The consequences aren't too
severe: basically, the timezone library would fall back on its hardwired
notion of the DST transition dates to use for a POSIX-style zone name,
rather than obeying US/Eastern which is the intended behavior.  The net
effect would only be to obey current US DST law further back than it
ought to apply; so it's not real surprising that nobody noticed.

David Rowley, per report from Amit Kapila

Discussion: https://postgr.es/m/CAA4eK1LC7CaNhRAQ__C3ht1JVrPzaAXXhEJRnR5L6bfYHiLmWw@mail.gmail.com
2017-05-07 11:57:41 -04:00
Tom Lane
5042e9492b Restore fullname[] contents before falling through in pg_open_tzfile().
Fix oversight in commit af2c5aa88: if the shortcut open() doesn't work,
we need to reset fullname[] to be just the name of the toplevel tzdata
directory before we fall through into the pre-existing code.  This failed
to be exposed in my (tgl's) testing because the fall-through path is
actually never taken under normal circumstances.

David Rowley, per report from Amit Kapila

Discussion: https://postgr.es/m/CAA4eK1LC7CaNhRAQ__C3ht1JVrPzaAXXhEJRnR5L6bfYHiLmWw@mail.gmail.com
2017-05-07 11:34:41 -04:00
Stephen Frost
ef42c11037 pg_dump: Don't leak memory in buildDefaultACLCommands()
buildDefaultACLCommands() didn't destroy the string buffer created in
certain cases, leading to a memory leak.  Fix by destroying the buffer
before returning from the function.

Spotted by Coverity.

Author: Michael Paquier

Back-patch to 9.6 where buildDefaultACLCommands() was added.
2017-05-06 22:58:22 -04:00
Stephen Frost
92b15224b4 RLS: Fix ALL vs. SELECT+UPDATE policy usage
When we add the SELECT-privilege based policies to the RLS with check
options (such as for an UPDATE statement, or when we have INSERT ...
RETURNING), we need to be sure and use the 'USING' case if the policy is
actually an 'ALL' policy (which could have both a USING clause and an
independent WITH CHECK clause).

This could result in policies acting differently when built using ALL
(when the ALL had both USING and WITH CHECK clauses) and when building
the policies independently as SELECT and UPDATE policies.

Fix this by adding an explicit boolean to add_with_check_options() to
indicate when the USING policy should be used, even if the policy has
both USING and WITH CHECK policies on it.

Reported by: Rod Taylor

Back-patch to 9.5 where RLS was introduced.
2017-05-06 21:46:41 -04:00
Tom Lane
a24a1a2ec4 Document current_role.
This system function has been there a very long time, but somehow escaped
being listed in func.sgml.

Fabien Coelho and Tom Lane

Discussion: https://postgr.es/m/alpine.DEB.2.20.1705061027580.3896@lancre
2017-05-06 14:19:52 -04:00
Alvaro Herrera
19a4033785 Allow MSVC to build with Tcl 8.6.
Commit eaba54c20c added support for Tcl 8.6 for configure-supported
platforms after verifying that pltcl works without further changes, but
the MSVC tooling wasn't updated accordingly.  Update MSVC to match,
restructuring the code to avoid duplicating the logic for every Tcl
version supported.

Backpatch to all live branches, like eaba54c20c.  In 9.4 and previous,
change the patch to use backslashes rather than forward, as in the rest
of the file.

Reported by Paresh More, who also tested the patch I provided.
Discussion: https://postgr.es/m/CAAgiCNGVw3ssBtSi3ZNstrz5k00ax=UV+_ZEHUeW_LMSGL2sew@mail.gmail.com
2017-05-05 12:05:34 -03:00
Heikki Linnakangas
bd05ad8b05 Give nicer error message when connecting to a v10 server requiring SCRAM.
This is just to give the user a hint that they need to upgrade, if they try
to connect to a v10 server that uses SCRAM authentication, with an older
client.

Commit to all stable branches, but not master.

Discussion: https://www.postgresql.org/message-id/bbf45d92-3896-eeb7-7399-2111d517261b@pivotal.io
2017-05-05 11:24:02 +03:00
Peter Eisentraut
071d13395c Fix cursor_to_xml in tableforest false mode
It only produced <row> elements but no wrapping <table> element.

By contrast, cursor_to_xmlschema produced a schema that is now correct
but did not previously match the XML data produced by cursor_to_xml.

In passing, also fix a minor misunderstanding about moving cursors in
the tests related to this.

Reported-by: filip@jirsak.org
Based-on-patch-by: Thomas Munro <thomas.munro@enterprisedb.com>
2017-05-04 21:17:46 -04:00
Tom Lane
855f0e9247 Fix pfree-of-already-freed-tuple when rescanning a GiST index-only scan.
GiST's getNextNearest() function attempts to pfree the previously-returned
tuple if any (that is, scan->xs_hitup in HEAD, or scan->xs_itup in older
branches).  However, if we are rescanning a plan node after ending a
previous scan early, those tuple pointers could be pointing to garbage,
because they would be pointing into the scan's pageDataCxt or queueCxt
which has been reset.  In a debug build this reliably results in a crash,
although I think it might sometimes accidentally fail to fail in
production builds.

To fix, clear the pointer field anyplace we reset a context it might
be pointing into.  This may be overkill --- I think probably only the
queueCxt case is involved in this bug, so that resetting in gistrescan()
would be sufficient --- but dangling pointers are generally bad news,
so let's avoid them.

Another plausible answer might be to just not bother with the pfree in
getNextNearest().  The reconstructed tuples would go away anyway in the
context resets, and I'm far from convinced that freeing them a bit earlier
really saves anything meaningful.  I'll stick with the original logic in
this patch, but if we find more problems in the same area we should
consider that approach.

Per bug #14641 from Denis Smirnov.  Back-patch to 9.5 where this
logic was introduced.

Discussion: https://postgr.es/m/20170504072034.24366.57688@wrigleys.postgresql.org
2017-05-04 13:59:13 -04:00
Tom Lane
56064d5512 Remove useless and rather expensive stanza in matview regression test.
This removes a test case added by commit b69ec7cc9, which was intended
to exercise a corner case involving the rule used at that time that
materialized views were unpopulated iff they had physical size zero.
We got rid of that rule very shortly later, in commit 1d6c72a55, but
kept the test case.  However, because the case now asks what VACUUM
will do to a zero-sized physical file, it would be pretty surprising
if the answer were ever anything but "nothing" ... and if things were
indeed that broken, surely we'd find it out from other tests.  Since
the test involves a table that's fairly large by regression-test
standards (100K rows), it's quite slow to run.  Dropping it should
save some buildfarm cycles, so let's do that.

Discussion: https://postgr.es/m/32386.1493831320@sss.pgh.pa.us
2017-05-03 19:37:01 -04:00
Tom Lane
c521d5a8d7 Improve performance of timezone loading, especially pg_timezone_names view.
tzparse() would attempt to load the "posixrules" timezone database file on
each call.  That might seem like it would only be an issue when selecting a
POSIX-style zone name rather than a zone defined in the timezone database,
but it turns out that each zone definition file contains a POSIX-style zone
string and tzload() will call tzparse() to parse that.  Thus, when scanning
the whole timezone file tree as we do in the pg_timezone_names view,
"posixrules" was read repetitively for each zone definition file.  Fix
that by caching the file on first use within any given process.  (We cache
other zone definitions for the life of the process, so there seems little
reason not to cache this one as well.)  This probably won't help much in
processes that never run pg_timezone_names, but even one additional SET
of the timezone GUC would come out ahead.

An even worse problem for pg_timezone_names is that pg_open_tzfile()
has an inefficient way of identifying the canonical case of a zone name:
it basically re-descends the directory tree to the zone file.  That's not
awful for an individual "SET timezone" operation, but it's pretty horrid
when we're inspecting every zone in the database.  And it's pointless too
because we already know the canonical spelling, having just read it from
the filesystem.  Fix by teaching pg_open_tzfile() to avoid the directory
search if it's not asked for the canonical name, and backfilling the
proper result in pg_tzenumerate_next().

In combination these changes seem to make the pg_timezone_names view
about 3x faster to read, for me.  Since a scan of pg_timezone_names
has up to now been one of the slowest queries in the regression tests,
this should help some little bit for buildfarm cycle times.

Back-patch to all supported branches, not so much because it's likely
that users will care much about the view's performance as because
tracking changes in the upstream IANA timezone code is really painful
if we don't keep all the branches in sync.

Discussion: https://postgr.es/m/27962.1493671706@sss.pgh.pa.us
2017-05-02 21:50:42 -04:00
Tom Lane
d56b8b41b3 Ensure commands in extension scripts see the results of preceding DDL.
Due to a missing CommandCounterIncrement() call, parsing of a non-utility
command in an extension script would not see the effects of the immediately
preceding DDL command, unless that command's execution ends with
CommandCounterIncrement() internally ... which some do but many don't.
Report by Philippe Beaudoin, diagnosis by Julien Rouhaud.

Rather remarkably, this bug has evaded detection since extensions were
invented, so back-patch to all supported branches.

Discussion: https://postgr.es/m/2cf7941e-4e41-7714-3de8-37b1a8f74dff@free.fr
2017-05-02 18:05:54 -04:00
Andrew Dunstan
f06caa09d9 Fix perl thinko in commit fed6df486d
Report and fix from Vaishnavi Prabakaran

Backpatch to 9.4 like original.
2017-05-02 08:22:45 -04:00
Tom Lane
1fdc3f6e8a Update time zone data files to tzdata release 2017b.
DST law changes in Chile, Haiti, and Mongolia.  Historical corrections for
Ecuador, Kazakhstan, Liberia, and Spain.

The IANA crew continue their campaign to replace invented time zone
abbrevations with numeric GMT offsets.  This update changes numerous zones
in South America, the Pacific and Indian oceans, and some Asian and Middle
Eastern zones.  I kept these abbreviations in the tznames/ data files,
however, so that we will still accept them for input.  (We may want to
start trimming those files someday, but I think we should wait for the
upstream dust to settle before deciding what to do.)

In passing, add MESZ (Mitteleuropaeische Sommerzeit) to the tznames lists;
since we accept MEZ (Mitteleuropaeische Zeit) it seems rather strange not
to take the other one.  And fix some incorrect, or at least obsolete,
comments that certain abbreviations are not traceable to the IANA data.
2017-05-01 11:53:42 -04:00
Andrew Dunstan
66510630d8 Allow vcregress.pl to run an arbitrary TAP test set
Currently only provision for running the bin checks in a single step is
provided for. Now these tests can be run individually, as well as tests
in other locations (e.g. src.test/recover).

Also provide for suppressing unnecessary temp installs by setting the
NO_TEMP_INSTALL environment variable just as the Makefiles do.

Backpatch to 9.4.
2017-05-01 10:52:21 -04:00
Tom Lane
6872d96a3b Sync our copy of the timezone library with IANA release tzcode2017b.
zic no longer mishandles some transitions in January 2038 when it
attempts to work around Qt bug 53071.  This fixes a bug affecting
Pacific/Tongatapu that was introduced in zic 2016e.  localtime.c
now contains a workaround, useful when loading a file generated by
a buggy zic.

There are assorted cosmetic changes as well, notably relocation
of a bunch of #defines.
2017-04-30 15:14:06 -04:00
Peter Eisentraut
0a65b18d2d doc: Fix typo in 9.6 release notes
Author: Huong Dangminh <huo-dangminh@ys.jp.nec.com>
2017-04-28 15:31:25 -04:00
Robert Haas
8a9c83bfaf Fix VALIDATE CONSTRAINT to consider NO INHERIT attribute.
Currently, trying to validate a NO INHERIT constraint on the parent will
search for the constraint in child tables (where it is not supposed to
exist), wrongly causing a "constraint does not exist" error.

Amit Langote, per a report from Hans Buschmann.

Discussion: http://postgr.es/m/20170421184012.24362.19@wrigleys.postgresql.org
2017-04-28 14:48:44 -04:00
Andres Freund
29e8c881dd Don't use on-disk snapshots for exported logical decoding snapshot.
Logical decoding stores historical snapshots on disk, so that logical
decoding can restart without having to reconstruct a snapshot from
scratch (for which the resources are not guaranteed to be present
anymore).  These serialized snapshots were also used when creating a
new slot via the walsender interface, which can export a "full"
snapshot (i.e. one that can read all tables, not just catalog ones).

The problem is that the serialized snapshots are only useful for
catalogs and not for normal user tables.  Thus the use of such a
serialized snapshot could result in an inconsistent snapshot being
exported, which could lead to queries returning wrong data.  This
would only happen if logical slots are created while another logical
slot already exists.

Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backport: 9.4, where logical decoding was introduced.
2017-04-27 15:29:33 -07:00
Tom Lane
53b1578460 Cope with glibc too old to have epoll_create1().
Commit fa31b6f4e supposed that we didn't have to worry about that
anymore, but it seems that RHEL5 is like that, and that's still
a supported platform.  Put back the prior coding under an #ifdef,
adding an explicit fcntl() to retain the desired CLOEXEC property.

Discussion: https://postgr.es/m/12307.1493325329@sss.pgh.pa.us
2017-04-27 17:13:54 -04:00
Andres Freund
28afff347a Preserve required !catalog tuples while computing initial decoding snapshot.
The logical decoding machinery already preserved all the required
catalog tuples, which is sufficient in the course of normal logical
decoding, but did not guarantee that non-catalog tuples were preserved
during computation of the initial snapshot when creating a slot over
the replication protocol.

This could cause a corrupted initial snapshot being exported.  The
time window for issues is usually not terribly large, but on a busy
server it's perfectly possible to it hit it.  Ongoing decoding is not
affected by this bug.

To avoid increased overhead for the SQL API, only retain additional
tuples when a logical slot is being created over the replication
protocol.  To do so this commit changes the signature of
CreateInitDecodingContext(), but it seems unlikely that it's being
used in an extension, so that's probably ok.

In a drive-by fix, fix handling of
ReplicationSlotsComputeRequiredXmin's already_locked argument, which
should only apply to ProcArrayLock, not ReplicationSlotControlLock.

Reported-By: Erik Rijkers
Analyzed-By: Petr Jelinek
Author: Petr Jelinek, heavily editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com
Backport: 9.4, where logical decoding was introduced.
2017-04-27 13:13:36 -07:00
Tom Lane
866452cd8b Make latch.c more paranoid about child-process cases.
Although the postmaster doesn't currently create a self-pipe or any
latches, there's discussion of it doing so in future.  It's also
conceivable that a shared_preload_libraries extension would try to
create such a thing in the postmaster process today.  In that case
the self-pipe FDs would be inherited by forked child processes.
latch.c was entirely unprepared for such a case and could suffer an
assertion failure, or worse try to use the inherited pipe if somebody
called WaitLatch without having called InitializeLatchSupport in that
process.  Make it keep track of whether InitializeLatchSupport has been
called in the *current* process, and do the right thing if state has
been inherited from a parent.

Apply FD_CLOEXEC to file descriptors created in latch.c (the self-pipe,
as well as epoll event sets).  This ensures that child processes spawned
in backends, the archiver, etc cannot accidentally or intentionally mess
with these FDs.  It also ensures that we end up with the right state
for the self-pipe in EXEC_BACKEND processes, which otherwise wouldn't
know to close the postmaster's self-pipe FDs.

Back-patch to 9.6, mainly to keep latch.c looking similar in all branches
it exists in.

Discussion: https://postgr.es/m/8322.1493240739@sss.pgh.pa.us
2017-04-27 15:07:36 -04:00
Tom Lane
e880df25ec Allow multiple bgworkers to be launched per postmaster iteration.
Previously, maybe_start_bgworker() would launch at most one bgworker
process per call, on the grounds that the postmaster might otherwise
neglect its other duties for too long.  However, that seems overly
conservative, especially since bad effects only become obvious when
many hundreds of bgworkers need to be launched at once.  On the other
side of the coin is that the existing logic could result in substantial
delay of bgworker launches, because ServerLoop isn't guaranteed to
iterate immediately after a signal arrives.  (My attempt to fix that
by using pselect(2) encountered too many portability question marks,
and in any case could not help on platforms without pselect().)
One could also question the wisdom of using an O(N^2) processing
method if the system is intended to support so many bgworkers.

As a compromise, allow that function to launch up to 100 bgworkers
per call (and in consequence, rename it to maybe_start_bgworkers).
This will allow any normal parallel-query request for workers
to be satisfied immediately during sigusr1_handler, avoiding the
question of whether ServerLoop will be able to launch more promptly.

There is talk of rewriting the postmaster to use a WaitEventSet to
avoid the signal-response-delay problem, but I'd argue that this change
should be kept even after that happens (if it ever does).

Backpatch to 9.6 where parallel query was added.  The issue exists
before that, but previous uses of bgworkers typically aren't as
sensitive to how quickly they get launched.

Discussion: https://postgr.es/m/4707.1493221358@sss.pgh.pa.us
2017-04-26 16:17:29 -04:00
Peter Eisentraut
86e640a694 postgres_fdw: Fix join push down with extensions
Objects in an extension are shippable to a foreign server if the
extension is part of the foreign server definition's shippable
extensions list.  But this was not properly considered in some cases
when checking whether a join condition can be pushed to a foreign server
and the join condition uses an object from a shippable extension.  So
the join would never be pushed down in those cases.

So, the list of extensions needs to be made available in fpinfo of the
relation being considered to be pushed down before any expressions are
assessed for being shippable.  Fix foreign_join_ok() to do that for a
join relation.

David Rowley and Ashutosh Bapat, per report from David Rowley

reduced version of patch 332bec1e60 for
backpatch to 9.6, first release with join push down
2017-04-26 09:14:21 -04:00
Tom Lane
e615605239 Revert "Use pselect(2) not select(2), if available, to wait in postmaster's loop."
This reverts commit b9515b6287.

Buildfarm results suggest that some platforms have versions of pselect(2)
that are not merely non-atomic, but flat out non-functional.  Revert the
use-pselect patch to confirm this diagnosis (and exclude the no-SA_RESTART
patch as the source of trouble).  If it's so, we should probably look into
blacklisting specific platforms that have broken pselect.

Discussion: https://postgr.es/m/9696.1493072081@sss.pgh.pa.us
2017-04-24 18:29:55 -04:00
Tom Lane
b9515b6287 Use pselect(2) not select(2), if available, to wait in postmaster's loop.
Traditionally we've unblocked signals, called select(2), and then blocked
signals again.  The code expects that the select() will be cancelled with
EINTR if an interrupt occurs; but there's a race condition, which is that
an already-pending signal will be delivered as soon as we unblock, and then
when we reach select() there will be nothing preventing it from waiting.
This can result in a long delay before we perform any action that
ServerLoop was supposed to have taken in response to the signal.  As with
the somewhat-similar symptoms fixed by commit 893902085, the main practical
problem is slow launching of parallel workers.  The window for trouble is
usually pretty short, corresponding to one iteration of ServerLoop; but
it's not negligible.

To fix, use pselect(2) in place of select(2) where available, as that's
designed to solve exactly this problem.  Where not available, we continue
to use the old way, and are no worse off than before.

pselect(2) has been required by POSIX since about 2001, so most modern
platforms should have it.  A bigger portability issue is that some
implementations are said to be non-atomic, ie pselect() isn't really
any different from unblock/select/reblock.  Still, we're no worse off
than before on such a platform.

There is talk of rewriting the postmaster to use a WaitEventSet and
not do signal response work in signal handlers, at which point this
could be reverted, since we'd be using a self-pipe to solve the race
condition.  But that's not happening before v11 at the earliest.

Back-patch to 9.6.  The problem exists much further back, but the
worst symptom arises only in connection with parallel query, so it
does not seem worth taking any portability risks in older branches.

Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
2017-04-24 14:03:14 -04:00
Tom Lane
dfa4baf91c Run the postmaster's signal handlers without SA_RESTART.
The postmaster keeps signals blocked everywhere except while waiting
for something to happen in ServerLoop().  The code expects that the
select(2) will be cancelled with EINTR if an interrupt occurs; without
that, followup actions that should be performed by ServerLoop() itself
will be delayed.  However, some platforms interpret the SA_RESTART
signal flag as meaning that they should restart rather than cancel
the select(2).  Worse yet, some of them restart it with the original
timeout delay, meaning that a steady stream of signal interrupts can
prevent ServerLoop() from iterating at all if there are no incoming
connection requests.

Observable symptoms of this, on an affected platform such as HPUX 10,
include extremely slow parallel query startup (possibly as much as
30 seconds) and failure to update timestamps on the postmaster's sockets
and lockfiles when no new connections arrive for a long time.

We can fix this by running the postmaster's signal handlers without
SA_RESTART.  That would be quite a scary change if the range of code
where signals are accepted weren't so tiny, but as it is, it seems
safe enough.  (Note that postmaster children do, and must, reset all
the handlers before unblocking signals; so this change should not
affect any child process.)

There is talk of rewriting the postmaster to use a WaitEventSet and
not do signal response work in signal handlers, at which point it might
be appropriate to revert this patch.  But that's not happening before
v11 at the earliest.

Back-patch to 9.6.  The problem exists much further back, but the
worst symptom arises only in connection with parallel query, so it
does not seem worth taking any portability risks in older branches.

Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
2017-04-24 13:00:23 -04:00
Tom Lane
63f64d2824 Fix postmaster's handling of fork failure for a bgworker process.
This corner case didn't behave nicely at all: the postmaster would
(partially) update its state as though the process had started
successfully, and be quite confused thereafter.  Fix it to act
like the worker had crashed, instead.

In passing, refactor so that do_start_bgworker contains all the
state-change logic for bgworker launch, rather than just some of it.

Back-patch as far as 9.4.  9.3 contains similar logic, but it's just
enough different that I don't feel comfortable applying the patch
without more study; and the use of bgworkers in 9.3 was so small
that it doesn't seem worth the extra work.

transam/parallel.c is still entirely unprepared for the possibility
of bgworker startup failure, but that seems like material for a
separate patch.

Discussion: https://postgr.es/m/4905.1492813727@sss.pgh.pa.us
2017-04-24 12:16:58 -04:00
Andres Freund
39369b4145 Zero padding in replication origin's checkpointed on disk-state.
This seems to be largely cosmetic, avoiding valgrind bleats and the
like. The uninitialized padding influences the CRC of the on-disk
entry, but because it's also used when verifying the CRC, that doesn't
cause spurious failures.  Backpatch nonetheless.

It's a bit unfortunate that contrib/test_decoding/sql/replorigin.sql
doesn't exercise the checkpoint path, but checkpoints are fairly
expensive on weaker machines, and we'd have to stop/start for that to
be meaningful.

Author: Andres Freund
Discussion: https://postgr.es/m/20170422183123.w2jgiuxtts7qrqaq@alap3.anarazel.de
Backpatch: 9.5, where replication origins were introduced
2017-04-23 15:54:53 -07:00
Tom Lane
f5885488da Fix order of arguments to SubTransSetParent().
ProcessTwoPhaseBuffer (formerly StandbyRecoverPreparedTransactions)
mixed up the parent and child XIDs when calling SubTransSetParent to
record the transactions' relationship in pg_subtrans.

Remarkably, analysis by Simon Riggs suggests that this doesn't lead to
visible problems (at least, not in non-Assert builds).  That might
explain why we'd not noticed it before.  Nonetheless, it's surely wrong.

This code was born broken, so back-patch to all supported branches.

Discussion: https://postgr.es/m/20110.1492905318@sss.pgh.pa.us
2017-04-23 13:11:08 -04:00
Andrew Dunstan
11927e575d Fix TAP infrastructure to support Mingw better
archive_command and restore_command need to refer to Windows paths, not
Msys virtual file system paths, as postgres is completely unaware of the
latter, so prefix them with the Windows path to the virtual file system
root. Clean psql output of carriage returns.
2017-04-23 09:39:43 -04:00
Tom Lane
9d5f0718d7 Make PostgresNode.pm check server status more carefully.
PostgresNode blithely ignored the exit status of pg_ctl, and in general
made no effort to be sure that the server was running when it should be.
This caused it to miss server crashes, which is a serious shortcoming
in a test scaffold.  Make it complain if pg_ctl fails, and modify the
start and stop logic to complain if the server doesn't start, or doesn't
stop, when expected.

Also, have it turn off the "restart_after_crash" configuration parameter
in created clusters, as bitter experience has shown that leaving that on
can mask crashes too.

We might at some point need variant functions that allow for, eg,
server start failure to be expected.  But no existing test case appears
to want that, and it surely shouldn't be the default behavior.

Note that this *will* break the buildfarm, as it will expose known
bugs that the previous testing failed to.  I'm committing it despite
that, to verify that we get the expected failures in the buildfarm
not just in manual testing.

Back-patch into 9.6 where PostgresNode was introduced.  (The 9.6
branch is not expected to show any failures.)

Discussion: https://postgr.es/m/21432.1492886428@sss.pgh.pa.us
2017-04-22 18:18:25 -04:00
Tom Lane
551cc9af57 Make PostgresNode::append_conf append a newline automatically.
Although the documentation for append_conf said clearly that it didn't
add a newline, many test authors seem to have forgotten that ... or maybe
they just consulted the example at the top of the POD documentation,
which clearly shows adding a config entry without bothering to add a
trailing newline.  The worst part of that is that it works, as long as
you don't do it more than once, since the backend isn't picky about
whether config files end with newlines.  So there's not a strong forcing
function reminding test authors not to do it like that.  Upshot is that
this is a terribly fragile way to go about things, and there's at least
one existing test case that is demonstrably broken and not testing what
it thinks it is.

Let's just make append_conf append a newline, instead; that is clearly
way safer than the old definition.

I also cleaned up a few call sites that were unnecessarily ugly.
(I left things alone in places where it's plausible that additional
config lines would need to be added someday.)

Back-patch the change in append_conf itself to 9.6 where it was added,
as having a definitional inconsistency between branches would obviously
be pretty hazardous for back-patching TAP tests.  The other changes are
just cosmetic and don't need to be back-patched.

Discussion: https://postgr.es/m/19751.1492892376@sss.pgh.pa.us
2017-04-22 16:58:15 -04:00
Peter Eisentraut
95a23165fd doc: Update link
The reference "That is the topic of the next section." has been
incorrect since the materialized views documentation got inserted
between the section "rules-views" and "rules-update".

Author: Zertrin <postgres_wiki@zertrin.org>
2017-04-21 19:43:50 -04:00
Tom Lane
f07f6b62c9 Avoid depending on non-POSIX behavior of fcntl(2).
The POSIX standard does not say that the success return value for
fcntl(F_SETFD) and fcntl(F_SETFL) is zero; it says only that it's not -1.
We had several calls that were making the stronger assumption.  Adjust
them to test specifically for -1 for strict spec compliance.

The standard further leaves open the possibility that the O_NONBLOCK
flag bit is not the only active one in F_SETFL's argument.  Formally,
therefore, one ought to get the current flags with F_GETFL and store
them back with only the O_NONBLOCK bit changed when trying to change
the nonblock state.  In port/noblock.c, we were doing the full pushup
in pg_set_block but not in pg_set_noblock, which is just weird.  Make
both of them do it properly, since they have little business making
any assumptions about the socket they're handed.  The other places
where we're issuing F_SETFL are working with FDs we just got from
pipe(2), so it's reasonable to assume the FDs' properties are all
default, so I didn't bother adding F_GETFL steps there.

Also, while pg_set_block deserves some points for trying to do things
right, somebody had decided that it'd be even better to cast fcntl's
third argument to "long".  Which is completely loony, because POSIX
clearly says the third argument for an F_SETFL call is "int".

Given the lack of field complaints, these missteps apparently are not
of significance on any common platforms.  But they're still wrong,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/30882.1492800880@sss.pgh.pa.us
2017-04-21 15:55:56 -04:00